Re: [R] Simple permutation question

2014-06-26 Thread Robert Latest
On Wed, 25 Jun 2014 14:16:08 -0700 (PDT)
Jeff Newmiller jdnew...@dcn.davis.ca.us wrote:

 The brokenness of your perm.broken function arises from the attempted
 use of sapply to bind matrices together, which is not something
 sapply does.
 
 perm.fixed <- function( x ) {
   if ( length( x ) == 1 ) return( matrix( x, nrow=1 ) )
   lst <- lapply( seq_along( x )
                , function( i ) {
                    # typo in the original post: this call should be perm.fixed
                    cbind( x[ i ], perm.jdn( x[ -i ] ) )
                  }
                )
   do.call(rbind, lst)
 }

Nice, exactly what I was looking for (including typo). Thanks!

robert
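
For reference, a minimal sketch of the quoted function with the recursive call pointing at itself. The name perm and the check at the end are illustrative and not from the thread. The first attempt fails because sapply() flattens each sub-matrix into a vector and stores it as one column, whereas lapply() followed by do.call(rbind, ...) stacks the sub-matrices row-wise.

```r
# Recursive permutations: one row per ordering of x.
perm <- function(x) {
  if (length(x) <= 1) return(matrix(x, nrow = 1))
  # Put each element first in turn and append every permutation of the rest.
  blocks <- lapply(seq_along(x), function(i) cbind(x[i], perm(x[-i])))
  do.call(rbind, blocks)
}

p <- perm(LETTERS[1:5])
dim(p)   # 5! = 120 rows, 5 columns
```

The number of rows grows as factorial(length(x)), so this is only practical for short vectors.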

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Simple permutation question

2014-06-25 Thread Robert Latest
So my company has hired a few young McKinsey guys from overseas for a
couple of weeks to help us with a production line optimization. They
probably charge what I make in a year, but that's OK because I just
never have the time to really dive into one particular topic, and I have
to hand it to the consultants that they came up with one or two really
clever ideas to model the production line. Of course it's up to me to
feed them the real data which they then churn through their Excel
models that they cook up during the nights in their hotel rooms, and
which I then implement back into my experimental system using live data.

Anyway, whenever they need something or come up with something I skip
out of the room, hack it into R, export the CSV and come back in about
half the time it takes Excel to even read in the data, let alone
process it. Of course that got them curious, and I showed off a couple
of scripts that condense their abysmal Excel convolutions into a few
lean and mean lines of R code.

Anyway, I'm in my office with this really attractive, clever young
McKinsey girl (I'm in my mid-forties, married with kids and all, but I
still enjoyed impressing a woman with computer stuff, of all things!),
and one of her models involves a simple permutation of five letters --
A through E.

And that's when I find out that R doesn't have a permutation function.
How is that possible? R has EVERYTHING, but not that? I'm
flabbergasted. Stumped. And now it's up to me to spend the evening at
home coding that model, and the only thing I really need is that
permutation.

So this is my first attempt:

perm.broken <- function(x) {
    if (length(x) == 1) return(x)
    sapply(1:length(x), function(i) {
        cbind(x[i], perm.broken(x[-i]))
    })
}

But it doesn't work:
> perm.broken(c("A", "B", "C"))
     [,1] [,2] [,3]
[1,] "A"  "B"  "C"
[2,] "A"  "B"  "C"
[3,] "B"  "A"  "A"
[4,] "C"  "C"  "B"
[5,] "C"  "C"  "B"
[6,] "B"  "A"  "A"
 

And I can't figure out for the life of me why. It should work because I
go through the elements of x in order, use that in the leftmost column,
and slap the permutation of the remaining elements to the right. What
strikes me as particularly odd is that there doesn't even seem to be a
systematic sequence of letters in any of the columns. OK, since I
really need that function I wrote this piece of crap:

perm.stupid <- function(x) {
    b <- as.matrix(expand.grid(rep(list(x), length(x))))
    b[!sapply(1:nrow(b), function(r) any(duplicated(b[r,]))),]
}

It works, but words cannot describe its ugliness. And it gets really
slow really fast with growing x.

So, anyway. My two questions are:
1. Does R really, really, seriously lack a permutation function?
2. OK, stop kidding me. So what's it called?
3. Why doesn't my recursive function work, and what would a
   working version look like?

Thanks,
robert



Re: [R] Adding lines to complex xyplot

2014-02-25 Thread Robert Latest
Hello Lib,

I think what you're trying to do is very easy using ggplot2 -- easy, that
is, once you got your head around ggplot2 in the first place. The layering
you mention is the core feature of ggplot2. Fortunately it is
well-documented including a thin, overpriced book from Springer (which I
have, and like, but am not sure I should recommend).

Good luck,
robert
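
For readers who would rather stay with lattice, the layering asked about in the quoted question below can be sketched with a custom panel function. This is only a sketch: the data are made up, and the column names Observation, IPrediction and TreatmentArm are taken from the question. Because groups is set, each panel.xyplot() call is applied per treatment arm.

```r
library(lattice)

# Made-up data standing in for the poster's columns.
set.seed(1)
d <- data.frame(
  IPrediction  = rep(seq(1, 10, length.out = 20), times = 3),
  TreatmentArm = factor(rep(c("A", "B", "C"), each = 20))
)
d$Observation <- d$IPrediction + rnorm(nrow(d), sd = 1)

p <- xyplot(Observation ~ IPrediction, data = d, groups = TreatmentArm,
            panel = function(x, y, ...) {
              panel.xyplot(x, y, type = "b", cex = 0.7, ...)     # points and lines per arm
              panel.xyplot(x, y, type = "smooth", lwd = 3, ...)  # thicker loess per arm
              panel.abline(a = 0, b = 1, lty = 2)                # line of identity
            })
print(p)
```

Each call inside the panel function adds one layer, so line widths and colours can be set per layer.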


On Tue, Feb 25, 2014 at 8:34 PM, Lib Gray libgray3...@gmail.com wrote:

 Hello,

 I am branching out to xyplot for the first time, and I want to layer
 several complex xyplots. I have tried using panel functions, but so far I
 lose all complexity from the scatterplot. I would like to have the
 following things in the plot:


 1) A plot of observation vs. modeled individual prediction, by treatment
 arm, with each subject's points connected by lines.


 xyplot(Observation,IPrediction,groups=TreatmentArm,type="b",col=c(1,2,3),cex=0.7)


 2) Over the former, I would like to add loess smoothers.

 I am able to do this in the former with type=c("b","smooth"), but I would
 like to differentiate the smoothers from the rest of the plot with thicker
 line widths, and possibly colors.


 3) Also over the former, I would like to add a simple abline(0,1).

 I can add this, but not together with the loess and treatment arm
 differences; with panel=function(x,y){} I cannot figure out how to keep
 all the former complexity.



 Basically, I am trying to recreate the four basic diagnostic plots from
 xpose4, but adding color for treatment differences.

 Any help would be appreciated!





[R] How to get function arguments as list?

2014-02-09 Thread Robert Latest
Hello all,

To set options in a package I'm putting together I'd like to write a
function like options, that is:

my.options <- function(...) {
# ...
}

Now I'd like to access the named arguments that were passed to my
function within that function. How does that work? formals() doesn't do
it, neither does args() or alist().

How is that done?

Thanks,
robert



Re: [R] How to get function arguments as list?

2014-02-09 Thread Robert Latest
On Sun, 09 Feb 2014 12:28:11 +
Rui Barradas ruipbarra...@sapo.pt wrote:

 Hello,
 
 Inside the function try
 
 dots <- list(...)

Hi guys,

thanks a lot. I knew it HAD to be something ultra-simple, like most
things in R.

Regards,
robert
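
A minimal sketch of how list(...) could be used for an options-style function. The defaults (digits, verbose) and the closure that stores them are assumptions made for illustration, not part of any package discussed here.

```r
my.options <- local({
  settings <- list(digits = 3, verbose = FALSE)   # hypothetical defaults
  function(...) {
    dots <- list(...)                             # named arguments as a list
    if (length(dots) == 0) return(settings)       # no arguments: report settings
    stopifnot(!is.null(names(dots)), all(names(dots) != ""))
    settings[names(dots)] <<- dots                # update or add named entries
    invisible(settings)
  }
})

my.options(digits = 5, colour = "red")
my.options()$digits
```

names(dots) gives the argument names; unnamed arguments are rejected here because they could not be matched to a setting.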



[R] Is there a neat R trick for this?

2013-02-12 Thread Robert Latest
Hello all,

given two vectors X and Y I'd like to receive a vector Z which
contains, for each element of X, the index of the corresponding
element in Y (or NA if that element isn't in Y).

Example:

x <- c(4,5,6)
y <- c(10,1,5,12,4,13,14)

z <- findIndexIn(x, y)
z
[1] 5 3 NA

1st element of z is 5, because the 1st element of x is at the 5th position in y
2nd element of z is 3, because the 2nd element of x is at the 3rd position in y
3rd element of z is NA, because the 3rd element of x is not in y

Of course I can write the function findIndexIn() using a for loop, but
in 80% of cases when I felt the urge to use for in R it turned out
that there was already some builtin operator or function that did the
trick.

Suggestions, anyone?
Thanks,

robert



Re: [R] Is there a neat R trick for this?

2013-02-12 Thread Robert Latest
Hi guys,

like so often, the answer came to me minutes after posting. pmatch()
does exactly what I need. match() gives the values of the elements,
but not their positions.

Thanks,
robert
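
A quick check on the example data from the question. Note that base R's match() also returns positions (not values), so it appears to be a direct fit here; pmatch() is intended for partial matching of character strings. It is %in% that reports membership only.

```r
x <- c(4, 5, 6)
y <- c(10, 1, 5, 12, 4, 13, 14)

z <- match(x, y)   # position in y of each element of x, NA if absent
z

x %in% y           # membership only, no positions
```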



[R] Finding the last value before a certain date

2012-07-19 Thread Robert Latest
Hello all,

I have a dataframe that looks like this:

> head(df)
        date    y
1 2010-09-27 1356
2 2010-10-04 1968
3 2010-10-11 2602
4 2010-10-17 3116
5 2010-10-24 3496
6 2010-10-31 3958

I need a function that, given any date, returns the y value
corresponding to the given date or the last day before the given date.

Example:

Input: as.Date("2010-10-06"). Output: 1968 (because the last value is
from 2010-10-04)

I've been tinkering with this for an hour now, without success. I
think the solution is either surprisingly complicated or surprisingly
simple.

Thanks,
robert
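
One possible approach, sketched with findInterval(). It assumes the date column is sorted in increasing order; the helper name last_value is made up for this example.

```r
df <- data.frame(
  date = as.Date(c("2010-09-27", "2010-10-04", "2010-10-11",
                   "2010-10-17", "2010-10-24", "2010-10-31")),
  y    = c(1356, 1968, 2602, 3116, 3496, 3958)
)

# y value on the given date, or on the last date before it (NA if none).
last_value <- function(df, when) {
  i <- findInterval(as.numeric(as.Date(when)), as.numeric(df$date))
  i[i == 0] <- NA            # date lies before the first observation
  df$y[i]
}

last_value(df, "2010-10-06")   # 1968
```

Since findInterval() is vectorised, a vector of dates can be passed in one call.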



Re: [R] Regression Analysis or Anova?

2012-05-16 Thread Robert Latest
Hello Andrea,

I don't know if I can help you (probably not, I'm a beginner myself),
but you would make it a lot easier for those who can if you
post a self-contained script in this forum that shows what you're
trying to do. Use dput() to dump your dataset in text form.

Good luck,
robert


On Tue, May 15, 2012 at 10:49 PM, Andrea Sica aerdna.s...@gmail.com wrote:
 Dear all,

 I hope to be the clearest I can.
 Let's say I have a dataset with 10 variables, where 4 of them represent for
 me a certain phenomenon that I call Y.
 The other 6 represent for me another phenomenon that I call X.

 Each one of those variables (10) contains 37 units. Those units are just
 the respondents of my analysis (a survey).
 Since all the questions are based on a Likert scale, they are qualitative
 variables. The scale is from 0 to 7 for all of
 them, but there are -1 and -2 values where the answer is missing. Hence
 the scale goes actually from -2 to 7.

 What I want to do is to calculate the regression between my Y (which
 contains 4 variables in this case and 37 answers
 for each variable) and my X (which contains 6 variables instead and the
 same number of respondents). I know that for
 qualitative analyses I should use Anova instead of the regression, although
 I have read somewhere that it is even possible
 to make the regression.

 Until now I have tried to act this way:
 __
 apply(Y, 1, function(Y) mean(Y[Y>0])) #calculate the average per rows
 #(respondents) without considering the negative values

 Y.reg <- c(apply(Y, 1, function(Y) mean(Y[Y>0]))) #create the vector Y,
 #thus it results like 1 variable with 37 numbers

 apply(X, 1, function(X) mean(X[X>0]))

 X.reg <- c(apply(X, 1, function(X) mean(X[X>0]))) #create the vector
 #X, thus it results like 1 variable with 37 numbers

 reg1 <- lm(Y.reg ~ X.reg) #make the first regression
 summary(reg1) #see the results

 Call:
 lm(formula = Y.reg ~ X.reg)

 Residuals:
     Min         1Q       Median      3Q       Max
 -2.26183 -0.49434 -0.02658  0.37260  2.08899

 Coefficients:
                 Estimate Std. Error   t value   Pr(>|t|)
 (Intercept)   4.2577     0.4986      8.539    4.46e-10 ***
 X.reg          0.1008     0.1282      0.786    0.437
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 Residual standard error: 0.7827 on 35 degrees of freedom
 Multiple R-squared: 0.01736,    Adjusted R-squared: -0.01072
 F-statistic: 0.6182 on 1 and 35 DF,  p-value: 0.437

 layout(matrix(1:4,2,2)) #graphical approach
 plot(reg1)

 please see the pfd() function attached.
 

 But as you can see, although I do not use Y as composed by 4 variables and
 X by 6, and I do not consider the negative values
 too, I get a very low score as my R^2.

 If I act with anova instead I have this problem:
 
 Ymatrix <- as.matrix(Y)
 Xmatrix <- as.matrix(X) #where both this Y and X are in their first form,
 #thus composed by more variables (4 and 6) and with
 #negative values as well.

 Error in UseMethod("anova") :
  no applicable method for 'anova' applied to an object of class
 "c('matrix', 'integer', 'numeric')"
 

 To be honest, a few days ago I succeeded in using anova, but unfortunately
 I do not remember how and I did not save the
 command anywhere.

 What I would like to know is:

 - First of all, am I wrong in how I approach to my problem?
 - What do you think about the regression output?
 - Finally, how can I do to make the anova? If I have to do it.

 I really hope I have been clear. Thank you all for any kind of help.

 Best,

 Andrea





Re: [R] How to interpret an ANOVA result?

2012-05-15 Thread Robert Latest
On Tue, May 15, 2012 at 1:59 PM, Bryan Hanson han...@depauw.edu wrote:
 I see that no one has replied on this, so I'll take a stab.

Hi, Bryan!

 This is probably a matter of personal taste, but I would suggest a somewhat 
 different and simpler approach.  What you have done is not strictly an ANOVA, 
 it's a linear model (they are related).  But the particular way you've asked 
 R to report gives you the answer in terms of the linear model.

I did that because it seemed to give me the estimates (means) and
standard errors for each factor level in a nice table.

 That means your significance stars refer to whether or not the slopes in the 
 model differ significantly from zero.  Perhaps you are aware of this.

I'm not. In a dataset with no continuous explanatory variables, where
do the slopes come from? I thought in this case R only outputs
intercepts.


 Anyway, I thought your data set was interesting, so I took the approach that 
 comes to my mind.  Here it is.  It might be pretty much self-explanatory, if 
 not, try ?aov and ?TukeyHSD for details.

TukeyHSD looks interesting. I'll look into it.

Thanks,
robert
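
On where the "slopes" come from when every explanatory variable is a factor: lm() codes a factor as 0/1 indicator columns (treatment contrasts by default), so each coefficient is the slope on an indicator, which is the difference in means from the reference level. A small sketch with made-up numbers:

```r
site <- factor(rep(1:3, each = 2))
d    <- c(10, 12, 20, 22, 15, 17)   # made-up values

model.matrix(~ site)    # columns: (Intercept), site2, site3 as 0/1 indicators

fit <- lm(d ~ site)
coef(fit)               # intercept = mean at site 1; others = difference from site 1
tapply(d, site, mean)   # group means for comparison
```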



[R] How to include known errors in a regression?

2012-05-15 Thread Robert Latest
Hello all,

I have a bunch of aggregated measurement data. The data describe two
different physical properties that correlate, and I want to estimate
the coefficients (slope and intercept) from the dataset.

This is of course easy, I've done it, and I got the expected result.

But here's the thing: Each data point in X and Y is actually a mean of
N individual (automated) measurements taken from the same object. I
have the mean, the standard deviation (SD) and N for each datapoint.
One datapoint corresponds to one of several (different) objects.

Is there any way I can enter this knowledge into the model? I need to
estimate the errors quite precisely, and I feel that I'm throwing away
valuable data by not using N and SD. I'm thinking about bloating my
datapoints into fake datasets by creating a rnorm sample with the
given mean, N, and SD, but that sounds silly. Maybe I'll do it as an
experiment to see if it has any significant impact.

To clarify: For each datapoint (X, Y) I additionally have (sdX, sdY)
and (nX, nY). So each (X, Y) would be turned into a nX*nY combination
of all values of rnorm(nX, X, sdX) and rnorm(nY, Y, sdY). Then I'd
pitch all of this together in a linear model. Makes sense?

My goal is to replace one (slow, expensive) measurement by another
(fast, cheap) one, and I need to establish the correlation (and
especially the expected error margin)  between the two to see if it is
feasible.

Thanks,
robert
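
One common alternative to simulating fake samples is a weighted fit. The sketch below uses made-up numbers and only accounts for the uncertainty in Y: the variance of a mean is sd^2 / n, so each point is weighted by the inverse of that. Uncertainty in X is ignored here and would need an errors-in-variables method such as Deming regression.

```r
# Made-up summary data: one row per object.
dat <- data.frame(
  X   = c(1.0, 2.1, 2.9, 4.2, 5.0, 6.1),
  Y   = c(2.1, 3.9, 6.2, 8.1, 9.8, 12.3),
  sdY = c(0.2, 0.3, 0.8, 0.3, 0.2, 0.9),
  nY  = c(10, 10, 5, 10, 12, 5)
)

# Inverse variance of each mean: n / sd^2.
dat$w <- dat$nY / dat$sdY^2

fit <- lm(Y ~ X, data = dat, weights = w)
summary(fit)$coefficients   # slope and intercept with standard errors
confint(fit)                # confidence intervals for both coefficients
```

Note that lm() treats the weights as relative and still estimates the residual scale from the data.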



[R] How to interpret an ANOVA result?

2012-05-14 Thread Robert Latest
Hello all,

here's a real-world example: I'm measuring a quantity (d) at five
sites (site1 thru site5) on a silicon wafer. There is a clear
site-dependence of the measured value. To find out if this is a
measurement artifact I measured the wafer four times: twice in the
normal position (posN), and twice rotated by 180 degrees (posR). My
data looks like this (full, self-contained code at bottom). Note that
sites with the same number correspond to the same physical location on
the wafer (the rotation has already been taken into account here).

> head(x)
     d site pos
1 1383    1   N
2 1377    1   R
3 1388    1   R
4 1373    1   N
5 1386    2   N
6 1394    2   R

 boxplot (d~pos+site)

This boxplot (see code) already hints at a true site-dependence of the
measured value (no artifact). OK, so let's do an ANOVA to make this
more quantitative:

> summary(lm(d ~ site*pos))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 1378.000      3.078 447.672   <2e-16 ***
site2         11.500      4.353   2.642  0.02466 *
site3         12.000      4.353   2.757  0.02025 *
site4         17.000      4.353   3.905  0.00294 **
site5          1.000      4.353   0.230  0.82294
posR           4.500      4.353   1.034  0.32561
site2:posR    -4.000      6.156  -0.650  0.53050
site3:posR   -10.500      6.156  -1.706  0.11890
site4:posR    -5.500      6.156  -0.893  0.39264
site5:posR    -3.000      6.156  -0.487  0.63655

Now I think that I see the following:
- The average of d at site1 in pos. N (first in alphabet) is 1378.
- Average values for site2, 3, 4 (especially 4) in pos. N deviate
significantly from site1. For instance, values at site4 are on
average 17 greater than at site1.
- The average value at site5 does not differ significantly from site1.
OK, that was the top part of the result table. Now the bottom part:
- In reverse position(posR) the average of d at site1 is 4.5 bigger,
but that's not significant.
- The average of d at site3:posR is 10.5 smaller than something, but
smaller than what? And why does this -10.5 deviation have a p-value of
.1 (not significant) vs the .02 (significant) deviation of 11.5
(site2, top part)?

Let's see if I can figure that out. Difference between posN and posR
at site3 is not so big:
> mean(d[site==3 & pos=="R"])-mean(d[site==3 & pos=="N"])
[1] -6
Is this what makes it insignificant?

Shuffling around the numbers until I get to -10.5:

> mean(d[site==3 & pos=="R"])-mean(d[site==3 & pos=="N"])-(mean(d[site==1 & pos=="R"])-mean(d[site==1 & pos=="N"]))
[1] -10.5

OK, one has to keep track of all the differences and stuff.
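
The bookkeeping above can be sketched with a table of cell means. The data are retyped from the dput() output at the end of this post; the variable name cell is made up. Each interaction coefficient is a difference of differences: the R-minus-N shift at that site minus the R-minus-N shift at site1.

```r
d    <- c(1383, 1377, 1388, 1373, 1386, 1394, 1386, 1393, 1390, 1382,
          1386, 1390, 1395, 1396, 1392, 1395, 1378, 1382, 1379, 1380)
site <- factor(rep(1:5, each = 4))
pos  <- factor(rep(c("N", "R", "R", "N"), times = 5))

cell <- tapply(d, list(site, pos), mean)   # 5 x 2 table of cell means
cell

# Interaction for site 3 = (R - N at site 3) - (R - N at site 1)
(cell["3", "R"] - cell["3", "N"]) - (cell["1", "R"] - cell["1", "N"])

fit <- lm(d ~ site * pos)
coef(fit)["site3:posR"]   # same number as the line above
anova(fit)                # one F-test per term instead of one t-test per coefficient
```

The interaction terms have a larger standard error than the site terms (6.156 vs 4.353 in the table above) because they combine four cell means rather than two, which is why a deviation of similar size gets a larger p-value.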

So I think I have understood about 80% of this simple example. The
reason I'm going after this so stubbornly is that I'm at the beginning
of a DOE which will take several weeks of measuring and will end up
being analyzed with a big ANOVA (two response and about six
explanatory variables, some continuous, some factorial). Already in
the DOE phase I want to understand what I will be doing with the data
later (this is for a Six Sigma project in an industrial production
environment, in case anybody wants to know).

Thanks,
robert

Here's the full dataset:

x <- structure(list(d = c(1383L, 1377L, 1388L, 1373L, 1386L, 1394L,
1386L, 1393L, 1390L, 1382L, 1386L, 1390L, 1395L, 1396L, 1392L,
1395L, 1378L, 1382L, 1379L, 1380L), site = structure(c(1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L,
5L, 5L), .Label = c("1", "2", "3", "4", "5"), class = "factor"),
pos = structure(c(1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L,
2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L), .Label = c("N",
"R"), class = "factor")), .Names = c("d", "site", "pos"), row.names = c(NA,
-20L), class = "data.frame")
attach(x)
head(x)
boxplot (d~pos+site)



[R] ANOVA question

2012-05-11 Thread Robert Latest
Hello all,

I'm very satisfied to say that my grip on both R and statistics is
showing the first hints of firmness, on a very greenhorn level.

I'm faced with a problem that I intend to analyze using ANOVA, and to
test my understanding of a primitive, one-way ANOVA I've written the
self-contained practice script below. It works as expected.

But here's my question: How can I not only get the values of the
coefficients for the different levels of the explanatory factor(s),
but also the corresponding standard errors and confidence levels?
Below I have started doing that on foot by looping over the levels
of my single factor, but I suppose this gets complicated and messy
with more complex models. Any ideas?

Thanks,
robert


set.seed(0)

N <- 100 # sample size

MEAN <- c(10, 20, 30, 40, 50)
VAR <- c(20,20,1, 20, 20)
LABELS <- c("A", "B", "C", "D", "E")

# create a data frame with labels (as a factor, so the box plot and levels work)
df <- data.frame(Label=factor(rep(LABELS, each=N)))
df$Value <- NA
# fill in random data for each factor level
for (i in 1:length(MEAN)) {
    df$Value[(1+N*(i-1)):(N*i)] <- rnorm(N, MEAN[i], sqrt(VAR[i]))
}



par(mfrow=c(2,2))
plot(df)  # Box plot of the data
plot(df$Value)# scatter plot

mod_aov <- aov(Value ~ Label, data=df)

print(summary(mod_aov))
print(mod_aov$coefficients)

rsd <- mod_aov$residuals

plot(rsd)

# find and print mean() and var() for each level
for (l in levels(factor(df$Label))) {
    index <- df$Label == l

    # Method 1: directly from data
    smp <- df$Value[index]               # extract sample for this label
    ssq_smp <- var(smp)*(length(smp)-1)  # sum of squares is variance
                                         # times d.f.
    # Method 2: from ANOVA residuals
    rsd_grp <- rsd[index]                # extract residuals
    ssq_rsd <- sum(rsd_grp^2)            # compute sum of squares

    # print mean, variance, and difference between SSQs from the two
    # methods.
    write(sprintf("%s: mean=%5.1f var=%5.1f (%.2g)", l,
                  mean(smp), var(smp),
                  ssq_smp-ssq_rsd), "")
    # ...and it works like expected! But is there a shortcut that would give me
    # the same result in a one-liner?
}
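
A possible shortcut for both questions, sketched on data generated the same way as in the script above (the object names dat and fit are made up). summary() on an lm fit gives estimates with standard errors, confint() gives confidence intervals, and aggregate() returns the per-level mean and variance in one call.

```r
set.seed(0)
N <- 100
dat <- data.frame(Label = factor(rep(c("A", "B", "C", "D", "E"), each = N)))
dat$Value <- rnorm(5 * N,
                   mean = rep(c(10, 20, 30, 40, 50), each = N),
                   sd   = sqrt(rep(c(20, 20, 1, 20, 20), each = N)))

fit <- lm(Value ~ Label, data = dat)
summary(fit)$coefficients    # estimates, standard errors, t and p values
confint(fit, level = 0.95)   # confidence intervals for the same coefficients

# Per-level mean and variance in one call
aggregate(Value ~ Label, data = dat,
          FUN = function(v) c(mean = mean(v), var = var(v)))
```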



Re: [R] ANOVA question

2012-05-11 Thread Robert Latest
Hello Thierry,

thanks for your answer! There is one thing, however, that I don't
understand. The values labeled B in my data are generated with
1/20th the variance of the others, yet the standard error and
confidence intervals are the same for all levels of the factor. How
come?


> summary(mod_lm0)$coef
       Estimate Std. Error   t value      Pr(>|t|)
LabelA 10.10138  0.3937038  25.65730  5.752714e-93
LabelB 19.79629  0.3937038  50.28218 1.226942e-196
LabelC 30.06722  0.3937038  76.37016 4.825571e-276
LabelD 40.01442  0.3937038 101.63586      0.00e+00
LabelE 49.78282  0.3937038 126.44738      0.00e+00


Thanks,
robert
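
A sketch of why the standard errors coincide, assuming mod_lm0 was fitted as lm(Value ~ Label - 1) on balanced data like the practice script's. A linear model assumes one common residual variance, so every level gets the pooled residual standard deviation divided by sqrt(group size), regardless of that group's own spread.

```r
set.seed(0)
N <- 100
dat <- data.frame(Label = factor(rep(c("A", "B", "C", "D", "E"), each = N)))
dat$Value <- rnorm(5 * N,
                   mean = rep(c(10, 20, 30, 40, 50), each = N),
                   sd   = sqrt(rep(c(20, 20, 1, 20, 20), each = N)))

fit0 <- lm(Value ~ Label - 1, data = dat)    # one coefficient per level

se_model  <- summary(fit0)$coefficients[, "Std. Error"]
se_pooled <- summary(fit0)$sigma / sqrt(N)   # pooled residual SD / sqrt(n)
se_group  <- tapply(dat$Value, dat$Label, function(v) sd(v) / sqrt(length(v)))

se_model    # identical for all levels
se_pooled   # the same number
se_group    # per-group standard errors differ; the low-variance group is smaller
```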



Re: [R] How to deal with a dataframe within a dataframe?

2012-05-09 Thread Robert Latest
On Tue, May 8, 2012 at 3:38 PM, R. Michael Weylandt
michael.weyla...@gmail.com wrote:
 So this actually looks like something of a tricky one: if you wouldn't
 mind sending the result of dput(head(agg)) I can confirm, but here's
 my hunch:

Hi Michael,

while I'm trying to get my head around the rest of your post, here's
the output of dput():

> dput(head(agg))
structure(list(`df$quarter` = c("09Q3", "10Q1", "10Q2", "10Q3",
"11Q1", "11Q2"), `df$tool` = structure(c(1L, 1L, 1L, 1L, 1L,
1L), .Label = c("VS1A", "VS1B", "VS2A", "VS2B", "VS3A", "VS3B",
"VS4A", "VS4B", "VS5B"), class = "factor"), `df$value` = structure(list(
`0` = c(1.80053430839867, 1.62848325226279), `1` = c(1.29965212329278,
1.26130173276939), `2` = c(1.69901753654472, 1.38156952313768
), `3` = c(1.31168126092175, 1.06723157138633), `4` = c(1.54165763354293,
1.21619657757276), `5` = c(1.29925171313276, 1.18276707678292
)), .Names = c("0", "1", "2", "3", "4", "5"))), .Names = c("df$quarter",
"df$tool", "df$value"), row.names = c(NA, 6L), class = "data.frame")


I would like this in either the form of a flat data frame (i.e., the
contents of df$value as two separate columns), or -- even preferable
-- learn a better way to retrieve multiple numeric results from a call
to aggregate().

Thanks,
robert
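
A sketch of one way to flatten such a result, using made-up data. When the aggregating function always returns a vector of the same length, aggregate() stores the results as a matrix column, and do.call(data.frame, ...) expands that matrix into ordinary columns. The list column in the dput() output above may come from the function sometimes returning a single NA; that is a guess, and returning c(NA, NA) instead would keep the shape consistent.

```r
d <- data.frame(g = rep(c("a", "b"), each = 5), v = c(1:5, 11:15))

agg <- aggregate(v ~ g, data = d,
                 FUN = function(z) c(mean = mean(z), sd = sd(z)))
is.matrix(agg$v)                  # the result column is a two-column matrix

flat <- do.call(data.frame, agg)  # one plain column per matrix column
names(flat)
flat$v.mean
```

The matrix columns can also be reached directly, for example agg$v[, "mean"].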



[R] How to apply functions across columns?

2012-05-09 Thread Robert Latest
Hello,

me again.

I have a data frame that looks like this (actual dput output at bottom):

> head(tencor)
        date    lot wf.id   s1   s2   s3   s4   s5
1 08.05.2012 W0X3H0     9 1238 1263 1244 1200 1183
2 08.05.2012 W0X3H0    10 1367 1396 1371 1325 1311
3 08.05.2012 W0X3H0    11 1383 1417 1393 1346 1328

I'd like to add a column to this that gives, for each row, the
averages of the values in the columns s1 to s5. Really primitive. But
I absolutely don't understand how to do this. I don't need any
intelligence, I know my values are always in columns 4:8.

Thanks,
robert

> dput(tencor)
structure(list(date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L), .Label = "08.05.2012", class = "factor"),
lot = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
.Label = "W0X3H0", class = "factor"),
wf.id = c(9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L,
4L), s1 = c(1238L, 1367L, 1383L, 1395L, 1479L, 1411L, 1404L,
1398L, 1402L, 1380L, 1376L), s2 = c(1263L, 1396L, 1417L,
1420L, 1527L, 1452L, 1438L, 1432L, 1432L, 1412L, 1403L),
s3 = c(1244L, 1371L, 1393L, 1395L, 1497L, 1424L, 1410L, 1404L,
1398L, 1382L, 1385L), s4 = c(1200L, 1325L, 1346L, 1346L,
1444L, 1372L, 1361L, 1362L, 1359L, 1338L, 1334L), s5 = c(1183L,
1311L, 1328L, 1336L, 1426L, 1357L, 1347L, 1344L, 1339L, 1325L,
1322L)), .Names = c("date", "lot", "wf.id", "s1", "s2", "s3",
"s4", "s5"), class = "data.frame", row.names = c(NA, -11L))




Re: [R] How to apply functions across columns?

2012-05-09 Thread Robert Latest
On Wed, May 9, 2012 at 4:19 PM, R. Michael Weylandt
michael.weyla...@gmail.com wrote:
 Good reproducible example ;-)

 Easiest is probably just:

 cbind(tencor, ThisRowMean =  rowMeans(tencor[, 4:8]))

Actually, after frying my brain on tapply() and sapply() I found that
just plain apply() does what I need:

tencor$mean <- apply(tencor[4:8], 1, FUN=mean)

This way I'm also not tied to just mean() as aggregator but can use
any homemade function (this would have been my followup question had I
followed your advice ;-)

Thanks!
robert
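
Both approaches side by side, sketched on the first three rows of the data from the original question. The range column is a made-up example of a "homemade" row function. Selecting the columns by name avoids depending on their positions.

```r
tencor <- data.frame(
  wf.id = c(9L, 10L, 11L),
  s1 = c(1238L, 1367L, 1383L), s2 = c(1263L, 1396L, 1417L),
  s3 = c(1244L, 1371L, 1393L), s4 = c(1200L, 1325L, 1346L),
  s5 = c(1183L, 1311L, 1328L)
)
sites <- c("s1", "s2", "s3", "s4", "s5")

tencor$mean  <- rowMeans(tencor[sites])                               # fast built-in
tencor$range <- apply(tencor[sites], 1, function(r) max(r) - min(r))  # any row function
tencor
```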



[R] How to deal with a dataframe within a dataframe?

2012-05-08 Thread Robert Latest
Hello all,

I am doing an aggregation where the aggregating function returns not a
single numeric value but a vector of two elements using return(c(val1,
val2)). I don't know how to access the individual columns of that
vector in the resulting dataframe though. How is this done correctly?
Thanks, robert


> agg <- aggregate(formula=df$value ~ df$quarter + df$tool,
+     FUN=cp.cpk, lsl=1300, usl=1500)
> head(agg)
  df$quarter df$tool           df$value
1       09Q3    VS1A 1.800534, 1.628483
2       10Q1    VS1A 1.299652, 1.261302
3       10Q2    VS1A 1.699018, 1.381570
4       10Q3    VS1A 1.311681, 1.067232
> head(agg["df$value"])
            df$value
1 1.800534, 1.628483
2 1.299652, 1.261302
3 1.699018, 1.381570
4 1.311681, 1.067232
> class(agg["df$value"])
[1] "data.frame"
> head(agg["df$value"][1]) # trying to select 1st column
            df$value
1 1.800534, 1.628483
2 1.299652, 1.261302
3 1.699018, 1.381570
4 1.311681, 1.067232
> head(agg["df$value"][2]) # trying to select 2nd column
Error in `[.data.frame`(agg["df$value"], 2) : undefined columns selected



# FWIW, here's the aggregating function
function(data, lsl, usl) {
    if (length(data) < 15) {
        return(NA)
    } else {
        return (c(
            (usl-lsl)/(6*sd(data)),
            min(mean(data)-lsl, usl-mean(data))/(3*sd(data)))
        )
    }
}



[R] Replacing tick labels in a plot

2012-05-04 Thread Robert Latest
Hello,

is it possible to replace the text of tick marks in a plot?
Specifically, I'd like to have a ppnorm plot in which the theoretical
quantiles are not expressed in terms of standard deviations, but in
actual percentages. Anybody who's seen a probability plot in MINITAB
knows what I'm talking about.

I have somewhat listlessly looked at mtext(), thinking that I could
maybe first create my plot, then draw a white rectangle over the
original numbers (how?), and then inserting my own numbers using
mtext(). To me this sounds so stupid that I haven't yet invested any
effort into actually giving it a shot.

Any ideas?

BTW, is it normal that each and every post to this list, even by list
members, has to go through moderator approval?

Thanks,
robert



Re: [R] Replacing tick labels in a plot

2012-05-04 Thread Robert Latest
Hello Sarah,

thanks for your quick answer. This is exactly what I was looking for,
embarrassingly simple if I might add. Sometimes R is like a huge
workshop with unlabeled tool magazines: You know the tool exists, but
not where it is. And if you find it, the instructions are often quite
terse.

The moderator approval thing has gone away, too.

Best regards,
bob
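
A minimal sketch of the axis() approach, assuming the plot in question is a normal Q-Q plot made with qqnorm(). The sample data and the chosen percentages are made up. The default x axis is suppressed and redrawn with ticks placed at the normal quantiles of the desired percentages.

```r
set.seed(1)
x <- rnorm(50)                                   # made-up sample
probs <- c(0.01, 0.05, 0.25, 0.5, 0.75, 0.95, 0.99)

qqnorm(x, xaxt = "n", xlab = "Theoretical percentile")  # no default x axis
qqline(x)
axis(1, at = qnorm(probs), labels = paste0(100 * probs, "%"))
```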

On Fri, May 4, 2012 at 7:13 PM, Sarah Goslee sarah.gos...@gmail.com wrote:
 What about ?axis as a place to start?

 Are you sure that the email address that your message appears to be
 coming from is identical to the one you used when you signed up?
 That's a frequent cause of moderation.

 Sarah

 On Fri, May 4, 2012 at 1:09 PM, Robert Latest boblat...@gmail.com wrote:
 Hello,

 is it possible to replace the text of tick marks in a plot?
 Specifically, I'd like to have a ppnorm plot in which the theoretical
 quantiles are not expressed in terms of standard deviations, but in
 actual percentages. Anybody who's seen a probability plot in MINITAB
 knows what I'm talking about.

 I have somewhat listlessly looked at mtext(), thinking that I could
 maybe first create my plot, then draw a white rectangle over the
 original numbers (how?), and then inserting my own numbers using
 mtext(). To me this sounds so stupid that I haven't yet invested any
 effort into actually giving it a shot.

 Any ideas?

 BTW, is it normal that each and every post to this list, even by list
 members, has to go through moderator approval?

 Thanks,
 robert


 --
 Sarah Goslee
 http://www.functionaldiversity.org
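
For readers of the archive, here is a minimal sketch of the ?axis route
suggested in this thread. It assumes the plot in question is a qqnorm()
plot; the sample data and the chosen percentages are made up for
illustration. The idea is to suppress the default x axis and then draw a
new one whose tick positions are normal quantiles and whose labels are the
matching percentages.

```r
# Hypothetical sample data standing in for real measurements
set.seed(42)
x <- rnorm(100, mean = 1400, sd = 25)

# Percentages to show on the axis (an arbitrary choice for this sketch)
probs  <- c(0.01, 0.05, 0.25, 0.50, 0.75, 0.95, 0.99)
at     <- qnorm(probs)               # tick positions in standard deviations
labels <- paste0(100 * probs, "%")   # tick labels in percent

# xaxt = "n" suppresses the default axis; axis() then draws the custom one
qqnorm(x, xaxt = "n", xlab = "Theoretical percentile")
axis(side = 1, at = at, labels = labels)
qqline(x)
```

No white rectangle or mtext() is needed, because the original tick labels
are never drawn in the first place.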



Re: [R] How to create a data.frame from several time series?

2012-04-17 Thread Robert Latest
Hello all,

followup to yesterday's question: Part of my confusion was caused by
my embarrassing mistake of overwriting the ppk function with another
object, which of course broke the next iteration of the loop.
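
A minimal sketch of how that mistake can be avoided, using made-up toy
data (the data frame below and the names agg and results are illustrative
assumptions, and the minimum-length check in ppk is left out to keep the
toy groups small): the result of aggregate() is stored under its own name,
so ppk is still a function on the next pass through the loop.

```r
# ppk as in the original post, without the length check
ppk <- function(data, lsl, usl) {
  min(mean(data) - lsl, usl - mean(data)) / (3 * sd(data))
}

# Hypothetical toy data: two tools, two months, two readings each
df <- data.frame(
  tool  = rep(c("VS1A", "VS1B"), each = 4),
  month = rep(c("2010-01", "2010-02"), times = 4),
  value = c(1400, 1390, 1420, 1370, 1360, 1370, 1380, 1410),
  stringsAsFactors = FALSE
)

results <- list()
for (ti in unique(df$tool)) {
  # Store the result as "agg", not "ppk", so the function is not overwritten
  agg <- aggregate(value ~ month, data = df[df$tool == ti, ],
                   FUN = ppk, lsl = 1300, usl = 1500)
  results[[ti]] <- agg
}
str(results)
```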

Secondly, I got exactly what I wanted like this:

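aggregate.zoo <- function(series) {
    # monthly ppk for one tool, returned as a zoo series indexed by month
    agg <- aggregate(data=series, value ~ month, ppk, lsl=1300, usl=1500)
    return (zoo(x=agg$value, order.by=agg$month))
}
l1 = split(df, df$tool)          # one data frame per tool
l2 = lapply(l1, aggregate.zoo)   # one zoo series per tool
l3 = do.call(merge, l2)          # merge all series, aligned by month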

I puzzled this together from various examples, with only about 80%
understanding of how it works and why.
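
For anyone else at 80%, here is a sketch of the same split / lapply /
combine pattern in base R only, so that each step can be inspected without
zoo. The toy data, the use of mean() in place of ppk, and the names pieces,
per.tool and wide are all assumptions made for illustration. Base merge()
joins only two data frames at a time, so Reduce() plays the role that
do.call(merge, ...) plays for zoo series.

```r
# Hypothetical toy data: two tools, two months
df <- data.frame(
  tool  = c("VS1A", "VS1A", "VS1B", "VS1B", "VS1A", "VS1B"),
  month = c("2010-01", "2010-01", "2010-01", "2010-01", "2010-02", "2010-02"),
  value = c(1400, 1420, 1360, 1380, 1370, 1370),
  stringsAsFactors = FALSE
)

# 1. split() cuts the data frame into a named list, one piece per tool
pieces <- split(df, df$tool)

# 2. lapply() reduces each piece to one row per month and names the
#    value column after the tool
per.tool <- lapply(names(pieces), function(tool) {
  agg <- aggregate(value ~ month, data = pieces[[tool]], FUN = mean)
  names(agg)[names(agg) == "value"] <- tool
  agg
})

# 3. Reduce() merges the per-tool tables pairwise on month; all = TRUE
#    keeps months that are missing for some tools (filled with NA)
wide <- Reduce(function(a, b) merge(a, b, by = "month", all = TRUE), per.tool)
print(wide)
```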

Regards,
robert



[R] How to create a data.frame from several time series?

2012-04-16 Thread Robert Latest
Hello all,

please look at my code below. The problems start where it says #
PROBLEMS START HERE. Some sample data is at the very bottom.

This is the diagnostic output from the script:

> source('load.R')
  ts.null
1  NA
2  NA
3  NA
4  NA
5  NA
6  NA
[1] "Adding data" "VS1A"
  ts.null VS1A.ts.null VS1A.tts
1  NA   NA   NA
2  NA   NA   NA
3  NA   NA 1.585324
4  NA   NA 1.326600
5  NA   NA 1.914382
6  NA   NA 1.333249
[1] "Adding data" "VS1B"
Error in get(as.character(FUN), mode = "function", envir = envir) :
  object 'FUN' of mode 'function' was not found


I have several issues with that.
1) Why doesn't the data frame df.all have timestamps in its first column?
2) Why aren't the additional columns named VS1A, VS1B, and so on, but
VS1A.ts.null, VS1A.tts?
3) What does the error message at the end mean, and why doesn't it
occur on the first loop iteration?

It seems like I could also first create all the time series and then
use ts.union to combine them into a data frame, but I don't know how
to do that because I don't know beforehand how many series I create in
the for() loop, how to distinguish them by (unknown beforehand) tool
names, and how to supply them to ts.union.
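
One possible way to do this, sketched with made-up series (the numbers and
the names series, u and df.all below are assumptions for illustration):
collect the series in a named list, for example with series[[ti]] <- tts
inside the loop, and hand the whole list to ts.union via do.call(). The
list names become the column names. A ts object keeps its time stamps as
an attribute rather than as a column, so the time column has to be added
explicitly with time().

```r
# Hypothetical monthly series for two tools, covering different months
series <- list(
  VS1A = ts(c(1.59, 1.33, 1.91), start = c(2010, 3), frequency = 12),
  VS1B = ts(c(1.10, 1.20),       start = c(2010, 1), frequency = 12)
)

# do.call() passes every list element as a named argument to ts.union,
# which pads each series with NA over the combined time range
u <- do.call(ts.union, series)

# Turn the result into a data frame with an explicit time column
df.all <- cbind(time = as.numeric(time(u)), as.data.frame(u))
print(df.all)
```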

Thanks,
robert


 CODE HERE

library(zoo)

ppk <- function(data, lsl, usl) {
    if (length(data) < 15) {
        return(NA)
    } else {
        return (min(mean(data)-lsl,
                    usl-mean(data))/(3*sd(data)))
    }
}

load <- function(filename) {
    d <- read.table(filename,
                    header=TRUE,
                    sep='\t')
    # filter data
    d <- d[d$value >= 1300 & d$value <= 1500,]
    # add column for later aggregation
    d$month = as.yearmon(d$timestamp)
    return(d)
}


df <- load('data.tsv')

# create an all-encompassing time series to unionize the actual data with
ts.null = ts(data=NA, start=min(df$month), end=max(df$month),
             frequency=12)
print(ts.null)

#
# PROBLEMS START HERE
#

df.all <- data.frame(ts.null)
# I was hoping to have a data frame with monthly time stamps in the first
# column. Not so.

for (ti in levels(df$tool)) {
    print(head(df.all))
    print(c("Adding data", ti))
    ppk <- aggregate(
        data=df[df$tool==ti,],
        value~month, ppk, lsl=1300, usl=1500)
    tts <- as.ts(zooreg(ppk$value, order.by=ppk$month, frequency=12))
    # I'm hoping that zooreg() fills in empty months with NAs, but I have no
    # idea how to deal with leading or trailing empty months

    df.all[ti] <- ts.union(ts.null, tts)
    # This totally doesn't work as expected, and it messes up something so bad
    # that the script crashes on the second iteration.

}



 some DF data here

             timestamp tool value    month
1  2010-01-26 08:41:04 VS1A  1400 Jan 2010
2  2010-01-26 08:44:04 VS4A  1420 Jan 2010
3  2010-01-26 10:15:45 VS4B  1400 Jan 2010
4  2010-01-26 11:37:53 VS1B  1360 Jan 2010
5  2010-01-26 12:53:53 VS1B  1380 Jan 2010
6  2010-01-26 14:48:06 VS2B  1410 Jan 2010
7  2010-01-26 14:48:29 VS2A  1410 Jan 2010
8  2010-01-26 23:21:48 VS3A  1400 Jan 2010
9  2010-01-27 07:48:15 VS1A  1420 Jan 2010
10 2010-01-27 07:48:26 VS1B  1400 Jan 2010
11 2010-01-27 07:49:51 VS2A  1410 Jan 2010
12 2010-01-27 07:50:08 VS2B  1390 Jan 2010
13 2010-01-27 12:30:02 VS3A  1400 Jan 2010
14 2010-01-27 12:30:19 VS3B  1420 Jan 2010
15 2010-01-27 12:30:36 VS4B  1420 Jan 2010
16 2010-02-08 11:47:54 VS1A  1370 Feb 2010
17 2010-02-08 11:48:06 VS1B  1370 Feb 2010
18 2010-02-08 11:49:42 VS3A  1430 Feb 2010
19 2010-02-08 11:50:09 VS3B  1350 Feb 2010
20 2010-02-08 11:51:06 VS2A  1400 Feb 2010




Re: [R] Applying a function to categorized data?

2012-04-13 Thread Robert Latest
Hello Steve,

thank you for your reply. You're right, just before I read your post
I'd found aggregate() and indeed it brought me a long way towards my
goal.
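
For the archive, a minimal sketch of the aggregate() route, assuming a data
frame shaped like the v1 described in the original question (the toy values
and the names means, sds and stats are made up for illustration):

```r
# Hypothetical stand-in for v1: one measurement per row with a timestamp
v1 <- data.frame(
  TIMESTAMP = as.POSIXct(c("2012-01-05 08:00:00", "2012-01-20 09:30:00",
                           "2012-02-03 10:15:00", "2012-02-17 11:45:00"),
                         tz = "UTC"),
  ARI_MIT = c(10, 14, 20, 26)
)
v1$MONTH <- strftime(v1$TIMESTAMP, "%y%m", tz = "UTC")

# One aggregate() call per statistic; both results are sorted by MONTH
means <- aggregate(ARI_MIT ~ MONTH, data = v1, FUN = mean)
sds   <- aggregate(ARI_MIT ~ MONTH, data = v1, FUN = sd)

# Combine into the three-column data frame: month, mean, standard deviation
stats <- data.frame(MONTH = means$MONTH,
                    MEAN  = means$ARI_MIT,
                    SD    = sds$ARI_MIT)
print(stats)
```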

I've been a C programmer for 20+ years, and I'm fairly firm in SQL, so
to understand R I need to lose my scalar and row (record) oriented
thinking and get my head into vectors and columns.

I'm still nowhere near where I think I need to be in order to work with
my data. I'll get back to the list when I have pinpointed my problem a
bit better, and I'll also supply some sample data.

Have a nice weekend,
robert

On Thu, Apr 12, 2012 at 8:52 PM, steven mosher mosherste...@gmail.com wrote:
  Welcome to R and the list.

 Others may suggest books (Nutshell was my first), but first there are some
 things that will help you both in programming and in getting help on the
 list.

 You should post executable code in your question. So, build a toy example
 of the data.frame you have and show what you tried. Folks here should be
 able to run your toy example and show you how to get the answer you want.

 For your problem I'm guessing that aggregate() would be one path

 ?aggregate

 You will need to specify 'by' to aggregate by month.

 Steve

 On Thu, Apr 12, 2012 at 7:10 AM, Robert Latest boblat...@gmail.com wrote:

 Hi all,

 I'm just getting started in R. My problem is the following:

 I have a data frame (v1) with lots of production data measurements.
 Each row contains a single measurement ('ARI_MIT') with a timestamp. I
 want to lump the data by months with their mean and standard
 deviation.

 I have already successfully managed to do the lumping by adding
 another column to my data frame:

 v1$MONTH = strftime(v1$TIMESTAMP, "%y%m")

 This makes a nice month-wise boxplot of my data, although I don't have
 an idea why:
 boxplot(v1$ARI_MIT ~ v1$MONTH)

 I don't need this plotted, though, but in the form of a new data frame
 with three columns: the month, the mean, and the standard deviation of
 all values from that month.

 I tried un-stacking v1 into a list of vectors and then looping over
 its elements, calculating the mean of each group:

 for (i in unstack(v1, v1$ARI_MIT ~ v1$MONTH)) { write(mean(i), "") }

 This works, but how do I get the data into a data frame? With the
 month labels in a column? They are not available inside the loop body.

 I know I need to get a book on R.

 Thanks,
 robert






[R] Applying a function to categorized data?

2012-04-12 Thread Robert Latest
Hi all,

I'm just getting started in R. My problem is the following:

I have a data frame (v1) with lots of production data measurements.
Each row contains a single measurement ('ARI_MIT') with a timestamp. I
want to lump the data by months with their mean and standard
deviation.

I have already successfully managed to do the lumping by adding
another column to my data frame:

v1$MONTH = strftime(v1$TIMESTAMP, "%y%m")

This makes a nice month-wise boxplot of my data, although I don't have
an idea why:
boxplot(v1$ARI_MIT ~ v1$MONTH)

I don't need this plotted, though, but in the form of a new data frame
with three columns: the month, the mean, and the standard deviation of
all values from that month.

I tried un-stacking v1 into a list of vectors and then looping over
its elements, calculating the mean of each group:

for (i in unstack(v1, v1$ARI_MIT ~ v1$MONTH)) { write(mean(i), "") }

This works, but how do I get the data into a data frame? With the
month labels in a column? They are not available inside the loop body.
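
A sketch of one way to keep the month labels, using made-up values for v1
(the toy data and the names groups and res are assumptions for
illustration): split() returns a list whose names are the month labels, so
they can be read back with names() instead of from inside a loop. The
boxplot call works for a related reason: the formula ARI_MIT ~ MONTH means
"ARI_MIT grouped by MONTH", so boxplot() draws one box per month.

```r
# Hypothetical stand-in for v1, with the MONTH column already added
v1 <- data.frame(
  ARI_MIT = c(10, 14, 20, 26),
  MONTH   = c("1201", "1201", "1202", "1202"),
  stringsAsFactors = FALSE
)

# One numeric vector per month; the list names carry the month labels
groups <- split(v1$ARI_MIT, v1$MONTH)

# sapply() applies the function to every group; names(groups) supplies
# the month column
res <- data.frame(
  MONTH = names(groups),
  MEAN  = unname(sapply(groups, mean)),
  SD    = unname(sapply(groups, sd)),
  stringsAsFactors = FALSE
)
print(res)
```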

I know I need to get a book on R.

Thanks,
robert
