Re: [R] Simple permutation question
On Wed, 25 Jun 2014 14:16:08 -0700 (PDT) Jeff Newmiller jdnew...@dcn.davis.ca.us wrote:
The brokenness of your perm.broken function arises from the attempted use of sapply to bind matrices together, which is not something sapply does.

    perm.fixed <- function( x ) {
      if ( length( x ) == 1 ) return( matrix( x, nrow=1 ) )
      lst <- lapply( seq_along( x ), function( i ) {
        cbind( x[ i ], perm.jdn( x[ -i ] ) )
      } )
      do.call(rbind, lst)
    }

Nice, exactly what I was looking for (including typo). Thanks!

robert

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
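For completeness, here is the same function with the recursive call pointing at itself (perm.fixed rather than perm.jdn). This is a sketch for character or numeric vectors; it builds all n! rows in memory, so it is only practical for small n such as the five letters A to E.

```r
perm.fixed <- function(x) {
  # Base case: a single element has exactly one permutation (a 1 x 1 matrix)
  if (length(x) == 1) return(matrix(x, nrow = 1))
  # For each element: put it in the first column, permute the rest to its right
  lst <- lapply(seq_along(x), function(i) {
    cbind(x[i], perm.fixed(x[-i]))
  })
  # Stack the per-element blocks row-wise into one n! x n matrix
  do.call(rbind, lst)
}

perm.fixed(c("A", "B", "C"))
```

Each row is one permutation; perm.fixed(LETTERS[1:5]) gives the 120 orderings of A to E.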
[R] Simple permutation question
So my company has hired a few young McKinsey guys from overseas for a couple of weeks to help us with a production line optimization. They probably charge what I make in a year, but that's OK because I just never have the time to really dive into one particular thing, and I have to hand it to the consultants that they came up with one or two really clever ideas to model the production line. Of course it's up to me to feed them the real data, which they then churn through the Excel models they cook up during the nights in their hotel rooms, and which I then implement back into my experimental system using live data. Anyway, whenever they need something or come up with something, I skip out of the room, hack it into R, export the CSV and come back in about half the time it takes Excel to even read in the data, let alone process it. Of course that got them curious, and I showed off a couple of scripts that condense their abysmal Excel convolutions into a few lean and mean lines of R code.

Anyway, I'm in my office with this really attractive, clever young McKinsey girl (I'm in my mid-forties, married with kids and all, but I still enjoyed impressing a woman with computer stuff, of all things!), and one of her models involves a simple permutation of five letters -- A through E. And that's when I find out that R doesn't have a permutation function. How is that possible? R has EVERYTHING, but not that? I'm flabbergasted. Stumped. And now it's up to me to spend the evening at home coding that model, and the only thing I really need is that permutation. So this is my first attempt:

    perm.broken <- function(x) {
      if (length(x) == 1) return(x)
      sapply(1:length(x), function(i) {
        cbind(x[i], perm.broken(x[-i]))
      })
    }

But it doesn't work:

    > perm.broken(c("A", "B", "C"))
         [,1] [,2] [,3]
    [1,] "A"  "B"  "C" 
    [2,] "A"  "B"  "C" 
    [3,] "B"  "A"  "A" 
    [4,] "C"  "C"  "B" 
    [5,] "C"  "C"  "B" 
    [6,] "B"  "A"  "A" 

And I can't figure out for the life of me why.
It should work because I go through the elements of x in order, use each one in the leftmost column, and slap the permutation of the remaining elements to the right. What strikes me as particularly odd is that there doesn't even seem to be a systematic sequence of letters in any of the columns.

OK, since I really need that function I wrote this piece of crap:

    perm.stupid <- function(x) {
      b <- as.matrix(expand.grid(rep(list(x), length(x))))
      b[!sapply(1:nrow(b), function(r) any(duplicated(b[r,]))),]
    }

It works, but words cannot describe its ugliness. And it gets really slow really fast with growing x.

So, anyway. My two questions are:

1. Does R really, really, seriously lack a permutation function?
2. OK, stop kidding me. So what's it called?
3. Why doesn't my recursive function work, and what would a working version look like?

Thanks,
robert
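Why perm.broken scrambles its columns can be seen in a few lines: when every call returns a matrix of the same size, sapply() flattens each result into one column of its output, whereas do.call(rbind, ...) over an lapply() result keeps the blocks intact. The 2 x 3 matrices below are arbitrary stand-ins.

```r
# Each call returns a 2 x 3 matrix
blocks <- lapply(1:2, function(i) matrix(i, nrow = 2, ncol = 3))

# sapply() simplifies: each 6-element result becomes one column
s <- sapply(1:2, function(i) matrix(i, nrow = 2, ncol = 3))
dim(s)   # 6 2

# rbind() stacks the matrices row-wise and keeps their shape
r <- do.call(rbind, blocks)
dim(r)   # 4 3
```

That is exactly what happens in perm.broken: each 2 x 3 block is read out column by column into a single column of length 6.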
Re: [R] Adding lines to complex xyplot
Hello Lib,

I think what you're trying to do is very easy using ggplot2 -- easy, that is, once you've got your head around ggplot2 in the first place. The layering you mention is the core feature of ggplot2. Fortunately it is well documented, including a thin, overpriced book from Springer (which I have, and like, but am not sure I should recommend).

Good luck,
robert

On Tue, Feb 25, 2014 at 8:34 PM, Lib Gray libgray3...@gmail.com wrote:
Hello,

I am branching out to xyplot for the first time, and I want to layer several complex xyplots. I have tried using panel functions, but so far I lose all the complexity of the scatterplot. I would like to have the following things in the plot:

1) A plot of observation vs. modeled individual prediction, by treatment arm, with each subject's points connected by lines.
   xyplot(Observation ~ IPrediction, groups=TreatmentArm, type="b", col=c(1,2,3), cex=0.7)
2) Over the former, I would like to add loess smoothers. I am able to do this in the former with type=c("b","smooth"), but I would like to differentiate the smoothers from the rest of the plot with thicker line widths, and possibly colors.
3) Also over the former, I would like to add a simple abline(0,1). I can add this with panel=function(x,y){}, but then I lose the loess and treatment arm differences, and I cannot figure out how to keep all the former complexity.

Basically, I am trying to recreate the four basic diagnostic plots from xpose4, but adding color for treatment differences. Any help would be appreciated!
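A lattice-only alternative is to draw all layers inside one panel function: call panel.xyplot() twice (once for points with connecting lines, once for thicker loess curves) and then panel.abline(). This is a sketch: the data frame dat and its values are hypothetical stand-ins for Lib's data, and passing ... through is what carries the groups, subscripts and colours into each layer.

```r
library(lattice)

# Hypothetical data with the column names from the question
set.seed(1)
dat <- data.frame(
  IPrediction  = runif(90, 0, 10),
  TreatmentArm = factor(rep(c("A", "B", "C"), each = 30))
)
dat$Observation <- dat$IPrediction + rnorm(90)

p <- xyplot(Observation ~ IPrediction, data = dat, groups = TreatmentArm,
            col = 1:3,
            panel = function(x, y, ...) {
              # Layer 1: points and lines, one colour per treatment arm
              panel.xyplot(x, y, type = "b", cex = 0.7, ...)
              # Layer 2: one thick loess curve per arm
              panel.xyplot(x, y, type = "smooth", lwd = 3, ...)
              # Layer 3: identity line across the whole panel
              panel.abline(0, 1, lty = 2)
            })
print(p)
```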
[R] How to get function arguments as list?
Hello all,

To set options in a package I'm putting together, I'd like to write a function like options(), that is:

    my.options <- function(...) {
      # ...
    }

Now I'd like to access the named arguments that were passed to my function within that function. How does that work? formals() doesn't do it, neither does args() or alist(). How is that done?

Thanks,
robert
Re: [R] How to get function arguments as list?
On Sun, 09 Feb 2014 12:28:11 + Rui Barradas ruipbarra...@sapo.pt wrote:
Hello,
Inside the function try

    dots <- list(...)

Hi guys,

thanks a lot. I knew it HAD to be something ultra-simple, like most things in R.

Regards,
robert
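Rui's suggestion spelled out: list(...) collects the arguments, and names() of that list gives the argument names. The check for unnamed arguments is a hypothetical addition for an options-style function, not how options() itself is implemented.

```r
my.options <- function(...) {
  dots <- list(...)   # named arguments become a named list
  # Hypothetical rule: every option must be given as name = value
  if (length(dots) > 0 && (is.null(names(dots)) || any(names(dots) == "")))
    stop("all arguments must be named")
  dots
}

opts <- my.options(digits = 3, verbose = TRUE)
names(opts)   # "digits" "verbose"
```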
[R] Is there a neat R trick for this?
Hello all,

given two vectors X and Y I'd like to receive a vector Z which contains, for each element of X, the index of the corresponding element in Y (or NA if that element isn't in Y). Example:

    > x <- c(4,5,6)
    > y <- c(10,1,5,12,4,13,14)
    > z <- findIndexIn(x, y)
    > z
    [1]  5  3 NA

The 1st element of z is 5, because the 1st element of x is at the 5th position in y.
The 2nd element of z is 3, because the 2nd element of x is at the 3rd position in y.
The 3rd element of z is NA, because the 3rd element of x is not in y.

Of course I can write the function findIndexIn() using a for loop, but in 80% of cases when I felt the urge to use for in R it turned out that there was already some builtin operator or function that did the trick. Suggestions, anyone?

Thanks,
robert
Re: [R] Is there a neat R trick for this?
Hi guys,

like so often, the answer came to me minutes after posting. pmatch() does exactly what I need. match() gives the values of the elements, but not their positions.

Thanks,
robert
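Note that match() itself already returns positions in the table, which is exactly what findIndexIn() was supposed to do. pmatch() happens to give the same answer on this example, but it converts its arguments to character, allows partial matches, and by default uses each table entry only once, so repeated values in x behave differently.

```r
x <- c(4, 5, 6)
y <- c(10, 1, 5, 12, 4, 13, 14)

match(x, y)          # 5 3 NA : position of each element of x in y

# Repeated values: match() finds the same position twice,
# pmatch() refuses to reuse a table entry (duplicates.ok = FALSE)
match(c(4, 4), y)    # 5 5
pmatch(c(4, 4), y)   # 5 NA
```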
[R] Finding the last value before a certain date
Hello all,

I have a dataframe that looks like this:

    > head(df)
            date    y
    1 2010-09-27 1356
    2 2010-10-04 1968
    3 2010-10-11 2602
    4 2010-10-17 3116
    5 2010-10-24 3496
    6 2010-10-31 3958

I need a function that, given any date, returns the y value corresponding to the given date or the last date before the given date. Example:

Input: as.Date("2010-10-06"). Output: 1968 (because the last value is from 2010-10-04)

I've been tinkering with this for an hour now, without success. I think the solution is either surprisingly complicated or surprisingly simple.

Thanks,
robert
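One base-R way to do this is findInterval(), which returns, for each query value, the index of the last element of a sorted vector that is less than or equal to it (0 if there is none). The helper name last.value.before is hypothetical, and it assumes df is sorted by date.

```r
# Data copied from head(df) in the post
df <- data.frame(
  date = as.Date(c("2010-09-27", "2010-10-04", "2010-10-11",
                   "2010-10-17", "2010-10-24", "2010-10-31")),
  y = c(1356, 1968, 2602, 3116, 3496, 3958)
)

# Hypothetical helper: y on the given date, or on the latest earlier date.
# Assumes df$date is sorted ascending (findInterval() requires that).
last.value.before <- function(when, df) {
  i <- findInterval(as.numeric(as.Date(when)), as.numeric(df$date))
  # i == 0 means the query date lies before the first observation
  ifelse(i == 0, NA, df$y[pmax(i, 1)])
}

last.value.before("2010-10-06", df)   # 1968
```

The function is vectorised, so a whole vector of query dates can be passed at once.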
Re: [R] Regression Analysis or Anova?
Hello Andrea,

I don't know if I can help you (probably not, I'm a beginner myself), but you would make it a lot easier for those who can if you posted a self-contained script in this forum that shows what you're trying to do. Use dput() to dump your dataset in text form.

Good luck,
robert

On Tue, May 15, 2012 at 10:49 PM, Andrea Sica aerdna.s...@gmail.com wrote:
Dear all,

I hope to be as clear as I can. Let's say I have a dataset with 10 variables, where 4 of them represent for me a certain phenomenon that I call Y. The other 6 represent for me another phenomenon that I call X. Each one of those variables (10) contains 37 units. Those units are just the respondents of my analysis (a survey). Since all the questions are based on a Likert scale, they are qualitative variables. The scale is from 0 to 7 for all of them, but there are -1 and -2 values where the answer is missing. Hence the scale actually goes from -2 to 7.

What I want to do is to calculate the regression between my Y (which contains 4 variables in this case and 37 answers for each variable) and my X (which contains 6 variables instead and the same number of respondents). I know that for qualitative analyses I should use ANOVA instead of regression, although I have read somewhere that it is even possible to do the regression.

Until now I have tried to act this way:

    # average per row (respondent), ignoring the negative values
    apply(Y, 1, function(Y) mean(Y[Y > 0]))
    # create the vector Y, so it results in 1 variable with 37 numbers
    Y.reg <- c(apply(Y, 1, function(Y) mean(Y[Y > 0])))
    apply(X, 1, function(X) mean(X[X > 0]))
    # create the vector X, so it results in 1 variable with 37 numbers
    X.reg <- c(apply(X, 1, function(X) mean(X[X > 0])))
    reg1 <- lm(Y.reg ~ X.reg)   # make the first regression
    summary(reg1)               # see the results

    Call:
    lm(formula = Y.reg ~ X.reg)

    Residuals:
         Min       1Q   Median       3Q      Max
    -2.26183 -0.49434 -0.02658  0.37260  2.08899

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   4.2577     0.4986   8.539 4.46e-10 ***
    X.reg         0.1008     0.1282   0.786    0.437
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.7827 on 35 degrees of freedom
    Multiple R-squared: 0.01736,  Adjusted R-squared: -0.01072
    F-statistic: 0.6182 on 1 and 35 DF,  p-value: 0.437

    layout(matrix(1:4,2,2))   # graphical approach
    plot(reg1)

Please see the attached pdf() output.

But as you can see, although I do not use Y as composed of 4 variables and X of 6, and I do not consider the negative values either, I get a very low R^2.

If I use ANOVA instead I have this problem:

    Ymatrix <- as.matrix(Y)
    Xmatrix <- as.matrix(X)
    # where both Y and X are in their original form, i.e. composed of several
    # variables (4 and 6) and with negative values as well.
    Error in UseMethod("anova") :
      no applicable method for 'anova' applied to an object of class c('matrix', 'integer', 'numeric')

To be honest, a few days ago I succeeded in using anova, but unfortunately I do not remember how and I did not save the command anywhere.

What I would like to know is:
- First of all, am I wrong in how I approach my problem?
- What do you think about the regression output?
- Finally, how can I do the ANOVA? If I have to do it.

I really hope I have been clear. Thank you all for any kind of help.

Best,
Andrea
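The per-respondent averaging can also be written without apply(): recode the missing-answer codes as NA and let rowMeans() skip them. The matrix Y below is a random stand-in for Andrea's data. Note that it treats only negative values as missing, whereas the Y > 0 filter in the post also discards answers of 0.

```r
set.seed(42)
# Hypothetical stand-in: 37 respondents x 4 items, coded -2..7
Y <- matrix(sample(-2:7, 37 * 4, replace = TRUE), nrow = 37)

Y.clean <- Y
Y.clean[Y.clean < 0] <- NA                   # -1 and -2 mean "no answer"
Y.reg <- rowMeans(Y.clean, na.rm = TRUE)     # mean of the valid answers per row
```

A respondent with no valid answers at all gets NaN, which lm() drops like any other missing value.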
Re: [R] How to interpret an ANOVA result?
On Tue, May 15, 2012 at 1:59 PM, Bryan Hanson han...@depauw.edu wrote:
I see that no one has replied on this, so I'll take a stab.

Hi, Bryan!

This is probably a matter of personal taste, but I would suggest a somewhat different and simpler approach. What you have done is not strictly an ANOVA, it's a linear model (they are related). But the particular way you've asked R to report gives you the answer in terms of the linear model.

I did that because it seemed to give me the estimates (means) and standard errors for each factor level in a nice table.

That means your significance stars refer to whether or not the slopes in the model differ significantly from zero. Perhaps you are aware of this.

I'm not. In a dataset with no continuous explanatory variables, where do the slopes come from? I thought in this case R only outputs intercepts.

Anyway, I thought your data set was interesting, so I took the approach that comes to my mind. Here it is. It might be pretty much self-explanatory; if not, try ?aov and ?TukeyHSD for details.

TukeyHSD looks interesting. I'll look into it.

Thanks,
robert
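Bryan's code is not preserved in this archive. A minimal sketch of the aov() plus TukeyHSD() approach on the wafer data from the original post might look like this; which = "site" restricts the pairwise comparisons to the site factor.

```r
# Wafer data from the original post, rebuilt without structure()
x <- data.frame(
  d = c(1383, 1377, 1388, 1373, 1386, 1394, 1386, 1393, 1390, 1382,
        1386, 1390, 1395, 1396, 1392, 1395, 1378, 1382, 1379, 1380),
  site = factor(rep(1:5, each = 4)),
  pos  = factor(rep(c("N", "R", "R", "N"), times = 5))
)

fit <- aov(d ~ site * pos, data = x)
summary(fit)                         # one F test per term, not per coefficient

tk <- TukeyHSD(fit, which = "site")  # all pairwise site differences,
tk                                   # with family-wise confidence intervals
```

Each row of tk$site compares two sites directly (e.g. "2-1"), which answers "does site k differ from site j" without going through the reference-level coefficients.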
[R] How to include known errors in a regression?
Hello all,

I have a bunch of aggregated measurement data. The data describe two different physical properties that correlate, and I want to estimate the coefficients (slope and intercept) from the dataset. This is of course easy, I've done it, and I got the expected result.

But here's the thing: Each data point in X and Y is actually a mean of N individual (automated) measurements taken from the same object. I have the mean, the standard deviation (SD) and N for each datapoint. One datapoint corresponds to one of several (different) objects.

Is there any way I can enter this knowledge into the model? I need to estimate the errors quite precisely, and I feel that I'm throwing away valuable data by not using N and SD. I'm thinking about bloating my datapoints into fake datasets by creating an rnorm sample with the given mean, N, and SD, but that sounds silly. Maybe I'll do it as an experiment to see if it has any significant impact.

To clarify: For each datapoint (X, Y) I additionally have (sdX, sdY) and (nX, nY). So each (X, Y) would be turned into an nX*nY combination of all values of rnorm(nX, X, sdX) and rnorm(nY, Y, sdY). Then I'd pitch all of this together into a linear model. Makes sense?

My goal is to replace one (slow, expensive) measurement by another (fast, cheap) one, and I need to establish the correlation (and especially the expected error margin) between the two to see if it is feasible.

Thanks,
robert
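One standard way to use the known SD and N of each mean, instead of simulating fake raw data, is inverse-variance weighted least squares: the variance of a mean of nY measurements is sdY^2 / nY, so each point gets weight nY / sdY^2. The data below are simulated and hypothetical. This sketch only accounts for the uncertainty in Y; when X is noisy as well, an errors-in-variables method such as Deming regression would be needed, which lm() does not provide.

```r
set.seed(1)
# Hypothetical aggregated data: one row per object
agg <- data.frame(X = 1:10)
agg$nY  <- sample(5:30, 10, replace = TRUE)   # measurements per mean
agg$sdY <- runif(10, 0.5, 2)                  # SD of the raw measurements
agg$Y   <- 2 + 0.5 * agg$X + rnorm(10, sd = agg$sdY / sqrt(agg$nY))

# Weight each mean by the inverse of its variance, sdY^2 / nY
fit.w <- lm(Y ~ X, data = agg, weights = nY / sdY^2)
summary(fit.w)$coefficients
confint(fit.w)
```

lm() treats the weights as relative and still estimates an overall residual scale from the data, so points with larger N or smaller SD simply pull harder on the fit.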
[R] How to interpret an ANOVA result?
Hello all,

here's a real-world example: I'm measuring a quantity (d) at five sites (site1 thru site5) on a silicon wafer. There is a clear site-dependence of the measured value. To find out if this is a measurement artifact I measured the wafer four times: twice in the normal position (posN), and twice rotated by 180 degrees (posR). My data looks like this (full, self-contained code at bottom). Note that sites with the same number correspond to the same physical location on the wafer (the rotation has already been taken into account here).

    > head(x)
         d site pos
    1 1383    1   N
    2 1377    1   R
    3 1388    1   R
    4 1373    1   N
    5 1386    2   N
    6 1394    2   R
    > boxplot(d ~ pos + site)

This boxplot (see code) already hints at a true site-dependence of the measured value (no artifact). OK, so let's do an ANOVA to make this more quantitative:

    > summary(lm(d ~ site*pos))
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) 1378.000      3.078 447.672  < 2e-16 ***
    site2         11.500      4.353   2.642  0.02466 *
    site3         12.000      4.353   2.757  0.02025 *
    site4         17.000      4.353   3.905  0.00294 **
    site5          1.000      4.353   0.230  0.82294
    posR           4.500      4.353   1.034  0.32561
    site2:posR    -4.000      6.156  -0.650  0.53050
    site3:posR   -10.500      6.156  -1.706  0.11890
    site4:posR    -5.500      6.156  -0.893  0.39264
    site5:posR    -3.000      6.156  -0.487  0.63655

Now I think that I see the following:
- The average of d at site1 in pos. N (first in alphabet) is 1378.
- Average values for site2, 3, 4 (especially 4) in pos. N deviate significantly from site1. For instance, values at site4 are on average 17 greater than at site1.
- The average value at site5 does not differ significantly from site1.

OK, that was the top part of the result table. Now the bottom part:
- In reverse position (posR) the average of d at site1 is 4.5 bigger, but that's not significant.
- The average of d at site3:posR is 10.5 smaller than something, but smaller than what? And why does this -10.5 deviation have a p-value of .1 (not significant) vs the .02 (significant) deviation of 11.5 (site2, top part)?
Let's see if I can figure that out. The difference between posN and posR at site3 is not so big:

    > mean(d[site == 3 & pos == "R"]) - mean(d[site == 3 & pos == "N"])
    [1] -6

Is this what makes it insignificant? Shuffling around the numbers until I get to -10.5:

    > mean(d[site == 3 & pos == "R"]) - mean(d[site == 3 & pos == "N"]) -
    +   (mean(d[site == 1 & pos == "R"]) - mean(d[site == 1 & pos == "N"]))
    [1] -10.5

OK, one has to keep track of all the differences and stuff. So I think I have understood about 80% of this simple example. The reason I'm going after this so stubbornly is that I'm at the beginning of a DOE which will take several weeks of measuring and will end up being analyzed with a big ANOVA (two response variables and about six explanatory variables, some continuous, some factorial). Already in the DOE phase I want to understand what I will be doing with the data later (this is for a Six Sigma project in an industrial production environment, in case anybody wants to know).

Thanks,
robert

Here's the full dataset:

    x <- structure(list(d = c(1383L, 1377L, 1388L, 1373L, 1386L, 1394L, 1386L,
        1393L, 1390L, 1382L, 1386L, 1390L, 1395L, 1396L, 1392L, 1395L, 1378L,
        1382L, 1379L, 1380L), site = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
        3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L), .Label = c("1", "2",
        "3", "4", "5"), class = "factor"), pos = structure(c(1L, 2L, 2L, 1L, 1L,
        2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L), .Label =
        c("N", "R"), class = "factor")), .Names = c("d", "site", "pos"),
        row.names = c(NA, -20L), class = "data.frame")
    attach(x)
    head(x)
    boxplot(d ~ pos + site)
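The bookkeeping above can be checked directly against the table of cell means: each interaction coefficient is the R-minus-N shift at that site minus the same shift at the reference site1. The data frame below rebuilds x from the posted dataset.

```r
x <- data.frame(
  d = c(1383, 1377, 1388, 1373, 1386, 1394, 1386, 1393, 1390, 1382,
        1386, 1390, 1395, 1396, 1392, 1395, 1378, 1382, 1379, 1380),
  site = factor(rep(1:5, each = 4)),
  pos  = factor(rep(c("N", "R", "R", "N"), times = 5))
)
fit <- lm(d ~ site * pos, data = x)

# 5 x 2 table of cell means (rows = site, columns = pos)
cell <- with(x, tapply(d, list(site, pos), mean))

# R-minus-N shift at each site, relative to the shift at site1
shift <- cell[, "R"] - cell[, "N"]
interaction <- shift - shift["1"]

interaction   # 0.0 -4.0 -10.5 -5.5 -3.0
coef(fit)[c("site2:posR", "site3:posR", "site4:posR", "site5:posR")]
```

This also explains the p-values. The residual SD is about 4.353, and each cell mean comes from two observations. The intercept is a single cell mean (SE 4.353/sqrt(2) = 3.078), a site main effect is a difference of two cell means (SE 4.353), and an interaction is a difference of differences of four cell means (SE 4.353*sqrt(2) = 6.156). So -10.5 is measured against a larger standard error than 11.5.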
[R] ANOVA question
Hello all,

I'm very satisfied to say that my grip on both R and statistics is showing the first hints of firmness, on a very greenhorn level.

I'm faced with a problem that I intend to analyze using ANOVA, and to test my understanding of a primitive, one-way ANOVA I've written the self-contained practice script below. It works as expected. But here's my question: How can I not only get the values of the coefficients for the different levels of the explanatory factor(s), but also the corresponding standard errors and confidence intervals? Below I have started doing that on foot by looping over the levels of my single factor, but I suppose this gets complicated and messy with more complex models. Any ideas?

Thanks,
robert

    set.seed(0)
    N <- 100                       # sample size
    MEAN <- c(10, 20, 30, 40, 50)
    VAR <- c(20, 20, 1, 20, 20)
    LABELS <- c("A", "B", "C", "D", "E")
    # create a data frame with labels (a factor, so levels() works below)
    df <- data.frame(Label = factor(rep(LABELS, each = N)))
    df$Value <- NA
    # fill in random data for each factor level
    for (i in 1:length(MEAN)) {
      df$Value[(1 + N*(i-1)):(N*i)] <- rnorm(N, MEAN[i], sqrt(VAR[i]))
    }
    par(mfrow = c(2, 2))
    plot(df)                       # box plot of the data
    plot(df$Value)                 # scatter plot
    mod_aov <- aov(Value ~ Label, data = df)
    print(summary(mod_aov))
    print(mod_aov$coefficients)
    rsd <- mod_aov$residuals
    plot(rsd)
    # find and print mean() and var() for each level
    for (l in levels(df$Label)) {
      index <- df$Label == l
      # Method 1: directly from data
      smp <- df$Value[index]                  # extract sample for this label
      ssq_smp <- var(smp) * (length(smp) - 1) # sum of squares is variance
                                              # times d.f.
      # Method 2: from ANOVA residuals
      rsd_grp <- rsd[index]                   # extract residuals
      ssq_rsd <- sum(rsd_grp^2)               # compute sum of squares
      # print mean, variance, and difference between SSQs from the two
      # methods.
      write(sprintf("%s: mean=%5.1f var=%5.1f (%.2g)", l, mean(smp), var(smp),
                    ssq_smp - ssq_rsd), "")
      # ...and it works like expected! But is there a shortcut that would give me
      # the same result in a one-liner?
    }
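A sketch of one way to get per-level estimates, standard errors and confidence intervals without a loop: fit the same model with lm() and drop the intercept, so each coefficient is a level mean, then call confint(). The data generation is a condensed version of the script above, and the formula with 0 + Label is an assumption about how the mod_lm0 in the follow-up was fitted.

```r
set.seed(0)
N <- 100
MEAN <- c(10, 20, 30, 40, 50)
VAR  <- c(20, 20, 1, 20, 20)
df <- data.frame(Label = factor(rep(c("A", "B", "C", "D", "E"), each = N)))
df$Value <- rnorm(5 * N, rep(MEAN, each = N), rep(sqrt(VAR), each = N))

# "0 +" removes the intercept, so each coefficient is one level's mean
mod_lm0 <- lm(Value ~ 0 + Label, data = df)
summary(mod_lm0)$coefficients   # estimate, SE, t value, p value per level
confint(mod_lm0, level = 0.95)  # 95% confidence interval per level
```

confint() works the same way for larger lm() fits, so this scales to models with several factors.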
Re: [R] ANOVA question
Hello Thierry,

thanks for your answer! There is one thing, however, that I don't understand. The values labeled C in my data are generated with 1/20th the variance of the others, yet the standard errors and confidence intervals are the same for all levels of the factor. How come?

    > summary(mod_lm0)$coef
           Estimate Std. Error   t value      Pr(>|t|)
    LabelA 10.10138  0.3937038  25.65730  5.752714e-93
    LabelB 19.79629  0.3937038  50.28218 1.226942e-196
    LabelC 30.06722  0.3937038  76.37016 4.825571e-276
    LabelD 40.01442  0.3937038 101.63586      0.00e+00
    LabelE 49.78282  0.3937038 126.44738      0.00e+00

Thanks,
robert
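The equal standard errors follow from the model's assumption of a single common error variance: lm() pools the residuals of all levels into one sigma, and with 100 observations per level every coefficient gets SE = sigma / sqrt(100). If the variances really differ, per-level standard errors can be computed from each level's own spread, as in this sketch (the data are regenerated with the script's settings).

```r
set.seed(0)
N <- 100
df <- data.frame(Label = factor(rep(c("A", "B", "C", "D", "E"), each = N)))
df$Value <- rnorm(5 * N, rep(c(10, 20, 30, 40, 50), each = N),
                  rep(sqrt(c(20, 20, 1, 20, 20)), each = N))
mod_lm0 <- lm(Value ~ 0 + Label, data = df)

# Pooled model: one sigma for all levels, hence one SE for all levels
summary(mod_lm0)$sigma / sqrt(N)

# Per-level SE from each level's own standard deviation
se.by.level <- tapply(df$Value, df$Label, function(v) sd(v) / sqrt(length(v)))
se.by.level
```

The pooled sigma^2 is roughly the average of the five variances (about 16), which is why the 0.39 in the table sits between the per-level values for C and the others.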
Re: [R] How to deal with a dataframe within a dataframe?
On Tue, May 8, 2012 at 3:38 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote:
So this actually looks like something of a tricky one: if you wouldn't mind sending the result of dput(head(agg)) I can confirm, but here's my hunch:

Hi Michael,

while I'm trying to get my head around the rest of your post, here's the output of dput():

    > dput(head(agg))
    structure(list(`df$quarter` = c("09Q3", "10Q1", "10Q2", "10Q3", "11Q1",
        "11Q2"), `df$tool` = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("VS1A",
        "VS1B", "VS2A", "VS2B", "VS3A", "VS3B", "VS4A", "VS4B", "VS5B"),
        class = "factor"), `df$value` = structure(list(
            `0` = c(1.80053430839867, 1.62848325226279),
            `1` = c(1.29965212329278, 1.26130173276939),
            `2` = c(1.69901753654472, 1.38156952313768),
            `3` = c(1.31168126092175, 1.06723157138633),
            `4` = c(1.54165763354293, 1.21619657757276),
            `5` = c(1.29925171313276, 1.18276707678292)),
            .Names = c("0", "1", "2", "3", "4", "5"))),
        .Names = c("df$quarter", "df$tool", "df$value"), row.names = c(NA, 6L),
        class = "data.frame")

I would like this either in the form of a flat data frame (i.e., the contents of df$value as two separate columns), or -- even preferably -- learn a better way to retrieve multiple numeric results from a call to aggregate().

Thanks,
robert
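Given a list column like the one in this dput() output, one way to flatten it into two ordinary columns is to pad any length-1 result to length 2 and rbind() the pieces. The three-row agg below is a hypothetical stand-in built from values in the post (the third entry mimics a group where the function returned a single NA), and the column names cp and cpk are assumptions.

```r
# Hypothetical stand-in for 'agg' with a list column of results
agg <- data.frame(quarter = c("09Q3", "10Q1", "10Q2"), tool = "VS1A")
agg$value <- I(list(c(1.800534, 1.628483), c(1.299652, 1.261302), NA))

# Pad short results to two values, then stack into a 2-column matrix
padded <- lapply(agg$value, function(v) if (length(v) == 2) v else c(NA, NA))
m <- do.call(rbind, padded)

flat <- data.frame(agg[c("quarter", "tool")], cp = m[, 1], cpk = m[, 2])
flat
```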
[R] How to apply functions across columns?
Hello, me again.

I have a data frame that looks like this (actual dput output at bottom):

    > head(tencor)
            date    lot wf.id   s1   s2   s3   s4   s5
    1 08.05.2012 W0X3H0     9 1238 1263 1244 1200 1183
    2 08.05.2012 W0X3H0    10 1367 1396 1371 1325 1311
    3 08.05.2012 W0X3H0    11 1383 1417 1393 1346 1328

I'd like to add a column to this that gives, for each row, the average of the values in the columns s1 to s5. Really primitive. But I absolutely don't understand how to do this. I don't need any intelligence, I know my values are always in columns 4:8.

Thanks,
robert

    > dput(tencor)
    structure(list(date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
        1L), .Label = "08.05.2012", class = "factor"), lot = structure(c(1L, 1L,
        1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "W0X3H0", class = "factor"),
        wf.id = c(9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 4L),
        s1 = c(1238L, 1367L, 1383L, 1395L, 1479L, 1411L, 1404L, 1398L, 1402L, 1380L, 1376L),
        s2 = c(1263L, 1396L, 1417L, 1420L, 1527L, 1452L, 1438L, 1432L, 1432L, 1412L, 1403L),
        s3 = c(1244L, 1371L, 1393L, 1395L, 1497L, 1424L, 1410L, 1404L, 1398L, 1382L, 1385L),
        s4 = c(1200L, 1325L, 1346L, 1346L, 1444L, 1372L, 1361L, 1362L, 1359L, 1338L, 1334L),
        s5 = c(1183L, 1311L, 1328L, 1336L, 1426L, 1357L, 1347L, 1344L, 1339L, 1325L, 1322L)),
        .Names = c("date", "lot", "wf.id", "s1", "s2", "s3", "s4", "s5"),
        class = "data.frame", row.names = c(NA, -11L))
Re: [R] How to apply functions across columns?
On Wed, May 9, 2012 at 4:19 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote:
Good reproducible example ;-) Easiest is probably just:

    cbind(tencor, ThisRowMean = rowMeans(tencor[, 4:8]))

Actually, after frying my brain on tapply() and sapply() I found that just plain apply() does what I need:

    tencor$mean <- apply(tencor[4:8], 1, FUN=mean)

This way I'm also not tied to just mean() as aggregator but can use any homemade function (this would have been my followup question had I followed your advice ;-)

Thanks!
robert
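Both approaches side by side, on the first three rows from the dput() output. rowMeans() is the fast special case for means, while apply(..., 1, f) works with any row-wise function; the range function here is only an example. Selecting the columns by name instead of by position 4:8 is an assumption, but it keeps working after new columns are appended.

```r
# First three rows of tencor (date and lot columns omitted)
tencor <- data.frame(
  wf.id = 9:11,
  s1 = c(1238L, 1367L, 1383L), s2 = c(1263L, 1396L, 1417L),
  s3 = c(1244L, 1371L, 1393L), s4 = c(1200L, 1325L, 1346L),
  s5 = c(1183L, 1311L, 1328L)
)
site.cols <- c("s1", "s2", "s3", "s4", "s5")

# Fast built-in for the mean of each row
tencor$mean <- rowMeans(tencor[site.cols])
# apply() accepts any function of one row, e.g. the spread across sites
tencor$range <- apply(tencor[site.cols], 1, function(r) max(r) - min(r))
tencor
```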
[R] How to deal with a dataframe within a dataframe?
Hello all,

I am doing an aggregation where the aggregating function returns not a single numeric value but a vector of two elements using return(c(val1, val2)). I don't know how to access the individual columns of that vector in the resulting dataframe though. How is this done correctly?

Thanks,
robert

    > agg <- aggregate(formula=df$value ~ df$quarter + df$tool,
    +                  FUN=cp.cpk, lsl=1300, usl=1500)
    > head(agg)
      df$quarter df$tool           df$value
    1       09Q3    VS1A 1.800534, 1.628483
    2       10Q1    VS1A 1.299652, 1.261302
    3       10Q2    VS1A 1.699018, 1.381570
    4       10Q3    VS1A 1.311681, 1.067232
    > head(agg["df$value"])
                df$value
    1 1.800534, 1.628483
    2 1.299652, 1.261302
    3 1.699018, 1.381570
    4 1.311681, 1.067232
    > class(agg["df$value"])
    [1] "data.frame"
    > head(agg["df$value"][1])  # trying to select 1st column
                df$value
    1 1.800534, 1.628483
    2 1.299652, 1.261302
    3 1.699018, 1.381570
    4 1.311681, 1.067232
    > head(agg["df$value"][2])  # trying to select 2nd column
    Error in `[.data.frame`(agg["df$value"], 2) : undefined columns selected

    # FWIW, here's the aggregating function
    function(data, lsl, usl) {
      if (length(data) < 15) {
        return(NA)
      } else {
        return(c((usl-lsl)/(6*sd(data)),
                 min(mean(data)-lsl, usl-mean(data))/(3*sd(data))))
      }
    }
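A likely reason for the nested list: the aggregating function returns a single NA for small groups but two values otherwise, so aggregate() cannot simplify the results into a matrix. In this sketch the function, defined as cp.cpk (the name used in the aggregate() call), always returns two named values; the data are simulated, and the formula uses data = df instead of df$ prefixes. The result column is then an ordinary two-column matrix.

```r
# Variant of the posted function that always returns two named values
cp.cpk <- function(data, lsl, usl) {
  if (length(data) < 15) {
    return(c(cp = NA, cpk = NA))   # same length as the normal case
  }
  s <- sd(data)
  m <- mean(data)
  c(cp = (usl - lsl) / (6 * s),
    cpk = min(m - lsl, usl - m) / (3 * s))
}

# Hypothetical data: 10Q2 has fewer than 15 values on purpose
set.seed(1)
df <- data.frame(quarter = rep(c("10Q1", "10Q2"), times = c(20, 10)),
                 tool = "VS1A",
                 value = rnorm(30, 1400, 20))

agg <- aggregate(value ~ quarter + tool, data = df,
                 FUN = cp.cpk, lsl = 1300, usl = 1500)
agg$value[, "cpk"]                 # second column of the matrix column
flat <- do.call(data.frame, agg)   # columns value.cp and value.cpk
flat
```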
[R] Replacing tick labels in a plot
Hello,

is it possible to replace the text of tick marks in a plot? Specifically, I'd like to have a qqnorm plot in which the theoretical quantiles are not expressed in terms of standard deviations, but in actual percentages. Anybody who's seen a probability plot in MINITAB knows what I'm talking about.

I have somewhat listlessly looked at mtext(), thinking that I could maybe first create my plot, then draw a white rectangle over the original numbers (how?), and then insert my own numbers using mtext(). To me this sounds so stupid that I haven't yet invested any effort into actually giving it a shot. Any ideas?

BTW, is it normal that each and every post to this list, even by list members, has to go through moderator approval?

Thanks,
robert
Re: [R] Replacing tick labels in a plot
Hello Sarah,

thanks for your quick answer. This is exactly what I was looking for, embarrassingly simple if I might add. Sometimes R is like a huge workshop with unlabeled tool magazines: You know the tool exists, but not where it is. And if you find it, the instructions are often quite terse.

The moderator approval thing has gone away, too.

Best regards,
bob

On Fri, May 4, 2012 at 7:13 PM, Sarah Goslee sarah.gos...@gmail.com wrote:
What about ?axis as a place to start?

Are you sure that the email address that your message appears to be coming from is identical to the one you used when you signed up? That's a frequent cause of moderation.

Sarah

On Fri, May 4, 2012 at 1:09 PM, Robert Latest boblat...@gmail.com wrote:
Hello, is it possible to replace the text of tick marks in a plot? [...]

-- 
Sarah Goslee
http://www.functionaldiversity.org
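Following Sarah's pointer to ?axis, a minimal sketch: suppress the default x axis with xaxt = "n", then call axis() with tick positions at the normal quantiles qnorm(p) and labels written as percentages. The data and the chosen probabilities are arbitrary.

```r
set.seed(1)
v <- rnorm(50)

# Probabilities to show on the x axis
p <- c(0.01, 0.05, 0.25, 0.5, 0.75, 0.95, 0.99)
lab <- paste0(100 * p, "%")

# Draw the normal Q-Q plot without its default x axis
qqnorm(v, xaxt = "n", xlab = "Cumulative probability")
qqline(v)

# Ticks at the theoretical quantiles, labelled in percent
axis(1, at = qnorm(p), labels = lab)
```

No white rectangles needed: xaxt = "n" means the numbers are never drawn in the first place.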
Re: [R] How to create a data.frame from several time series?
Hello all,

followup to yesterday's question: Part of my confusion was caused by my embarrassing mistake of overwriting the ppk function with another object, which of course broke the next iteration of the loop.

Secondly, I got exactly what I wanted like this:

    # note: zoo itself also has an aggregate() method named aggregate.zoo
    aggregate.zoo <- function(series) {
      agg <- aggregate(data=series, value ~ month, ppk, lsl=1300, usl=1500)
      return(zoo(x=agg$value, order.by=agg$month))
    }
    l1 = split(df, df$tool)
    l2 = lapply(l1, aggregate.zoo)
    l3 = do.call(merge, l2)

I puzzled this together from various examples with only 80% understanding of how it works and why.

Regards,
robert
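For comparison, a base-R sketch that produces one column per tool without zoo: tapply() over month and tool returns a month-by-tool matrix, with NA where a tool has no data in a month. It uses plain "YYYY-MM" strings instead of yearmon and simulated data, both assumptions. Storing the result in res rather than ppk also sidesteps the overwriting mistake described above.

```r
# ppk() as posted
ppk <- function(data, lsl, usl) {
  if (length(data) < 15) return(NA)
  min(mean(data) - lsl, usl - mean(data)) / (3 * sd(data))
}

# Hypothetical data: 20 values per tool and month
set.seed(1)
df <- data.frame(
  month = rep(c("2010-01", "2010-02", "2010-03"), each = 40),
  tool  = rep(c("VS1A", "VS1B"), times = 60),
  value = round(rnorm(120, 1400, 25))
)
df <- df[!(df$month == "2010-03" & df$tool == "VS1B"), ]  # one empty month

# Rows = months, columns = tools; extra arguments are passed on to ppk()
res <- tapply(df$value, list(df$month, df$tool), ppk, lsl = 1300, usl = 1500)
res
```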
[R] How to create a data.frame from several time series?
Hello all,

please look at my code below. The problems start where it says # PROBLEMS START HERE. Some sample data is at the very bottom. This is the diagnostic output from the script:

    > source('load.R')
      ts.null
    1      NA
    2      NA
    3      NA
    4      NA
    5      NA
    6      NA
    [1] "Adding data" "VS1A"
      ts.null VS1A.ts.null VS1A.tts
    1      NA           NA       NA
    2      NA           NA       NA
    3      NA           NA 1.585324
    4      NA           NA 1.326600
    5      NA           NA 1.914382
    6      NA           NA 1.333249
    [1] "Adding data" "VS1B"
    Error in get(as.character(FUN), mode = "function", envir = envir) :
      object 'FUN' of mode 'function' was not found

I have several issues with that:

1) Why doesn't the data frame df.all have timestamps in its first column?
2) Why are the additional columns named VS1A.ts.null and VS1A.tts instead of VS1A, VS1B?
3) What does the error message at the end mean, and why doesn't it occur on the first loop iteration?

It seems like I could also first create all the time series and then combine them with ts.union, but I don't know how to do that, because I don't know beforehand how many series the for() loop will create, how to distinguish them by their tool names (also not known beforehand), or how to pass them to ts.union.

Thanks, robert

CODE HERE

    library(zoo)

    ppk <- function(data, lsl, usl) {
        if (length(data) < 15) {
            return(NA)
        } else {
            return (min(mean(data)-lsl, usl-mean(data))/(3*sd(data)))
        }
    }

    load <- function(filename) {
        d <- read.table(filename, header=TRUE, sep='\t')
        # filter data
        d <- d[d$value >= 1300 & d$value <= 1500,]
        # add column for later aggregation
        d$month = as.yearmon(d$timestamp)
        return(d)
    }

    df <- load('data.tsv')

    # create an all-encompassing time series to unionize the actual data with
    ts.null = ts(data=NA, start=min(df$month), end=max(df$month), frequency=12)
    print(ts.null)

    #
    # PROBLEMS START HERE
    #
    df.all <- data.frame(ts.null)
    # I was hoping to have a data frame with monthly time stamps in the first
    # column. Not so.
    for (ti in levels(df$tool)) {
        print(head(df.all))
        print(c("Adding data", ti))
        ppk <- aggregate(
            data=df[df$tool==ti,], value~month, ppk, lsl=1300, usl=1500)
        tts <- as.ts(zooreg(ppk$value, order.by=ppk$month, frequency=12))
        # I'm hoping that zooreg() fills in empty months with NAs, but I have no
        # idea how to deal with leading or trailing empty months
        df.all[ti] <- ts.union(ts.null, tts)
        # This totally doesn't work as expected, and it messes up something so bad
        # that the script crashes on the second iteration.
    }

Some DF data here:

                 timestamp tool value    month
    1  2010-01-26 08:41:04 VS1A  1400 Jan 2010
    2  2010-01-26 08:44:04 VS4A  1420 Jan 2010
    3  2010-01-26 10:15:45 VS4B  1400 Jan 2010
    4  2010-01-26 11:37:53 VS1B  1360 Jan 2010
    5  2010-01-26 12:53:53 VS1B  1380 Jan 2010
    6  2010-01-26 14:48:06 VS2B  1410 Jan 2010
    7  2010-01-26 14:48:29 VS2A  1410 Jan 2010
    8  2010-01-26 23:21:48 VS3A  1400 Jan 2010
    9  2010-01-27 07:48:15 VS1A  1420 Jan 2010
    10 2010-01-27 07:48:26 VS1B  1400 Jan 2010
    11 2010-01-27 07:49:51 VS2A  1410 Jan 2010
    12 2010-01-27 07:50:08 VS2B  1390 Jan 2010
    13 2010-01-27 12:30:02 VS3A  1400 Jan 2010
    14 2010-01-27 12:30:19 VS3B  1420 Jan 2010
    15 2010-01-27 12:30:36 VS4B  1420 Jan 2010
    16 2010-02-08 11:47:54 VS1A  1370 Feb 2010
    17 2010-02-08 11:48:06 VS1B  1370 Feb 2010
    18 2010-02-08 11:49:42 VS3A  1430 Feb 2010
    19 2010-02-08 11:50:09 VS3B  1350 Feb 2010
    20 2010-02-08 11:51:06 VS2A  1400 Feb 2010
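A minimal base-R sketch of a loop that sidesteps all three problems, under stated assumptions: the data frame is hypothetical, month is a character string such as "2010-01" instead of a yearmon, and the comparison lost from ppk is assumed to be `<`. The aggregate() result is stored under a new name (agg), so the ppk function survives into the next iteration; overwriting ppk is what the follow-up in this thread identifies as the cause of the error on the second pass. Each per-tool result gets its value column renamed to the tool and is merged into a data frame that already holds every month as an ordinary column.

```r
# ppk as in the post; '<' is assumed for the lost comparison operator
ppk <- function(data, lsl, usl) {
  if (length(data) < 15) return(NA)
  min(mean(data) - lsl, usl - mean(data)) / (3 * sd(data))
}

# Hypothetical data: three months, two tools, 20 values per tool and month,
# except that tool VS1B has no data in March
set.seed(42)
df <- data.frame(
  month = c(rep(c("2010-01", "2010-02"), each = 40), rep("2010-03", 20)),
  tool  = c(rep(rep(c("VS1A", "VS1B"), each = 20), times = 2), rep("VS1A", 20)),
  value = round(rnorm(100, mean = 1400, sd = 25))
)

# One row per month; the month is a real column, not hidden in a ts object
df.all <- data.frame(month = sort(unique(df$month)))

for (ti in sort(unique(df$tool))) {
  # A new name for the result: 'ppk <- aggregate(...)' would replace the
  # function, and the next aggregate(..., ppk, ...) call would then fail
  agg <- aggregate(value ~ month, data = df[df$tool == ti, ],
                   FUN = ppk, lsl = 1300, usl = 1500)
  names(agg)[2] <- ti                  # column named after the tool
  # all.x = TRUE keeps months without data for this tool, filled with NA
  df.all <- merge(df.all, agg, by = "month", all.x = TRUE)
}
print(df.all)
```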
Re: [R] Applying a function to categorized data?
Hello Steve,

thank you for your reply. You're right: just before I read your post I had found aggregate(), and indeed it brought me a long way towards my goal. I've been a C programmer for 20+ years, and I'm fairly fluent in SQL, so to understand R I need to unlearn my scalar- and row- (record-) oriented thinking and get my head around vectors and columns. I'm still nowhere near where I think I need to be in order to work with my data. I'll get back to the list when I have pinpointed my problem a bit better, and I'll also supply some sample data.

Have a nice weekend, robert

On Thu, Apr 12, 2012 at 8:52 PM, steven mosher mosherste...@gmail.com wrote:

Welcome to R and the list. Others may suggest books (Nutshell was my first), but first there are some things that will help you both in programming and in getting help on the list. You should post executable code in your question. So, build a toy example of the data.frame you have and show what you tried. Folks here should be able to run your toy example and show you how to get the answer you want.

For your problem I'm guessing that aggregate() would be one path; see ?aggregate. You will need to specify the by argument to aggregate by month.

Steve
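Steve's hint can be sketched as follows, assuming v1 looks as described in the original question (a TIMESTAMP column of class POSIXct and a numeric ARI_MIT column); the six rows below are invented for illustration. aggregate() receives the grouping through its by argument, once with mean and once with sd, and the two results are combined into the three-column data frame the question asks for.

```r
# Hypothetical stand-in for v1: two measurements in each of three months
v1 <- data.frame(
  TIMESTAMP = as.POSIXct(c("2012-01-05 08:00", "2012-01-20 09:30",
                           "2012-02-03 10:15", "2012-02-17 11:45",
                           "2012-03-09 12:00", "2012-03-28 13:20"), tz = "UTC"),
  ARI_MIT = c(10, 12, 20, 24, 30, 36)
)
v1$MONTH <- strftime(v1$TIMESTAMP, "%y%m", tz = "UTC")  # e.g. "1201"

# 'by' takes a list of grouping vectors; its names become column names,
# and the aggregated vector shows up in a column called x
m <- aggregate(v1$ARI_MIT, by = list(MONTH = v1$MONTH), FUN = mean)
s <- aggregate(v1$ARI_MIT, by = list(MONTH = v1$MONTH), FUN = sd)

# Both results are sorted by MONTH, so their rows line up
monthly <- data.frame(MONTH = m$MONTH, MEAN = m$x, SD = s$x)
print(monthly)
```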
[R] Applying a function to categorized data?
Hi all,

I'm just getting started in R. My problem is the following: I have a data frame (v1) with lots of production data measurements. Each row contains a single measurement ('ARI_MIT') with a timestamp. I want to group the data by month and get the mean and standard deviation for each month.

I have already managed to do the grouping by adding another column to my data frame:

    v1$MONTH = strftime(v1$TIMESTAMP, "%y%m")

This makes a nice month-wise boxplot of my data, although I have no idea why:

    boxplot(v1$ARI_MIT ~ v1$MONTH)

I don't need this plotted, though; I need it as a new data frame with three columns: the month, the mean, and the standard deviation of all values from that month.

I tried un-stacking v1 into a list of vectors and then looping over its elements, calculating the mean of each group:

    for (i in unstack(v1, v1$ARI_MIT ~ v1$MONTH)) {
        write(mean(i), "")
    }

This works, but how do I get the data into a data frame, with the month labels in a column? They are not available inside the loop body.

I know I need to get a book on R.

Thanks, robert
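On the question of where the month labels went: a minimal sketch, using invented data of the shape described above, with split() swapped in for unstack(). split() returns a list whose element names are the month labels, so they can be read with names() rather than from inside a loop body; sapply() then computes one statistic per month, and the pieces are assembled into a data frame.

```r
# Hypothetical stand-in for v1, with MONTH already added as in the question
v1 <- data.frame(
  MONTH   = c("1201", "1201", "1202", "1202", "1203", "1203"),
  ARI_MIT = c(10, 12, 20, 24, 30, 36)
)

# A named list: one numeric vector per month, named by the month label
groups <- split(v1$ARI_MIT, v1$MONTH)

monthly <- data.frame(
  MONTH = names(groups),
  MEAN  = unname(sapply(groups, mean)),   # one mean per list element
  SD    = unname(sapply(groups, sd))      # one standard deviation per element
)
print(monthly)
```

The boxplot works for the same reason: the formula ARI_MIT ~ MONTH asks boxplot() to split the values by MONTH and draw one box per group.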