Re: [R] Using interactive plots to get information about data points
jcarmichael jcarmichael314 at gmail.com writes:

> I have been experimenting with interactive packages such as iplots and playwith. Consider the following sample dataset:
>
>   A B C D
>   1 5 5 9
>   3 2 8 4
>   1 7 3 0
>   7 2 2 6
>
> Let's say I make a plot of variable A. I would like to be able to click on a data point (e.g. 3) and have a pop-up window tell me the corresponding value for variable D (e.g. 4).

See ?identify, in particular its labels argument. Another approach you might like to consider is to use GGobi (www.ggobi.org) with the rggobi package, which links directly to it from R. GGobi is built specifically for this kind of interactive purpose.

> I am also trying to produce multiple small plots. For example, four side-by-side boxplots for each of the four variables A, B, C, D.

See ?par, e.g. par(mfrow=c(1,4)) (for base graphics).

Hope this helps.

Michael Bibo
Queensland Health

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
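Both suggestions in this reply can be sketched in base graphics. This is only an illustration, assuming the sample data are held in a data frame called dat (a hypothetical name). identify() needs an interactive graphics device, so the call is guarded; it writes the label next to the clicked point rather than opening a pop-up window.

```r
# Sample data from the question; 'dat' is a hypothetical name
dat <- data.frame(A = c(1, 3, 1, 7),
                  B = c(5, 2, 7, 2),
                  C = c(5, 8, 3, 2),
                  D = c(9, 4, 0, 6))

# Plot A against its index; clicking near a point labels it with its D value
plot(dat$A, ylab = "A")
if (dev.interactive()) {
  picked <- identify(dat$A, labels = dat$D)  # indices of the clicked points
  print(dat[picked, ])
}

# Four side-by-side boxplots, one per variable
op <- par(mfrow = c(1, 4))
for (v in names(dat)) boxplot(dat[[v]], main = v)
par(op)  # restore the previous layout
```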
[R] lost in the SNOW at 4 AM; parallelization confusion...
Apologies for what must be a very basic question, but I have not found any clear examples of how to design the following.

I would like to run an iterative analysis over several processors. A toy example of the analysis is attached: a resampling function run 1k times, with two different sets of conditioning variables i,j on some data vec...

What is the usual way to attack such a problem using snow? My understanding up to this point is that one should:

(1) set the random seed to uncorrelate the processors' actions in select()
(2) make a function myfunc(vec,i,j) which returns the item of interest
(3) set up a wrapper which iterates through i,j, and makes the call to the cluster
(4) call the cluster using clusterApply(cl,vec, myfunc)

I must be terribly confused based on the results attached below... any advice will be appreciated...

Many thanks,
Best,
Eric

--
Eric Rupley
University of Michigan, Museum of Anthropology
1109 Geddes Ave, Rm. 4013
Ann Arbor, MI 48109-1079
[EMAIL PROTECTED]
+1.734.276.8572

# set up #
#
# cl <- makeCluster(7)
# 8 slaves are spawned successfully. 0 failed.
#clusterSetupRNG(cl)
#[1] RNGstream

vec <- runif(1000,1,100)
d <- NULL; c.j <- NULL; c.i <- NULL

# the toy function

analysis.func <- function (vec,i,j) {
  b <- NULL
  for (k in c(1:1000)) {
    a <- sample(vec,1000,replace=T) #requires randoms...
    b <- append(b, mean(a))
  }
  c <- (sum(b)*j)/i
  return(c)
}

# the analysis

system.time(for (i in c(2,4)) { # a series of nested iterations...
  for (j in c(5:6)) {
    d <- append( mean( as.numeric( clusterApply(cl,vec,analysis.func,i,j) ) ) , d) # this is ugly and contorted; there has to be a better way?
    c.j <- append(j, c.j)
    c.i <- append(i, c.i)
  }
})
# user system elapsed
# 9.758 0.291 48.771
#
# but the old way is faster...

d <- NULL; c.j <- NULL; c.i <- NULL # set up again

system.time(for (i in c(2,4)) { # a series of nested iterations...
  for (j in c(5:6)) {
    d <- append( mean( as.numeric( analysis.func(vec,i,j) )) ,d) # keeping it ugly for timing comparison...
    c.j <- append(j, c.j)
    c.i <- append(i, c.i)
  }
})
# user system elapsed
# 0.299 0.002 0.299
#
# arrgrgrgrgrg!!!

stopCluster(cl)
#[1] 1

sessionInfo()
#R version 2.7.1 (2008-06-23)
#i386-apple-darwin8.10.1
#
#locale:
#en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
#
#attached base packages:
#[1] stats graphics grDevices utils datasets methods base
#
#other attached packages:
#[1] rlecuyer_0.1 boot_1.2-33 snow_0.3-3 Rmpi_0.5-5
#
#loaded via a namespace (and not attached):
#[1] tools_2.7.1

date()
#[1] Sat Aug 23 04:25:50 2008
#
#Too late for a drink. Pity.
Re: [R] Coordinate systems for geostatistics in R (imicola)
If you use the spatial objects provided by the sp package (http://cran.r-project.org/web/packages/sp/vignettes/sp.pdf), you can transform your data to other projections using the spTransform function. For this you will also need the rgdal package (it actually provides spTransform). This function is extremely convenient: you can manage coordinate transformations very easily for common systems (WGS84, UTM) within the R environment.
Re: [R] Using lme, how to specify: (1) repeated measures, and (2) Toeplitz covariance structure?
1) I think that the repeated measure is not years but, as you said, the count of birds. If you are interested in the effect of the time variable (years), perhaps you need to introduce it as a fixed effect?

2) See ?corARMA.

2008/8/22 mtb954 [EMAIL PROTECTED]

> We are attempting to use nlme to fit a linear mixed model to explain bird abundance as a function of habitat:
>
> lme(abundance~habitat-1,data=data,method="ML",random=~1|sampleunit)
>
> The data consist of repeated counts of birds in sample units across multiple years, and we have two questions:
>
> 1) Is it necessary (and, if so, how) to specify the repeated measure (years)?
>
> 2) How can we specify a Toeplitz heterogeneous covariance structure for this model?
>
> We have searched the help file for lme, and the R-help archives, but cannot find any pertinent information. Any help would be appreciated.
>
> Thanks, Mark
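What ?corARMA points to can be sketched as follows. This uses the Ovary data shipped with nlme rather than the bird data (which are not available here), and the model calls follow the example in the corARMA help page. A pure moving-average structure, corARMA(p = 0, q = k), yields a banded Toeplitz correlation matrix; whether that, possibly combined with a variance function passed through the weights argument, matches the "Toeplitz heterogeneous" structure the poster has in mind is an assumption, not something the thread confirms.

```r
library(nlme)

# Base model: random effects per Mare, no within-group correlation
fm1 <- lme(follicles ~ sin(2 * pi * Time) + cos(2 * pi * Time),
           data = Ovary, random = pdDiag(~ sin(2 * pi * Time)))

# Same model with an ARMA(1,1) within-group correlation structure
fm2 <- update(fm1, correlation = corARMA(p = 1, q = 1))

# Compare the fits with and without the correlation structure
anova(fm1, fm2)
```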
Re: [R] simple generation of artificial data with defined features
Dear Mr. Christos Hatzis,

thank you so much for your answer, which is in my eyes just brilliant! I followed it step by step (great and detailed explanation) and nearly everything is fine, except for a problem at the very end that I haven't found a solution for until now (despite playing around quite a lot...). Please let me explain:

election.2005 <- c(16194,13136,3494,3838,4648,4118)
#cut off the last 3 digits, because my laptop can't handle millions of rows...
attr(election.2005, "class") <- "table"
attr(election.2005, "dim") <- c(1,6)
attr(election.2005, "dimnames") <- list(c("votes"), c("spd", "cdu", "csu", "gruene", "fdp", "pds"))
head(election.2005)
        spd   cdu  csu gruene  fdp  pds
votes 16194 13136 3494   3838 4648 4118
el.dt <- as.data.frame(election.2005)
el.dt.exp <- el.dt[rep(1:nrow(el.dt), el.dt$Freq), -ncol(el.dt)]
dim(el.dt.exp)
[1] 45428 2
head(el.dt.exp)
     Var1 Var2
1   votes  spd
1.1 votes  spd
1.2 votes  spd
1.3 votes  spd
1.4 votes  spd
1.5 votes  spd

My problem now is that I would need either an autoincrementing identifier instead of votes in Var1 or the possibility to access the numbering by a column name (i.e. Var0). In addition I need a third variable for the year of the election (2005, which is the same for all, but needed later on). So this is what it should look like:

    voter.id party election.year
1          1   spd          2005
1.1        2   spd          2005
1.2        3   spd          2005
1.3        4   spd          2005
1.4        5   spd          2005
1.5        6   spd          2005

The reason for that is the input format of the kappam.fleiss function of the irr package I use for calculation. It accepts a data.frame with the categories as rows (here we would have only one category: the year of the election) and the raters (here the voters) as columns. In the data.frame there will be the chosen party for each combination of election year and voter. This format can be easily achieved using the reshape package.
Assuming voter.id would be an autoincrementing identifier, the command should be:

library(reshape)
el.dt.exp.molten <- melt(el.dt.exp, id=c("voter.id")) #which would probably not really change anything in this case, because the data is already in a molten form
kappa.frame <- cast(el.dt.exp.molten, election.year ~ voter.id, subset=variable=="party")

I'd be extremely happy if you could help me out again! Have a nice weekend and many thanks so far!

Greetings from Munich,
Felix Mueller-Sarnowski

Christos Hatzis wrote:

> On the general question of how to create a dataset that matches the frequencies in a table, function as.data.frame can be useful. It takes as argument an object of class 'table' and returns a data frame of frequencies. Consider for example table 6.1 of Fleiss et al (3rd Ed):
>
> birth.weight <- c(10,15,40,135)
> attr(birth.weight, "class") <- "table"
> attr(birth.weight, "dim") <- c(2,2)
> attr(birth.weight, "dimnames") <- list(c("A", "Ab"), c("B", "Bb"))
> birth.weight
>     B  Bb
> A  10  40
> Ab 15 135
> summary(birth.weight)
> Number of cases in table: 200
> Number of factors: 2
> Test for independence of all factors:
>   Chisq = 3.429, df = 1, p-value = 0.06408
> bw.dt <- as.data.frame(birth.weight)
>
> Observations (rows) in this table can then be replicated according to their corresponding frequencies to yield the expanded dataset that conforms with the original table.
>
> bw.dt.exp <- bw.dt[rep(1:nrow(bw.dt), bw.dt$Freq), -ncol(bw.dt)]
> dim(bw.dt.exp)
> [1] 200 2
> table(bw.dt.exp)
>     Var2
> Var1  B  Bb
>   A  10  40
>   Ab 15 135
>
> The above approach is not restricted to 2x2 tables, and it should be straightforward to generate datasets that conform to arbitrary nxm frequency tables.
> -Christos Hatzis
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Greg Snow
> Sent: Friday, August 22, 2008 12:41 PM
> To: drflxms; r-help@r-project.org
> Subject: Re: [R] simple generation of artificial data with defined features
>
> I don't think that the election data is the right data to demonstrate Kappa, you need subjects that are classified by 2 or more different raters/methods. The election data could be considered classifying the voters into which party they voted for, but you only have 1 rater. Maybe if you had some survey data that showed which party each voter voted for in 2 or more elections, then that may be a good example dataset. Otherwise you may want to stick with the sample datasets.
>
> There are other packages that compute Kappa values as well (I don't know if others calculate this particular version), but some of those take the summary data as input rather than the raw data, which may be easier if you just have the summary tables.
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> [EMAIL PROTECTED]
> (801) 408-8111
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL
Re: [R] Sending ... to a C external
2008/8/22 Emmanuel Charpentier [EMAIL PROTECTED]:

> On Friday 22 August 2008 at 15:16 -0400, John Tillinghast wrote:
>
>> I'm trying to figure this out with Writing R Extensions but there's not a lot of detail on this issue. I want to write a (very simple really) C external that will be able to take ... as an argument. (It's for optimizing a function that may have several parameters besides the ones being optimized.)
>
> !!! That's a hard one. I have never undertaken this kind of job, but I expect that your ... argument, if you can reach it from C (which I don't know), will be bound to a Lisp-like structure, notoriously hard to decode in C. Basically, you'll have to create very low-level code (and duplicate a good chunk of the R parser-interpreter...).
>
> I'd rather treat the ... argument in a wrapper that could call the relevant C function with all arguments interpreted and bound... This wrapper would probably be an order of magnitude slower than C code, but two orders of magnitude easier to write (and maintain!). Since ... argument parsing would be done *once* before the grunt work is accomplished by C code, the slowdown would (probably) be negligible...

I think you're overstating the problem somewhat! Everything you need to process ... in C is pretty much in the showArgs function in the R-ext help. The problem is that John hasn't told us what his error was! It works for me:

showArgs(x=2,y=3,z=4)
[1] 'x' 2.00
[2] 'y' 3.00
[3] 'z' 4.00
NULL

showArgs(x=2,y=3,z=s)
[1] 'x' 2.00
[2] 'y' 3.00
[3] 'z' s
NULL

But I can get an error if I don't name an argument:

showArgs(x=2,y=3,s)
[1] 'x' 2.00
[2] 'y' 3.00
Error in showArgs(x = 2, y = 3, s) :
  CHAR() can only be applied to a 'CHARSXP', not a 'NULL'

But that's just because the C code doesn't check for this. Is that what you're getting? What errors are you getting?

Barry

[this was on an R 2.7.0 I had kicking around, so it may have changed in later versions...]
Re: [R] simple generation of artificial data with defined features
Hi,

to add voter.id and election.year to your data frame you could try:

el.dt.exp$voter.id <- seq_len(nrow(el.dt.exp))
el.dt.exp$election.year <- 2005

Cheers,
Christoph Meyer

***
Dr. Christoph Meyer
Institute of Experimental Ecology
University of Ulm
Albert-Einstein-Allee 11
D-89069 Ulm
Germany
Phone: ++49-(0)731-502-2675
Fax: ++49-(0)731-502-2683
Mobile: ++49-(0)1577-156-7049
E-mail: [EMAIL PROTECTED]
http://www.uni-ulm.de/index.php?id=7885
***

Saturday, August 23, 2008, 1:25:05 PM, you wrote:

> My problem now is that I would need either an autoincrementing identifier instead of votes in Var1 or the possibility to access the numbering by a column name (i.e. Var0). In addition I need a third variable for the year of the election (2005, which is the same for all, but needed later on). So this is what it should look like:
>
>     voter.id party election.year
> 1          1   spd          2005
> 1.1        2   spd          2005
> 1.2        3   spd          2005
> 1.3        4   spd          2005
> 1.4        5   spd          2005
> 1.5        6   spd          2005
> ...
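Putting the two replies in this thread together, the expanded voter table can be built directly from the frequencies. A minimal sketch in base R; the column names voter.id, party and election.year come from the question, while the named-vector form of election.2005 is an assumption made here for brevity.

```r
# Vote counts in thousands, as in the question (named vector is an assumption)
election.2005 <- c(spd = 16194, cdu = 13136, csu = 3494,
                   gruene = 3838, fdp = 4648, pds = 4118)

# One row per (scaled) voter: repeat each party name by its frequency
el.dt.exp <- data.frame(
  voter.id      = seq_len(sum(election.2005)),
  party         = rep(names(election.2005), times = election.2005),
  election.year = 2005
)

head(el.dt.exp)
table(el.dt.exp$party)[names(election.2005)]  # recovers the original counts
```

Building the identifier with seq_len() at construction time avoids the row names 1, 1.1, 1.2, ... that arise from indexing with repeated row numbers.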
Re: [R] simple generation of artificial data with defined features
Hello Mr. Greg Snow!

Thank you very much for your prompt answer.

> I don't think that the election data is the right data to demonstrate Kappa, you need subjects that are classified by 2 or more different raters/methods. The election data could be considered classifying the voters into which party they voted for, but you only have 1 rater.

I think it should be possible to calculate kappa if one takes a slightly different point of view from the one you described above: take the voters as raters who judge the category "election" with one party out of the six mentioned in my previous e-mail (which are simply the top six). This makes sense to me, because an election is really nothing but a survey with the question "who should lead our country", given six options in this example. As kappa is a measure of agreement, it should be able to illustrate the agreement of the voters' answers to this question. For me this is, in principle, no different from asking "Where is the stenosis in the video of this endoscopy?" and offering six options, each representing an anatomic location.

> Otherwise you may want to stick with the sample datasets.

The example data sets are of excellent quality and very interesting. I am sure there would be brilliant examples among them. But I have to admit that I do not have a good overview of the available datasets at the moment (as a newbie). I just wanted to give an example from everyday life that everybody is familiar with. An election is something which came to my mind spontaneously.

> There are other packages that compute Kappa values as well (I don't know if others calculate this particular version), but some of those take the summary data as input rather than the raw data, which may be easier if you just have the summary tables.

I chose Fleiss' Kappa because it is a more general form of Cohen's Kappa, allowing m raters and n categories (instead of only two raters and two categories when using Cohen's kappa). Looking for another package calculating it from summary tables might be the simplest solution to my problem. Thank you very much for this hint! On the other hand, it would be nice to use the very same method for the example as for the real data. The example will be part of the methods section.

Thank you again very much for your tips and the quick reply. Have a nice weekend!

Greetings from Munich,
Felix Mueller-Sarnowski

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of drflxms
> Sent: Friday, August 22, 2008 6:12 AM
> To: r-help@r-project.org
> Subject: [R] simple generation of artificial data with defined features
>
> Dear R-colleagues,
>
> I am quite a newbie to R, fighting my stupidity to solve a probably quite simple problem of generating artificial data with defined features.
>
> I am conducting a study of inter-observer agreement in child bronchoscopy. One of the most important measures is Kappa according to Fleiss, which is very conveniently available in R through the irr package.
>
> Unfortunately medical doctors like me don't really understand much of statistics. Therefore I'd like to give the reader an easily understandable example of Fleiss' Kappa in the Methods part. To achieve this, I obtained a table with the results of the German election from 2005:
>
> party    number of votes   percent
> SPD             16194665      34.2
> CDU             13136740      27.8
> CSU              3494309       7.4
> Gruene           3838326       8.1
> FDP              4648144       9.8
> PDS              4118194       8.7
>
> I want to show the agreement of voters measured by Fleiss' Kappa. To calculate this with the kappam.fleiss function of irr, I need a data.frame like this:
>
>         (id of 1st voter)   (id of 2nd voter)
> party   spd                 cdu
>
> Of course I don't plan to calculate this with the millions of cases mentioned in the table above (I am working on a small laptop). A division by 1000 would be more than sufficient for this example. The exact format of the table is generally not so important, as I could reshape nearly every format with the help of the reshape package.
>
> Unfortunately I could not figure out how to create such a fictive/artificial dataset as described above. Any data.frame that keeps at least the percentages would be nice. String IDs of parties could be substituted by numbers of course (which would be even better for function kappam.fleiss in irr!).
>
> I would appreciate any kind of help very much indeed.
>
> Greetings from Munich,
> Felix Mueller-Sarnowski
[R] Error message in termplot
Hi

I am trying to plot the following gam with termplot but keep getting the error message:

Error in order(xx) : unimplemented type 'list' in 'orderVector1'

Is there any way I can rectify this to get my parametric coefficients plotted?

Thanks
Will

Family: binomial
Link function: logit

Formula:
fgha$pa ~ s(fgha$wspd) + fgha$depth + fgha$slha + fgha$hooks + wspd:hooks + wspd:hooks:depth

Parametric coefficients:
                   Estimate Std. Error z value Pr(>|z|)
(Intercept)      -2.188e+00  4.331e-01  -5.051 4.39e-07 ***
fgha$depth        1.878e-04  5.275e-05   3.559 0.000372 ***
fgha$slha        -1.968e-02  7.899e-03  -2.492 0.012698 *
fgha$hooks       -3.155e-06  1.383e-06  -2.282 0.022500 *
wspd:hooks        1.351e-06  2.650e-07   5.099 3.41e-07 ***
wspd:hooks:depth -1.959e-10  4.271e-11  -4.586 4.52e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
               edf Ref.df Chi.sq p-value
s(fgha$wspd) 8.732  9.232  23.87 0.00518 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) = 0.0991   Deviance explained = 8.85%
UBRE score = 0.18451  Scale est. = 1  n = 1167

William Vincent
07979745433
[R] ggplot facet: change layout of panels
Hi,

is there any way to adjust how ggplot(facet=) lays out its panels? I have a dataset with 25 groups, and gplot(y,x,facet= .~group) displays all 25 y~x plots next to each other, so overall the plot is too wide. If I do the same plot in lattice, xyplot(y~x|group), the y~x plots are arranged nicely, 5 in each row, so that overall the plot is a nice 5 by 5 rectangular grid. Is there any way to adjust this in gplot?

Thank you very much.

Best,
Tom
Re: [R] ggplot facet: change layout of panels
Hi Tom,

Not yet, but I'm working on it for the next version.

Regards,
Hadley

On Sat, Aug 23, 2008 at 10:08 AM, Tom Boonen [EMAIL PROTECTED] wrote:

> Hi, is there any way to adjust how ggplot(facet=) lays out its panels? I have a dataset with 25 groups, and gplot(y,x,facet= .~group) displays all 25 y~x plots next to each other, so overall the plot is too wide. If I do the same plot in lattice, xyplot(y~x|group), the y~x plots are arranged nicely, 5 in each row, so that overall the plot is a nice 5 by 5 rectangular grid. Is there any way to adjust this in gplot?
>
> Thank you very much.
>
> Best, Tom

--
http://had.co.nz/
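For readers on a current installation: ggplot2 releases later than this exchange provide facet_wrap(), which wraps panels into a grid much as lattice does. A minimal sketch, assuming ggplot2 is installed; the data frame df and its columns x, y and group are made up for illustration.

```r
# Made-up data with 25 groups (all names here are hypothetical)
set.seed(1)
df <- data.frame(group = factor(rep(1:25, each = 10)),
                 x = rnorm(250),
                 y = rnorm(250))

# Only attempt the plot if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(df, ggplot2::aes(x, y)) +
    ggplot2::geom_point() +
    ggplot2::facet_wrap(~ group, ncol = 5)  # 25 panels in a 5 by 5 grid
  print(p)
}
```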
Re: [R] Sending ... to a C external
On Fri, Aug 22, 2008 at 2:16 PM, John Tillinghast [EMAIL PROTECTED] wrote:

> I'm trying to figure this out with Writing R Extensions but there's not a lot of detail on this issue. I want to write a (very simple really) C external that will be able to take ... as an argument. (It's for optimizing a function that may have several parameters besides the ones being optimized.)
>
> I got the showArgs code (from R-exts) to compile and install, but it only gives me error messages when I run it. I think I'm supposed to pass it different arguments from what I'm doing, but I have no idea which ones.
>
> What exactly are CAR, CDR, and CADR anyway? Why did the R development team choose this very un-C-like set of commands? They are not explained much in R-exts.

Emmanuel has answered that, very engagingly too. On the train from Dortmund to Dusseldorf last week I was describing exactly that etymology of the names CAR, CDR, CDDR, ... to my companions but I got it wrong. I had remembered CDR as "contents of the data register" but Emmanuel is correct that it was "contents of the decrement register".

I don't think it is necessary to go through the agonies of dealing with the argument list in a .External call, which is what I assume you are doing. (You haven't given us very much information on how you are trying to pass the ... argument and such information would be very helpful. The readers of this list are quite intelligent but none, as far as I know, have claimed to be telepathic.)

The way that I would go about this is using .Call in something like

.Call("myCfunction", arg1, arg2, dots = list(...), PACKAGE = "myPackage")

Then in your C code you check the length and the names of the dots argument and take appropriate action.

An alternative, if you want to use the ... arguments in an R expression to be evaluated by your optimizer, is to create an environment, assign the elements of list(...) to the appropriate names in that environment and pass the environment through .Call to be used as the evaluation environment for your R expression. There is a somewhat complicated example of this in the nlmer function in the lme4 package, which you can find at http://lme4.r-forge.r-project.org/. However, I don't feel embarrassed about the example being complicated. This is complex stuff and it is not surprising that it isn't completely straightforward to accomplish.

If you feel that this is opaque in R, I can hardly wait to see what you think about writing the SPSS version.
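The R side of both approaches in this reply can be sketched without any C code. The function objective, its arguments target and scale, and the expression being evaluated are all hypothetical; a real implementation would hand dots or env to a compiled routine through .Call() instead of evaluating in R.

```r
# Hypothetical objective: 'par' is optimised, everything in ... is passed along
objective <- function(par, ...) {
  dots <- list(...)

  # A C routine reached through .Call() would receive 'dots' as an ordinary
  # list and could check its length and names; the same check is done here in R
  if (length(dots) > 0 && (is.null(names(dots)) || any(names(dots) == ""))) {
    stop("all arguments passed through ... must be named")
  }

  # Alternative: place the elements in an environment and evaluate an
  # R expression there
  env <- list2env(dots, envir = new.env())
  assign("par", par, envir = env)
  eval(quote(sum((par - target)^2) * scale), envir = env)
}

objective(c(1, 2), target = c(0, 0), scale = 2)  # (1 + 4) * 2 = 10
```

Because optim() forwards its own ... to the objective, the same function can be used as optim(c(1, 2), objective, target = c(0, 0), scale = 2).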
[R] graphs for pretest data
Is there an easy way to make graphs for the following data? I have pretest and posttest scores for men and women. I would like to form a 'tilted segment' plot for the data. That is, make segments joining the scores, with different types of segments for men and women.

Example data:

menpre <- c(43,42,26,39,60,60,46)
menpost <- c(40,41,36,42,54,58,43)
womenpre <- c(46,56,81,56,70,70)
womenpost <- c(44,52,81,59,69,68)

Thanks!

Juliet
Re: [R] lost in the SNOW at 4 AM; parallelization confusion...
Hi Eric --

Eric Rupley [EMAIL PROTECTED] writes:

> Apologies for what must be a very basic question, but I have not found any clear examples of how to design the following.
>
> I would like to run an iterative analysis over several processors. A toy example of the analysis is attached: a resampling function run 1k times, with two different sets of conditioning variables i,j on some data vec...
>
> What is the usual way to attack such a problem using snow? My understanding up to this point is that one should:
>
> (1) set the random seed to uncorrelate the processors' actions in select()
> (2) make a function myfunc(vec,i,j) which returns the item of interest
> (3) set up a wrapper which iterates through i,j, and makes the call to the cluster
> (4) call the cluster using clusterApply(cl,vec, myfunc)

I think you're on the right track. You say:

for (i in c(2,4)) { # a series of nested iterations...
  for (j in c(5:6)) {
    clusterApply(cl, vec, analysis.func, i, j)

The clusterApply says, for each element of vec, invoke analysis.func. vec is of length 1000, so you invoke analysis.func 1000 times, and with the outer loops you're calling analysis.func 2 * 2 * 1000 times. In your single processor code you have

for (i in c(2,4)) { # a series of nested iterations...
  for (j in c(5:6)) {
    res <- analysis.func(vec,i,j)

which invokes analysis.func 2 * 2 times. A strategy is to convert your 'for' loops into an appropriate *apply function, which I might do as (approximately)

its <- expand.grid(i=c(2, 4), j=c(5, 6))
mapply(analysis.func, its$i, its$j,
       MoreArgs=list(vec=vec))
[1] 120719.09 60403.20 144993.44 72468.66

(maybe you mean i=2:4, j=5:6 ?) and then to use the appropriate cluster* function, e.g.,

clusterMap(cl, analysis.func, its$i, its$j,
           MoreArgs=list(vec=vec))

Maybe it is now early enough (though not too early?) for that drink?

Martin

> I must be terribly confused based on the results attached below... any advice will be appreciated...
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M2 B169
Phone: (206) 667-2793
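The strategy in this reply can be sketched end to end with the parallel package that ships with current R. Note the substitution: parallel stands in for snow here, and its makeCluster(), clusterSetRNGStream() and clusterMap() play the roles of the snow calls discussed in the thread. The replicate count n.rep and the number of workers are assumptions chosen to keep the example quick.

```r
library(parallel)

# Toy resampling function from the thread, with fewer replicates (n.rep is
# an added, hypothetical argument)
analysis.func <- function(vec, i, j, n.rep = 200) {
  b <- replicate(n.rep, mean(sample(vec, length(vec), replace = TRUE)))
  (sum(b) * j) / i
}

vec <- runif(1000, 1, 100)
its <- expand.grid(i = c(2, 4), j = c(5, 6))  # one row per (i, j) task

cl <- makeCluster(2)                  # two local workers; adjust to your machine
clusterSetRNGStream(cl, iseed = 42)   # independent random streams per worker

# One call per (i, j) pair: 4 tasks in total, not 1000 per pair
res <- clusterMap(cl, analysis.func, i = its$i, j = its$j,
                  MoreArgs = list(vec = vec))
stopCluster(cl)

its$result <- unlist(res)
its
```

With only four short tasks the communication overhead may still outweigh the gain; the point of the sketch is the task structure, in which each worker receives a whole (i, j) job rather than a single element of vec.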
Re: [R] Survey Design / Rake questions
On Fri, 22 Aug 2008, Farley, Robert wrote: I *think* I'm making progress, but I'm still failing at the same step. My rake call fails with: Error in postStratify.survey.design(design, strata[[i]], population.margins[[i]], : Stratifying variables don't match To my naïve eyes, it seems that my factors are in the wrong order. If so, how do I assert an ordering in my survey dataframe, or copy an image from the survey dataframe to my marginals dataframes? I'd prefer to pull the original marginals dataframe(s) from the survey dataframe so that I can automate that in production. It looks like a problem with the NumStn factor. One copy has been converted to character and then factor, giving levels in alphabetical order; the other copy has been converted directly to factor, giving levels in numerical order. If you use as.factor(1:12) rather than as.character(1:12) it should work. -thomas If that's not my problem, where might I look for enlightenment? Neither ?why nor ?whatamimissing return citations. 
:-) ** How I fail Now ** SurveyData - read.spss(C:/Data/R/orange_delivery.sav, use.value.labels=TRUE, max.value.labels=Inf, to.data.frame=TRUE) #=== temp - sub(' +$', '', SurveyData$direction_) SurveyData$direction_ - temp #=== SurveyData$NumStn=abs(as.numeric(SurveyData$lineon)-as.numeric(SurveyData$lineoff)) mean(SurveyData$NumStn) [1] 6.785276 ### Kludge SurveyData$NumStn - pmax(1,SurveyData$NumStn) mean(SurveyData$NumStn) [1] 6.789877 SurveyData$NumStn - as.factor(SurveyData$NumStn) ### EBSurvey - subset(SurveyData, direction_ == EASTBOUND ) XTTable - xtabs(~direction_ , EBSurvey) XTTable direction_ EASTBOUND 345 WBSurvey - subset(SurveyData, direction_ == WESTBOUND ) XTTable - xtabs(~direction_ , WBSurvey) XTTable direction_ WESTBOUND 307 # EBDesign - svydesign(id=~sampn, weights=~expwgt, data=EBSurvey) # svytable(~lineon+lineoff, EBDesign) StnName - c( Warner Center, De Soto, Pierce College, Tampa, Reseda, Balboa, Woodley, Sepulveda, Van Nuys, Woodman, Valley College, Laurel Canyon, North Hollywood) EBOnNewTots - c(1000, 600, 1200, 500, 1000, 500, 200, 250, 1000, 300, 100, 123.65,0 ) StnTraveld - c(as.character(1:12)) EBNumStn- c(673.65, 800, 1000, 1000, 800, 700, 600, 500, 400, 200, 50, 50 ) ByEBOn - data.frame(StnName, Freq=EBOnNewTots) ByEBNum - data.frame(StnTraveld, Freq=EBNumStn) RakedEBSurvey - rake(EBDesign, list(~lineon, ~NumStn), list(ByEBOn, ByEBNum) ) Error in postStratify.survey.design(design, strata[[i]], population.margins[[i]], : Stratifying variables don't match str(EBSurvey$lineon) Factor w/ 13 levels Warner Center,..: 3 1 1 1 2 13 1 5 1 5 ... EBSurvey$lineon[1:5] [1] Pierce College Warner Center Warner Center Warner Center De Soto Levels: Warner Center De Soto Pierce College Tampa Reseda Balboa Woodley Sepulveda Van Nuys Woodman Valley College Laurel Canyon North Hollywood str(ByEBOn$StnName) Factor w/ 13 levels Balboa,De Soto,..: 11 2 5 8 6 1 12 7 10 13 ... 
ByEBOn$StnName[1:5]
[1] Warner Center  De Soto        Pierce College Tampa          Reseda
Levels: Balboa De Soto Laurel Canyon North Hollywood Pierce College Reseda Sepulveda Tampa Valley College Van Nuys Warner Center Woodley Woodman
str(EBSurvey$NumStn)
 Factor w/ 12 levels "1","2","3","4",..: 10 12 4 12 8 1 8 8 12 4 ...
EBSurvey$NumStn[1:5]
[1] 10 12 4  12 8
Levels: 1 2 3 4 5 6 7 8 9 10 11 12
str(ByEBNum$StnTraveld)
 Factor w/ 12 levels "1","10","11",..: 1 5 6 7 8 9 10 11 12 2 ...
ByEBNum$StnTraveld[1:5]
[1] 1 2 3 4 5
Levels: 1 10 11 12 2 3 4 5 6 7 8 9

** **

Robert Farley
Metro
www.Metro.net

-----Original Message-----
From: Thomas Lumley [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 21, 2008 13:55
To: Farley, Robert
Cc: r-help@r-project.org
Subject: Re: [R] Survey Design / Rake questions

On Tue, 19 Aug 2008, Farley, Robert wrote:

While I'm trying to catch up on the statistical basis of my task, could someone point me to how I should fix my R error?

The variables in the formula in rake() need to be the raw variables in the design object, not summary tables.

-thomas

Thanks

library(survey)
SurveyData <- read.spss("C:/Data/R/orange_delivery.sav", use.value.labels=TRUE, max.value.labels=Inf, to.data.frame=TRUE)
#===
temp <- sub(' +$', '',
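Thomas's diagnosis of the NumStn level order can be checked in isolation with base R. The following is only a minimal sketch: the object names `from_chr`, `from_int` and `relevelled` are made up for illustration and do not appear in the thread, and the last step (rebuilding a factor with explicit levels) is one possible remedy, not necessarily the one Robert ended up using.

```r
# Factor built from character strings: levels are sorted as text.
from_chr <- factor(as.character(1:12))
levels(from_chr)

# Factor built from the integers themselves: levels are in numeric order.
from_int <- factor(1:12)
levels(from_int)

# Same set of labels, but a different level order.
setequal(levels(from_chr), levels(from_int))
identical(levels(from_chr), levels(from_int))

# Assumed remedy: rebuild the character-based factor with explicit levels
# taken from the factor whose order should be matched.
relevelled <- factor(as.character(1:12), levels = levels(from_int))
identical(levels(relevelled), levels(from_int))
```

The first `levels()` call gives the same "1 10 11 12 2 ..." order seen in ByEBNum$StnTraveld above, while the second gives the "1 2 3 ... 12" order seen in EBSurvey$NumStn.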
Re: [R] graphs for pretest data
?plot
?lines

Something like this perhaps:

plot(menpre, type="l", col="red")
lines(menpost, col="blue")
lines(womenpre, col="green")
lines(womenpost, col="orange")

Also have a look at ?par for various options.

--- On Sat, 8/23/08, Juliet Hannah [EMAIL PROTECTED] wrote:

From: Juliet Hannah [EMAIL PROTECTED]
Subject: [R] graphs for pretest data
To: r-help@r-project.org
Received: Saturday, August 23, 2008, 12:04 PM

Is there an easy way to make graphs for the following data? I have pretest and posttest scores for men and women. I would like to form a 'tilted segment' plot for the data. That is, make segments joining the scores, with different types of segments for men and women.

Example data:

menpre <- c(43,42,26,39,60,60,46)
menpost <- c(40,41,36,42,54,58,43)
womenpre <- c(46,56,81,56,70,70)
womenpost <- c(44,52,81,59,69,68)

Thanks!

Juliet
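For the 'tilted segment' plot Juliet describes, a base-graphics sketch using segments() might look like the following. It assumes pretest is placed at x = 1 and posttest at x = 2; the helper name `yr`, the axis labels, the colours and the line types are illustrative choices, not anything specified in the thread.

```r
menpre    <- c(43, 42, 26, 39, 60, 60, 46)
menpost   <- c(40, 41, 36, 42, 54, 58, 43)
womenpre  <- c(46, 56, 81, 56, 70, 70)
womenpost <- c(44, 52, 81, 59, 69, 68)

# Shared y-axis range across all four vectors.
yr <- range(menpre, menpost, womenpre, womenpost)

# Empty frame: x = 1 is pretest, x = 2 is posttest.
plot(c(1, 2), yr, type = "n", xaxt = "n",
     xlab = "", ylab = "Score", xlim = c(0.8, 2.2))
axis(1, at = c(1, 2), labels = c("pre", "post"))

# One segment per subject; line type distinguishes men from women.
segments(1, menpre,   2, menpost,   lty = 1, col = "blue")
segments(1, womenpre, 2, womenpost, lty = 2, col = "red")
legend("topright", legend = c("men", "women"),
       lty = c(1, 2), col = c("blue", "red"))
```

Computing the range over all four vectors first keeps the women's highest score (81) inside the plotting region, which a plot scaled to menpre alone would clip.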
[R] Forthcoming R Conferences
Dear useRs and developeRs,

I hope all attending useR! in Dortmund last week had as good a time as I had and a safe trip home. This email is to announce our plans for forthcoming conferences.

In 2009 there will be a useR! in Rennes, France (July 8-10), directly followed by a DSC in Copenhagen, Denmark (July 13-14).

We would like to have a useR! 2010 in North America, and 2011 in Europe. Locations for 2010 and 2011 have not been fixed yet, but there are already some plans. Proposals to host a conference are of course more than welcome.

There have also been questions why 2009 is again in Europe. The main reasons were:

a) We had a very good offer from Rennes, and (at least almost one year ago, when planning started) none from outside Europe.

b) Having European useR!s in odd years will make it easier to set a date: in even years there are the biennial Compstat conferences, which together with all their satellite meetings block mid-August to the beginning of September. In odd years, on the other hand, there is no regular big conference on computational statistics in Europe.

So the basic plan is to be in Europe in odd years, and North America in even years from now on. Either continent could of course be replaced by one of the other 5 continents if we get a good offer. Antarctica may be hard to get to, though ;-)

On behalf of the R Foundation,
Fritz Leisch

--
Prof. Dr. Friedrich Leisch

Institut für Statistik              Tel: (+49 89) 2180 3165
Ludwig-Maximilians-Universität      Fax: (+49 89) 2180 5308
Ludwigstraße 33
D-80539 München                     http://www.statistik.lmu.de/~leisch

Journal Computational Statistics --- http://www.springer.com/180
Münchner R Kurse --- http://www.statistik.lmu.de/R

[EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-announce
Re: [R] graphs for pretest data
Dear Juliet,

Perhaps start here:

require(lattice)
mwpp <- data.frame(y = c(43,42,26,39,60,60,46,40,41,36,42,54, 58,43,46,56,81,56,70,70,44,52,81,59,69,68),
  sex = rep(c(rep('men', 14), rep('women', 12))),
  pp = c(rep(c('pre', 'post'), each = 7), rep(c('pre', 'post'), each = 6)),
  sub = c(1:7, 1:7, 8:13, 8:13))
xyplot(y ~ pp | sex, groups = sub, type = 'b', data = mwpp)

_
Professor Michael Kubovy
University of Virginia
Department of Psychology
USPS: P.O.Box 400400 Charlottesville, VA 22904-4400
Parcels: Room 102 Gilmer Hall, McCormick Road, Charlottesville, VA 22903
Office: B011 +1-434-982-4729
Lab: B019 +1-434-982-4751
Fax: +1-434-982-4766
WWW: http://www.people.virginia.edu/~mk9y/

On Aug 23, 2008, at 12:04 PM, Juliet Hannah wrote:

Is there an easy way to make graphs for the following data? I have pretest and posttest scores for men and women. I would like to form a 'tilted segment' plot for the data. That is, make segments joining the scores, with different types of segments for men and women.

Example data:

menpre <- c(43,42,26,39,60,60,46)
menpost <- c(40,41,36,42,54,58,43)
womenpre <- c(46,56,81,56,70,70)
womenpost <- c(44,52,81,59,69,68)

Thanks!

Juliet
Re: [R] graphs for pretest data
On Sat, Aug 23, 2008 at 1:10 PM, Michael Kubovy [EMAIL PROTECTED] wrote:

Dear Juliet,

Perhaps start here:

require(lattice)
mwpp <- data.frame(y = c(43,42,26,39,60,60,46,40,41,36,42,54, 58,43,46,56,81,56,70,70,44,52,81,59,69,68),
  sex = rep(c(rep('men', 14), rep('women', 12))),
  pp = c(rep(c('pre', 'post'), each = 7), rep(c('pre', 'post'), each = 6)),
  sub = c(1:7, 1:7, 8:13, 8:13))

Or in ggplot2:

library(ggplot2)
qplot(pp, y, data=mwpp, geom=c("point","line"), group = sub, colour=sex)
qplot(pp, y, data=mwpp, geom=c("point","line"), group = sub, facets = . ~ sex)

The key is to get your data into a data frame with variables that explicitly label the experimental units, as Michael did for you.

Hadley

--
http://had.co.nz/
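Hadley's point about labelling the experimental units can be sketched without typing the counts by hand. The version below rebuilds a data frame like Michael's `mwpp` directly from Juliet's four vectors; `n_men` and `n_women` are helper names introduced here for illustration. Turning `pp` into a factor with levels in the order pre, post is an added assumption (so that "pre" is drawn on the left), not part of the original code.

```r
menpre    <- c(43, 42, 26, 39, 60, 60, 46)
menpost   <- c(40, 41, 36, 42, 54, 58, 43)
womenpre  <- c(46, 56, 81, 56, 70, 70)
womenpost <- c(44, 52, 81, 59, 69, 68)

n_men   <- length(menpre)
n_women <- length(womenpre)

# One row per score; 'sub' identifies the subject, so each subject
# contributes exactly two rows (one pre, one post).
mwpp <- data.frame(
  y   = c(menpre, menpost, womenpre, womenpost),
  sex = rep(c("men", "women"), times = c(2 * n_men, 2 * n_women)),
  pp  = factor(c(rep(c("pre", "post"), each = n_men),
                 rep(c("pre", "post"), each = n_women)),
               levels = c("pre", "post")),
  sub = c(seq_len(n_men), seq_len(n_men),
          n_men + seq_len(n_women), n_men + seq_len(n_women))
)

# Quick checks on the layout before plotting.
str(mwpp)
table(mwpp$sex, mwpp$pp)
```

With the data in this shape, either the lattice or the ggplot2 call from the messages above can use `sub` for grouping and `sex` for conditioning or colour.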
[R] Tinn-R keyboard problem
The right-hand side of my keyboard (Enter, Shift, arrows, etc.) has stopped working, but only when I am using Tinn-R. It works perfectly fine with any other application. To check whether there was a problem with my keyboard, I connected an external keyboard, and the same keys did not work with that either. Is there anyone who has had the same problem before and knows the solution?

Thanks,
Sermin