Re: [R] change value in one cell
That's great! But I guess I should have provided a better example for my problem: I did not consider strings. In my case, I'm actually trying to add a value to a data frame that looks like this:

    Obs      X
      0     NA
      1 01 001
      2 01 002
      3 01 003

And I am trying to take that specific NA value and convert it to "00 000", so it looks like this:

    Obs      X
      0 00 000
      1 01 001
      2 01 002
      3 01 003

When I run the code you provided, I get this result:

    > x[1, "X"] <- "00 000"
    Warning message:
    In `[<-.factor`(`*tmp*`, iseq, value = c(NA, 3L, 4L, 5L, 6L, 7L, :
      invalid factor level, NAs generated

I am wondering what I am doing wrong and how to solve this problem. Thanks for all the help!

Carlos

On 6/10/2011 8:34 PM, jholtman [via R] wrote:
> you probably need to read the "Introduction to R" to understand indexing:
>
>     > x
>       Obs X Y Z
>     1   1 1 0 1
>     2   2 0 0 1
>     3   3 1 1 1
>     4   4 0 1 1
>     5   5 1 1 0
>     6   6 0 0 0
>     > x[4, "Y"] <- 0
>     > x
>       Obs X Y Z
>     1   1 1 0 1
>     2   2 0 0 1
>     3   3 1 1 1
>     4   4 0 0 1
>     5   5 1 1 0
>     6   6 0 0 0
>
> On Fri, Jun 10, 2011 at 5:42 PM, jour4life wrote:
>> Hello all,
>> I am wondering if there is a way to change the value of one cell in R. For instance, let's say I have a hypothetical data frame that looks like this:
>>
>>     Obs X Y Z
>>       1 1 0 1
>>       2 0 0 1
>>       3 1 1 1
>>       4 0 1 1
>>       5 1 1 0
>>       6 0 0 0
>>
>> I would like to change the value of the 4th observation in the Y column from 1 to 0. It should look like this:
>>
>>     Obs X Y Z
>>       1 1 0 1
>>       2 0 0 1
>>       3 1 1 1
>>       4 0 0 1
>>       5 1 1 0
>>       6 0 0 0
>>
>> Is it possible to change the value in one command?
>>
>> Thanks,
>> Carlos

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Re: [R] Memory(RAM) issues
Thanks Anupam for your inputs. I believe there are two ways to circumvent the issue: (1) making the code more efficient, or (2) increasing the memory in some way. The reasons why I did not paste the code are (1) it is long and uses quite a number of functions that I created, and (2) my intention is not to make the code more efficient, even if that's possible. Running into a memory problem with 2 GB of RAM is natural here, as my analysis entails 1500 simulations from huge multivariate distributions that change after every simulation, and tomorrow I may have to do a similar analysis with 10 million observations * 20 columns. In view of the above I shall be needing more memory some time later, and my IT friends are ready to support me on that (probably with a sandbox), but I am not sure whether I can install a server version of R capable of working with 8 GB or so of RAM. So it is more technical help that I need, and I have no clue about the plausibility of the solution mentioned (i.e. a server version of R that can use more memory).

Regards,
Abhisek

On Sat, Jun 11, 2011 at 10:10 AM, Anupam anupa...@gmail.com wrote:
> It will be helpful on this forum to use metric measures: 12 lakh is 1.2 million, thus your data is 1.2 million observations x 15 variables. I do not know the intricacies of R. You may have to wait for someone with that knowledge to respond. Including some relevant portions of error messages and code in your query can also help someone to respond to your message.
> Anupam.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Abhisek Saha
Sent: Saturday, June 11, 2011 6:25 AM
To: r-help@r-project.org
Subject: [R] Memory(RAM) issues

Dear All,
I have been working with R (desktop version) on Vista. I have the latest version, R 2.13.0.
I have been working with a few data-sets of size 12 lakh * 15, and my code is quite computing-intensive (applying an MCMC Gibbs sampler on a posterior of 144 variables), so it often runs into a memory issue such as "cannot allocate vector of size ..." (memory use shows to have reached something like 1.5 GB) or something to this effect. I have 2 GB of RAM. I checked with options like memory.size, and it says that a 64-bit R sitting on 64-bit Windows has a maximum memory capability of 8 TB. Now I don't have the background to understand the definitions of and differences between 32- and 64-bit machines and other technical requirements like servers, but it would be of great help if anyone could give me a feel for it. Could any of you tell me whether some server version of R would resolve my issue or not (I am not sure what kind of server my company would allow R to be installed on at this point, maybe a Linux one), and if so, could any of you guide me on how to go about installing R onto a server?

Thank you,
Abhisek
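[Editorially, the memory checks discussed above can be sketched as follows. This is not part of the original thread; memory.size/memory.limit are Windows-only, and the size figures assume the 1.2 million x 15 dataset mentioned above.]

```r
## Quick memory diagnostics in R on Windows
## (on other platforms, see ?"Memory-limits")
memory.size()    # MB currently in use by this R session
memory.limit()   # current cap in MB; a 32-bit build tops out near 4 GB

## Estimate what one copy of the data costs before simulating:
## 1.2 million rows x 15 numeric columns at 8 bytes each
x <- matrix(0, nrow = 1.2e6, ncol = 15)
print(object.size(x), units = "MB")   # roughly 137 MB per copy

## Free what you can between simulation steps
rm(x)
gc()
```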
Re: [R] change value in one cell
On 11/06/11 18:14, jour4life wrote:
> That's great! But I guess I should have provided a better example for my problem: I did not consider strings. [...] When I run the code you provided, I get this result:
>
>     > x[1, "X"] <- "00 000"
>     Warning message:
>     In `[<-.factor`(`*tmp*`, iseq, value = c(NA, 3L, 4L, 5L, 6L, 7L, :
>       invalid factor level, NAs generated
>
> I am wondering what I am doing wrong and how to solve this problem. Thanks for all the help!

The error is clear enough, isn't it? "00 000" is not one of the levels of the X column of your data frame; this column is a factor, probably because of the stringsAsFactors = TRUE default in options(). Since you appear to have been unaware of the factor nature of X, presumably you don't really want it to be a factor. If that is the case, execute

    x[, "X"] <- as.character(x[, "X"])

and then your reassignment of the [1, "X"] entry of x will work. If you do want X to be a factor you could:

(a) execute x[, "X"] <- factor(x[, "X"]) *after* doing the reassignment, or
(b) execute levels(x[, "X"]) <- c(levels(x[, "X"]), "00 000") *before* doing the reassignment (the new level must be appended, since assigning a levels vector renames the existing levels by position).

Learn more about how R works. In particular learn about factors; they are important and useful.

cheers,
Rolf Turner
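[A self-contained sketch of the problem and both fixes discussed above. The column and values follow the thread; stringsAsFactors is forced to TRUE here because its default varies across R versions.]

```r
## Reproduce the problem: X is stored as a factor, so assigning a
## string that is not an existing level produces NA plus a warning.
x <- data.frame(Obs = 0:3,
                X   = c(NA, "01 001", "01 002", "01 003"),
                stringsAsFactors = TRUE)

## This would warn "invalid factor level, NAs generated":
## x[1, "X"] <- "00 000"

## Fix 1: work with plain character data
x[, "X"] <- as.character(x[, "X"])
x[1, "X"] <- "00 000"

## Fix 2 (if X must stay a factor): append the new level first,
## then do the reassignment
x[, "X"] <- factor(x[, "X"])
levels(x[, "X"]) <- c(levels(x[, "X"]), "00 000")
x[1, "X"] <- "00 000"
```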
Re: [R] change value in one cell
Your dataframe has X as a factor; convert it to a character vector and try again:

    x$X <- as.character(x$X)

Sent from my iPad

On Jun 11, 2011, at 2:14, jour4life jour4l...@gmail.com wrote:
> That's great! But I guess I should have provided a better example for my problem: I did not consider strings. [remainder of quoted message elided; it appears in full earlier in this thread]
[R] rcspline.plot query
Dear all,
As I am new to the R community - although eager to advance - I would like to pose a question to the community. I have an SPSS file which I have imported into R (with the read.spss command), consisting of a scale (continuous) variable, adiponectin, and the corresponding categorical variable death (0=No, 1=Yes). In all there are 60 observations (among which there are 26 events). I am trying to do rcspline.plot(model="logisitc", nk=5, knots=NULL); however, the error message that I get is "fewer than 6 non missing observations". I have repeated the process with a similar dataset of 300 observations (instead of 60) and I still get the same message. Can anybody suggest a hint?

Kind regards,
Georgios Chalikias MD, PhD
Senior Lecturer in Cardiology
Democritus University of Thrace
Alexandroupolis, Greece
Re: [R] RES: Linear multivariate regression with Robust error
Date: Fri, 10 Jun 2011 16:50:24 -0300
From: filipe.bote...@vpar.com.br
To: frien...@yorku.ca; bkkoc...@gmail.com
CC: r-help@r-project.org
Subject: RES: [R] Linear multivariate regression with Robust error

Hi Barjesh,
I am not sure which data you are analyzing, but once I had a similar situation and it was a multicollinearity issue. I realized this after finding a huge correlation between the independent variables; I then dropped one of them and the signs of the slopes made sense. Beforehand, have a glance at your correlation matrix of independent variables to see the relationship between them.

I guess "look at the data" ( LATFD ) seems to be the recurring solution. Certainly with polynomial regression there's a tendency to think that the fitted parameter is a pure property of the system being examined. With something like a Taylor series, where you have a slope at a specific point, maybe you could think about it that way, but here the coefficient is just whatever optimizes your (arbitrary) error function for the data you have. A linear approximation to a non-line could be made at one point, and maybe that property should remain constant, but you have people claiming "past authors sampled some curve near x=a and got a different slope than my work, which largely sampled f(x) around x=b; what is wrong with my result?" It may be interesting to try to write a Taylor series around some point and see how those coefficients vary with the data set, for example (you still need an arbitrary way to estimate slopes, and simply differencing two points may be a bit noisy, but you could play with some wavelet families maybe).
If you try to describe a fruit as a linear combination of vegetables, you may get confusing but possibly useful results, even if they correspond not so much to properties of the fruits as to a specific need you have. For example, if you are compressing images of fruits and your decoder already has a dictionary of vegetables, it may make sense to do this. This is not much different from trying to compress non-vocal music with an ACELP codec that attempts to fit the sounds to models of the human vocal tract. Sometimes this may even be informative about how a given sound was produced, even if it sounds silly.

Not sure how helpful my advice will be, but it did the trick for me back then ;)
Cheers,
Filipe Botelho

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Michael Friendly
Sent: Friday, 10 June 2011 12:51
To: Barjesh Kochar
Cc: r-help@r-project.org
Subject: Re: [R] Linear multivariate regression with Robust error

On 6/10/2011 12:23 AM, Barjesh Kochar wrote:
> Dear all,
> I am doing linear regression with robust errors to learn the effect of one (x) variable on another (y). If I execute the command I find a positive trend. But if I check the effect of a number of x variables (x1, x2, x3) on the same (y) variable, then the positive effect shown by the x variable turns negative. So please help me in this situation.
> Barjesh Kochar
> Research scholar

You don't give any data or provide any code (as the posting guide requests), so I have to guess that you have just rediscovered Simpson's paradox -- that the coefficient of a variable in a marginal regression can have a sign opposite to that in a joint model with other predictors. I have no idea what you mean by 'robust error'. One remedy is an added-variable plot, which will show you the partial contributions of each predictor in the joint model, as well as whether there are any influential observations that are driving the estimated coefficients.
--
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249  Fax: 416 736-5814
4700 Keele Street    Web: http://www.datavis.ca
Toronto, ONT M3J 1P3 CANADA
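[The marginal-vs-joint sign flip and the added-variable plot mentioned above can be illustrated with a small simulated sketch. The data and coefficients here are invented for illustration; avPlots is from the car package, which the thread does not name explicitly.]

```r
## Simpson's paradox in regression: the marginal slope of x1 is
## positive, but its slope in the joint model is negative.
set.seed(1)
x1 <- rnorm(200)
x2 <- 2 * x1 + rnorm(200)   # x2 strongly correlated with x1
y  <- x2 - x1 + rnorm(200)  # true joint effect of x1 is -1

coef(lm(y ~ x1))       # marginal slope of x1: positive (carried by x2)
coef(lm(y ~ x1 + x2))  # joint slope of x1: negative

## Added-variable plots for the joint model, showing each predictor's
## partial contribution and any influential observations:
## library(car); avPlots(lm(y ~ x1 + x2))
```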
Re: [R] rcspline.plot query
The error message is literally true, as you did not specify any data to the function at all. Please read the documentation. Also note that rcspline.plot has largely been replaced by the R rms package, although it may still have value if there is only one predictor and that predictor is continuous. Also note the spelling of "logistic". You will need to load the rms package when using rcspline.plot anyway. When reading SPSS files you may find the Hmisc package's spss.get function useful. (You've already loaded Hmisc if you are using rcspline.plot.) Please follow the posting guide.

Frank

Georgios Chalikias wrote:
> Dear all,
> As I am new to the R community - although eager to advance - I would like to pose a question to the community. I have an SPSS file which I have imported into R (with the read.spss command), consisting of a scale (continuous) variable, adiponectin, and the corresponding categorical variable death (0=No, 1=Yes). In all there are 60 observations (among which there are 26 events). I am trying to do rcspline.plot(model="logisitc", nk=5, knots=NULL); however, the error message that I get is "fewer than 6 non missing observations". I have repeated the process with a similar dataset of 300 observations (instead of 60) and I still get the same message. Can anybody suggest a hint?
> Kind regards,
> Georgios Chalikias MD, PhD
> Senior Lecturer in Cardiology
> Democritus University of Thrace
> Alexandroupolis, Greece

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
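[Following the points above, a corrected call would look something like this sketch. The data frame d and the variable names adiponectin and death are assumptions based on the question, not tested against the poster's data.]

```r
library(Hmisc)

## d is assumed to be the data frame produced by read.spss/spss.get.
## x and y must be supplied explicitly, and model must be spelled
## "logistic" (the question's call had a typo and passed no data).
with(d, rcspline.plot(x = adiponectin, y = death,
                      model = "logistic", nk = 5))
```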
[R] Amazon AWS, RGenoud, Parallel Computing
Dear R group,

since I only have a moderately fast MacBook and I would like to get my results faster than within 72h per run ( :-((( ), I tried Amazon AWS, which offers pretty fast computers via remote access. I don't want to post any code because it's several hundred lines; I am not looking for the optimal answer, but maybe for some suggestions from those of you who have faced similar problems. I installed Revolution R on Windows on the Amazon instance. I usually go for the large one (8 cores, about 20 GHz in total, several GB of RAM).

I am running a financial analysis over several periods (months) in which various CVaR calculations are made (with the rGenoud package). The periods depend on each other, so parallelizing that does not work. I was quite surprised how well written all the libraries seem to be for R on Mac, since they seem to use my dual core on the MacBook for a large portion of the calculations (I guess the matrix multiplications and the like). I was a little astonished, though, that the performance increase on the Amazon instance (about 5 times faster than my MacBook) was very moderate, with only about a 30% decrease in calculation time. The CPUs were about 60% in use (obviously, the code was not written specifically for several cores).

(1) I did try to use multiple cores for the rGenoud package (snow package) as mentioned on the excellent website (http://sekhon.berkeley.edu/rgenoud/multiple_cpus.html) but found rather strange behaviour: the CPU use on the Amazon instances would decrease to about 25% with periodic peaks. At least the first instance/optimization run took significantly longer (several times longer) than without explicitly including multiple cores in the genoud function. The number of cores I used was usually smaller than the number of cores I had at my service (4 of 8). So it does not seem like I am able to improve my performance here, even though I find it somewhat strange...
(2) I tried to improve the performance by parallelizing the solution-quality functions (which are subject to minimization by rGenoud): one was basically a sorting algorithm (CVaR), the other just a matrix-multiplication sort of thing. Parallelizing either the composition of the solution function (which was the sum of the CVaR and the matrix multiplication) or the sort function itself (splitting up the dataset and later uniting subsets of the solution again) did not show any improvement: the performance was much worse - even though all 8 CPUs were 100% idle... I do think it has to do with all the data management between the instances...

I am a little puzzled now about what I could do... It seems like there are only very limited options for me to increase the performance. Does anybody have experience with parallel computations with rGenoud or with parallelized sorting algorithms? I think one major problem is that the sorting happens rather quickly (only a few hundred entries to sort) but needs to be done very frequently (population size 2000, 500 generations), so I guess the housekeeping of the parallel computation diminishes all the benefits. I tried snowfall (for #2) and the snow package (for #1). I also tried the foreach library - but could not get it working on Windows...

Suggestions with respect to the operating system, Amazon AWS, or rGenoud are highly appreciated. Thanks a lot!

Lui
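[For reference, the multi-core setup from the Sekhon page referenced above boils down to something like the following sketch. The objective function and cluster size are illustrative only, not the poster's code.]

```r
library(snow)      # cluster backend mentioned in the post
library(rgenoud)

## Toy objective: very cheap to evaluate -- exactly the case where
## per-generation communication overhead can outweigh the parallel gain.
fn <- function(p) sum((p - 1:4)^2)

cl <- makeSOCKcluster(4)   # 4 of the 8 cores, as in the post
out <- genoud(fn, nvars = 4, pop.size = 2000,
              max.generations = 500, cluster = cl)
stopCluster(cl)
```

With a function this cheap, the serial run is typically faster, which is consistent with the behaviour described in the post; the cluster only pays off when a single evaluation of fn is expensive.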
[R] To add cut-off points in surface response with lattice
Good morning gentlemen! I'm not a fan of lattice, due to the large number of procedures that have to be done to reach a simple goal, but I have to confess that in some cases its graphics are way better than base graphics. For some days I have been searching, without success, for how to add a cut-off point on a response-surface graph. It is interesting for the researcher looking at the graph to be able to spot the cut-off point. Here is minimal reproducible code.

    require(plotrix)
    jet.colors <- colorRampPalette(c("blue", "green"))
    x <- seq(-1.95, 1.95, length = 30)
    y <- seq(-1.95, 1.95, length = 35)
    da <- expand.grid(x = x, y = y)
    da$z <- with(da, x * y^2)

    require(lattice)
    panel.3d.contour <- function(x, y, z, rot.mat, distance,
                                 nlevels = 20, zlim.scaled, ...) {
        add.line <- trellis.par.get("add.line")
        panel.3dwire(x, y, z, rot.mat, distance,
                     zlim.scaled = zlim.scaled, ...)
        clines <- contourLines(x, y,
                               matrix(z, nrow = length(x), byrow = TRUE),
                               nlevels = nlevels)
        for (ll in clines) {
            m <- ltransform3dto3d(rbind(ll$x, ll$y, zlim.scaled[1]),
                                  rot.mat, distance)
            panel.lines(m[1, ], m[2, ], col = add.line$col,
                        lty = add.line$lty, lwd = add.line$lwd)
        }
    }

    wireframe(z ~ x + y, da, drape = TRUE, scales = list(arrows = FALSE),
              col.regions = jet.colors(100),
              panel.3d.wireframe = panel.3d.contour)

Thanks!
Re: [R] Memory(RAM) issues
Hello Abhisek,
maybe you want to try it on a bigger machine (I guess you are working at a university, so I hope you have access to one). In case getting computing time or the like is a major issue, you may want to try Amazon AWS: for a few rupees (about 50-100 per hour) you can rent pretty fast computers (20 GHz, 8 GB of RAM). You may want to try the Windows version (a little bit more expensive), which is easily accessible via remote desktop. Installing Revolution R (64-bit version, free for academic purposes) might give you a good start. Maybe it's not a viable option in the long term (pricing), but it might help you find out whether the problem can be solved on a bigger machine at all, and lets you just try it out without a bigger hassle...

Good luck!
Lui

On Sat, Jun 11, 2011 at 7:47 AM, Abhisek Saha tad...@gmail.com wrote:
> Thanks Anupam for your inputs. [remainder of quoted message elided; it appears in full in the "Memory(RAM) issues" thread above]
[R] GLS model - diagnostic plots
I am applying a GLS model and would like to look at diagnostic plots of influence. The function that works for linear models (plot(model)) does not seem to work the same way for GLS models. Is there a reason for this? Or is different code required? Sorry if it's a very basic question, but thanks for your help!

The University of Aberdeen is a charity registered in Scotland, No SC013683.
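[No answer to this question appears in the digest. Assuming the model was fitted with nlme's gls, the usual starting points look something like the sketch below, on a dataset shipped with nlme; note gls has no built-in lm-style influence measures such as Cook's distance, only residual diagnostics.]

```r
library(nlme)

## Fit a GLS model with an AR(1) within-group correlation structure
fit <- gls(follicles ~ sin(2 * pi * Time) + cos(2 * pi * Time),
           data = Ovary,
           correlation = corAR1(form = ~ 1 | Mare))

plot(fit)                      # standardized residuals vs fitted values
qqnorm(fit, abline = c(0, 1))  # normal QQ plot of residuals
plot(fit, resid(., type = "n") ~ fitted(.))  # normalized residuals
```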
[R] plot of tree
Friends,
I need a tool to visualize the results of hierarchical clustering. Specifically, I want to plot it as a radial plot, like a phylogenetic tree. In addition, I want to specify the color of each leaf node. I have searched all the phylogenetic-tree plotting routines here; none of them can show the leaf nodes and their colors. Since I will have a lot of leaf nodes, the colors need to be specified and given as a parameter, instead of manually. Can someone point me to some software packages?
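[No reply is included in this digest. One common approach, assuming the ape package (which the thread does not mention), converts the hclust result to a phylo object and passes per-leaf colors programmatically via tip.color:]

```r
library(ape)   # provides as.phylo() and fan/radial tree layouts

## Hierarchical clustering on a built-in dataset
hc  <- hclust(dist(USArrests), method = "average")
phy <- as.phylo(hc)

## One color per leaf, supplied as a vector rather than by hand
tipcols <- rainbow(length(phy$tip.label))
plot(phy, type = "fan", tip.color = tipcols, cex = 0.6)
```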
[R] Inspecting C code in an R package
Hello everyone,
Trying to comprehend the code of an R package, I ran into the problem that the interesting part of the function I'm inspecting is written in C and called via .C("somename", ...). Now I can't inspect the C code the function is calling, since I can't find it in the folder of the package. Does anyone know where the corresponding C code is stored, so I could inspect it and comprehend what is happening?
Thank you very much in advance!
Regards,
Roman
Re: [R] Inspecting C code in an R package
The easiest thing to do is download the source package from your local CRAN mirror. That will contain all the R and other source code. Sarah On Sat, Jun 11, 2011 at 10:50 AM, Layman123 romanhorn...@web.de wrote: [...] -- Sarah Goslee http://www.functionaldiversity.org
Re: [R] Inspecting C code in an R package
Thank you Sarah for your quick answer! I've just downloaded the source package, but now I don't know in which file the C code called by .C(somename, ) is stored; there is no file with the name somename. How could one figure that out?
Re: [R] Inspecting C code in an R package
Presumably in the source code of the package (something.tar.gz). Follow the posting guidelines if you want further assistance. --- Jeff Newmiller jdnew...@dcn.davis.ca.us --- Sent from my phone. Please excuse my brevity. Layman123 romanhorn...@web.de wrote: [...]
Re: [R] Amazon AWS, RGenoud, Parallel Computing
Date: Sat, 11 Jun 2011 13:03:10 +0200 From: lui.r.proj...@googlemail.com To: r-help@r-project.org Subject: [R] Amazon AWS, RGenoud, Parallel Computing Dear R group, [...] I am a little bit puzzled now about what I could do... It seems like there are only very limited options for me to increase the performance. Does anybody have experience with parallel computations with rGenoud or parallelized sorting algorithms? I think one major problem is that the sorting happens rather quickly (only a few hundred entries to sort) but needs to be done very frequently (population size 2000, 500 iterations), so I guess the housekeeping overhead of the parallel computation diminishes all the benefits. Is your sort part of the algorithm, or do you have to sort results after getting them back out of order from async processes? One of my favorite anecdotes is how I used a bash sort on a huge data file to make a program run faster (from an impractical zero percent CPU to very fast with full CPU usage, and you complain about exactly a lack of CPU saturation). I guess a couple of comments. First, if you have specialized apps you need optimized, you may want to write dedicated C++ code. However, this won't help if you don't find the bottleneck. Lack of CPU saturation could easily be due to waiting for things like disk I/O or VM swap. You really ought to find the bottleneck first; it could be anything (except the CPU, maybe, LOL). The sort that I used prevented VM thrashing with no change to the app code: the app got sorted data, and so VM paging became infrequent. If you can specify the problem precisely, you may be able to find a simple solution.
Re: [R] running R commands asynchronously
Good points. I was going to include something along the lines of "don't change any data in the console that is involved in the asynchronous process", but figured Richard was smart enough not to do something like that. But your points add to that and make it clear for others who read this and don't have the experience to know better. -Original Message- From: Prof Brian Ripley [mailto:rip...@stats.ox.ac.uk] Sent: Friday, June 10, 2011 11:36 PM To: Greg Snow Cc: Richard M. Heiberger; r-help Subject: Re: [R] running R commands asynchronously On Fri, 10 Jun 2011, Greg Snow wrote: Tk windows run asynchronously to the rest of R, so you could write a small function that lets you type the command into a Tk window and runs it while you continue to work in the main R; then the window could signal you when the function finishes. That is really not advised. R is not designed to run asynchronous R commands and is not protected against quite a lot of things which can happen if you try that (many low-level routines are not re-entrant, for example). Allowing callbacks to either run simple things (as happens from e.g. the GUI menus) whilst a task is running, or to do almost all the running (as happens with a menu system built in Tk), is fairly safe just because only one task is likely to run at a time. For over a decade Duncan TL has promised a facility to run tasks in parallel in R (as I recall, the estimate at DSC 2001 was 12 months). So the only safe way at present (and for the foreseeable future) is to run separate processes. Packages snow and multicore provide means to do that (and to wait for and collect results): in the case of multicore the parallel tasks work on a copy of the current session, so you come close to the appearance of running asynchronous tasks. (By choosing to use Windows you exclude yourself from multicore.) -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Richard M.
Heiberger Sent: Friday, June 10, 2011 3:30 PM To: r-help Subject: [R] running R commands asynchronously I am interested in running R commands asynchronously. My first choice is in the same R session that I am currently in. Here, the goal would be to run something like RunAsynchSameSession(myfunction(), "outputname.rda") Once RunAsynchSameSession had started myfunction(), RunAsynchSameSession would complete immediately. myfunction would keep going. It is OK if execution of the myfunction() command prevents new input to R until it has completed. The important feature is that RunAsynchSameSession must tell the program that called it that it was done. Second choice is to start an independent process, BATCH or something similar, and save the resulting data objects in an .rda file. RunAsynchBatch("myfile.r", "outputname.rda") RunAsynchBatch would start a batch process and complete immediately after starting the batch file. The batch file would run independently until it was completed. While I know how to do this, for example with system(wait=FALSE), I would appreciate seeing a worked collection of statements, including getting outputname.rda back to the original R session. I am working on Windows. Rich -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
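For Richard's "second choice", a minimal sketch along the lines he describes is below. This is not a tested implementation: the file names are hypothetical, it uses system(wait=FALSE) exactly as he mentions, and it assumes "R CMD BATCH" is on the PATH.

```r
## Hedged sketch of the "second choice": launch an independent batch R
## process and pull its saved objects back in later. File names are
## hypothetical placeholders.
RunAsynchBatch <- function(script, rda) {
  cmd <- sprintf('R CMD BATCH --vanilla "%s"', script)
  system(cmd, wait = FALSE)   # returns immediately; the batch job runs on
  invisible(rda)
}
## myfile.r would end with, e.g.:  save(result, file = "outputname.rda")
## RunAsynchBatch("myfile.r", "outputname.rda")
## ... later, once the batch job has finished writing the file:
## load("outputname.rda")     # brings its objects into this session
```

There is no completion signal here; the caller has to check for the .rda file (e.g. with file.exists()) before loading it.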
Re: [R] Inspecting C code in an R package
Thank you! Of course, I will read the posting guidelines. A subscriber helped me via e-mail, telling me to use the grep command, that is, to type: grep somename *.c. For Windows users it's: findstr somename *.c.
Re: [R] plot of tree
On Saturday, June 11, 2011 04:26:44 AM Shuiwang Ji wrote: [...] You might want to search here: http://addictedtor.free.fr/graphiques/
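For the radial layout with per-leaf colours, one possibility (a hedged sketch; it assumes the ape package is installed) is to convert the hclust result to a phylo object and use ape's "fan" layout, which takes a vector of tip colours as a parameter:

```r
## Sketch: radial ("fan") plot of a hierarchical clustering with
## per-leaf colours supplied programmatically as a vector.
library(ape)
hc <- hclust(dist(USArrests[1:20, ]))   # toy clustering of 20 items
tree <- as.phylo(hc)                    # convert hclust to a phylogenetic tree
leaf.cols <- rainbow(20)                # one colour per leaf, given as a vector
plot(tree, type = "fan", tip.color = leaf.cols)
```

Since tip.color is an ordinary vector, the colours can be computed from data (e.g. cluster membership) rather than set by hand.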
Re: [R] Inspecting C code in an R package
On Sat, Jun 11, 2011 at 5:31 PM, Layman123 romanhorn...@web.de wrote: Thank you! Of course, I will read the posting guidelines. A subscriber helped me via e-mail, telling me to use the grep command, that is, to type: grep somename *.c. For Windows users it's: findstr somename *.c. The problem here is that this will also find every place that function is called in the C files, and also any mention of something called somename2. Most decent programmers' editors will parse code files and build a little database of where all your functions are defined, so you can quickly jump to the definition of a function. In Emacs this is done with a thing called 'etags'. But this is now beyond the scope of R-help... Barry
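As a concrete illustration of Barry's point (all file and routine names below are made up), grep reports the definition but also every call site:

```shell
# Toy source tree with hypothetical names: one file defines the routine,
# another declares and calls it. grep matches all of these lines.
mkdir -p somepackage/src
printf 'void somename(double *x) { *x += 1.0; }\n' > somepackage/src/utils.c
printf 'extern void somename(double *);\nvoid driver(double *x) { somename(x); }\n' > somepackage/src/main.c
grep -rn "somename" somepackage/src
```

The definition in utils.c and both lines in main.c all match, which is why an etags-style index that records only definitions is more convenient for navigation.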
Re: [R] running R commands asynchronously
On 06/10/2011 02:29 PM, Richard M. Heiberger wrote: I am interested in running R commands asynchronously. My first choice is in the same R session that I am currently in. Here, the goal would be to run something like RunAsynchSameSession(myfunction(), "outputname.rda") Once RunAsynchSameSession had started myfunction(), RunAsynchSameSession would complete immediately. myfunction would keep going. It is OK if execution of the myfunction() command prevents new input to R until it has completed. The important feature is that RunAsynchSameSession must tell the program that called it that it was done. Somewhere in-between: starting a new R session but keeping the communication within the original. library(snow) cl <- makeSOCKcluster("localhost") sendCall(cl[[1]], function(n) { Sys.sleep(n); n }, list(n=5)) ## do other things here... recvResult(cl[[1]]) ## blocks until result available I had hoped that I could open a non-blocking connection on the master (blocking=FALSE in the next-to-last line of newSOCKnode) and then poll with isIncomplete(cl[[1]]$con), but I couldn't get this to work (the connection, or more likely my understanding, seemed to be blocking anyway). sendCall(cl[[1]], function(n) { Sys.sleep(n); n }, list(n=5)) while(isIncomplete(cl[[1]]$con)) cat("tick\n") ## not printed recvResult(cl[[1]]) This approach might also work with the multicore package. Martin Second choice is to start an independent process, BATCH or something similar, and save the resulting data objects in an .rda file. RunAsynchBatch("myfile.r", "outputname.rda") RunAsynchBatch would start a batch process and complete immediately after starting the batch file. The batch file would run independently until it was completed. While I know how to do this, for example with system(wait=FALSE), I would appreciate seeing a worked collection of statements, including getting outputname.rda back to the original R session. I am working on Windows.
Rich -- Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: M1-B861 Telephone: 206 667-2793
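An untested variant of Martin's polling idea: instead of isIncomplete(), base R's socketSelect() can wait on a socket connection with a timeout, which may make non-blocking polling of the worker's connection (cl[[1]]$con, as in Martin's message) possible. This is a sketch under that assumption, not verified against snow's internals:

```r
## Untested sketch: poll a snow worker without blocking, using base R's
## socketSelect() on the node's connection object.
library(snow)
cl <- makeSOCKcluster("localhost")
sendCall(cl[[1]], function(n) { Sys.sleep(n); n }, list(n = 2))
while (!socketSelect(list(cl[[1]]$con), timeout = 0.25))
  cat("tick\n")              # do other work here while waiting
recvResult(cl[[1]])          # data is ready, so this returns promptly
stopCluster(cl)
```

If snow's connection object does not cooperate with socketSelect(), the separate-process route (BATCH plus a saved .rda file) remains the fallback.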
[R] Is there an implementation of loess with more than 4 parametric predictors or a trick to similar effect?
I have revised my earlier question and would be grateful for any comments! -- Subject: Is there an implementation of loess with more than 3 parametric predictors or a trick to a similar effect? Dear R experts, I have a problem related to the question raised in this earlier post: https://stat.ethz.ch/pipermail/r-help/2007-January/124064.html My situation is different in that I have only 2 predictors (coordinates x,y) for local regression, but a number of global (parametric) offsets that I need to consider. Essentially, I have a spatial distortion overlaid over a number of measurements. These measurements can be grouped by the same underlying undistorted measurement value for each group. The groups are known, but the values are not. We need to estimate the spatial trend, which we then want to remove. In our application, the spatial trend is two-dimensional (x,y), and there are about 20 groups of about 50 measurements each in the most simple scenario. The measurements are randomly placed. Taking the first group as reference, there are thus 19 unknown offsets. The code below for toy data (spatial trend in one dimension x) works for two or three offset groups. Unfortunately, the loess call fails for four or more offset groups with the error message "Error in simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize, : only 1-4 predictors are allowed". I tried overriding the restriction and got "k>d2MAX in ehg136. Need to recompile with increased dimensions." How easy would that be to do? I cannot find a definition of d2MAX anywhere, and it seems this might be hardcoded; the error is apparently triggered by line #1359 in loessf.f: if(k .gt. 15) call ehg182(105) Alternatively, does anyone know of an implementation of local regression with global (parametric) offset groups that could be applied here? Or is there a better way of dealing with this? I tried lme with correlation structures, but that seems to be much, much slower.
Any comments would be greatly appreciated! Best regards, David Kreil.
###
# loess with parametric offsets - toy data demo
#
x <- seq(0, 9, .1); x.N <- length(x);
o <- c(0.4, -0.8, 1.2 #,-0.2  # works for three but not four
); # these are the (unknown) offsets
o.N <- length(o);
f <- sapply(seq(o.N), function(n) {
  ifelse(seq(x.N) <= n*x.N/(o.N+1) & seq(x.N) > (n-1)*x.N/(o.N+1), 1, 0);
});
f <- f[sample(NROW(f)), ];
y <- sin(x) + rnorm(length(x), 0, .1) + f %*% o;
s.fs <- sapply(seq(NCOL(f)), function(i) { paste('f', i, sep='') });
s <- paste(c('y~x', s.fs), collapse='+');
d <- data.frame(x, y, f);
names(d) <- c('x', 'y', s.fs);
l <- loess(formula(s), parametric=s.fs, drop.square=s.fs, normalize=FALSE,
           data=d, span=0.4);
yp <- predict(l, newdata=d);
plot(x, y, pch='+', ylim=c(-3, 3), col='red');  # input data
points(x, yp, pch='o', col='blue');             # fit of that
d0 <- d; d0$f1 <- d0$f2 <- d0$f3 <- 0;
yp0 <- predict(l, newdata=d0);
points(x, y - f %*% o);   # spatial distortion
lines(x, yp0, pch='+');   # estimate of that
op <- sapply(seq(NCOL(f)), function(i) { (yp - yp0)[!!f[,i]][1] });
cat("Demo offsets:", o, "\n");
cat("Estimated offsets:", format(op, digits=1), "\n");
Re: [R] Amazon AWS, RGenoud, Parallel Computing
Hello Mike, thank you very much for your response! To the best of my knowledge, the sort algorithm implemented in R is already backed by C++ code and not natively written in R. Writing the code in C++ is not really an option either (I think rGenoud is also written in C++). I am not sure whether there really is a bottleneck with respect to the computer: I/O is pretty low, plenty of RAM left, etc. It really seems to me as if parallelizing is not easily possible, or only at costs so high that the benefits are eaten up by all the coordination and handling needed... Did anybody use rGenoud in cluster mode and experience something similar? Are quicksort packages available that use multiple processors efficiently? (I didn't find any... :-( ) I am by no means an expert on parallel processing, but is it possible that the benefits from parallelizing a process greatly diminish if a large set of variables/functions needs to be made available and the actual function (in this case sorting a few hundred entries) is quite short, whereas the number of times the function is called is very high!? It was quite striking that the first run usually took several hours (instead of half an hour) and the subsequent runs were much, much faster. There is so much happening behind the scenes that it is a little hard for me to tell what might help, and what will not... Help appreciated :-) Thank you Lui On Sat, Jun 11, 2011 at 4:42 PM, Mike Marchywka marchy...@hotmail.com wrote: [...]
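The dispatch-overhead effect described above can be seen directly with a small timing experiment (a hedged sketch using snow; the timings are machine-dependent, so none are claimed here):

```r
## Sketch: per-call dispatch overhead dwarfs the cost of sorting a few
## hundred entries, which is why parallelizing this inner step rarely pays.
library(snow)
x <- runif(500)
system.time(for (i in 1:1000) sort(x))                   # serial baseline
cl <- makeSOCKcluster(rep("localhost", 2))
system.time(for (i in 1:1000) clusterCall(cl, sort, x))  # adds a network
                                                         # round trip per call
stopCluster(cl)
```

Each clusterCall() serializes x, ships it over a socket, and ships the result back, so the parallel loop is typically orders of magnitude slower for work this fine-grained.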
Re: [R] Help with Median test and Coxon-Mann-Whittney test
Hi Rob, This list is primarily intended for questions about how to do things in R, so you're more likely to get a helpful response elsewhere. You might want to try some place like the Cross Validated web site (http://stats.stackexchange.com/) for general statistics and data analysis questions. Before you do that, you might want to clarify your question. It sounds like you ran a couple of tests on a very small dataset, and they came up with different results. They're not even tests for the same thing, since the t-test is testing whether the means are different, and the Mann-Whitney is testing whether the medians are different. And in any case, it's hard to imagine coming up with any kind of useful inference on this tiny sample. When you post on Cross Validated (or another site), people are going to want to know the problem you are trying to solve, which isn't at all clear at this point. Best, Ethan On Fri, Jun 10, 2011 at 12:26 PM, Robert Johnson robjoh...@gmail.com wrote: Hi All, I have the following dataset (triplicate values of 5 independent measurements and duplicate values of a control):
      1      2      3      4      5      C
  181.8   58.2  288.9  273.2  290.9   53.9
  120.3  116.8  108.9  281.3  446.0   39.6
   86.1  148.5   52.9  126.0  150.3
My aim was to find out whether the mean values of samples 1-5 are significantly higher than the mean value for C (control). At first, I computed means and SDs and ran a one-tailed t-test. Although the SD error bars are large, two of the samples have mean values that are significantly higher than that of C. Second, I looked at medians and ran the Wilcoxon-Mann-Whitney test because of my worry about the size of my data and the high variation in the replicates. I was surprised, however, that the median and Wilcoxon-Mann-Whitney tests did not reveal statistically significant results. I will be happy if anyone could advise me on the best way to analyse this dataset.
Regards, Rob
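Ethan's sample-size point can be made concrete. Assuming the column grouping read from Rob's post (sample 1 as a triplicate, the control as a duplicate), with n = 3 versus n = 2 the smallest one-sided p-value a Wilcoxon-Mann-Whitney test can ever produce is 1/choose(5,2) = 0.1, so it can never reach the conventional 0.05 level:

```r
## Sketch: one-sided t-test and Wilcoxon test for sample 1 vs. control,
## using values from Rob's post (grouping assumed from his table).
s1  <- c(181.8, 120.3, 86.1)  # sample 1 (triplicate)
ctl <- c(53.9, 39.6)          # control (duplicate)
t.test(s1, ctl, alternative = "greater")
wilcox.test(s1, ctl, alternative = "greater")  # minimum possible p is 0.1
```

Even when every sample value exceeds every control value, the exact rank test returns p = 0.1, which is why the rank-based results looked non-significant.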
[R] Is there an implementation of loess with more than 3 parametric predictors or a trick to a similar effect? [re-posting as plain text to pass char-set filter]
Dear R experts, I have a problem related to the question raised in this earlier post: https://stat.ethz.ch/pipermail/r-help/2007-January/124064.html My situation is different in that I have only 2 predictors (coordinates x,y) for local regression, but a number of global (parametric) offsets that I need to consider. Essentially, I have a spatial distortion overlaid over a number of measurements. These measurements can be grouped by the same underlying undistorted measurement value for each group. The groups are known, but the values are not. We need to estimate the spatial trend, which we then want to remove. In our application, the spatial trend is two-dimensional (x,y), and there are about 20 groups of about 50 measurements each in the most simple scenario. The measurements are randomly placed. Taking the first group as reference, there are thus 19 unknown offsets. The code below for toy data (spatial trend in one dimension x) works for two or three offset groups. Unfortunately, the loess call fails for four or more offset groups with the error message "Error in simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize, : only 1-4 predictors are allowed". I tried overriding the restriction and got "k>d2MAX in ehg136. Need to recompile with increased dimensions." How easy would that be to do? I cannot find a definition of d2MAX anywhere, and it seems this might be hardcoded; the error is apparently triggered by line #1359 in loessf.f: if(k .gt. 15) call ehg182(105) Alternatively, does anyone know of an implementation of local regression with global (parametric) offset groups that could be applied here? Or is there a better way of dealing with this? I tried lme with correlation structures, but that seems to be much, much slower. Any comments would be greatly appreciated! Best regards, David Kreil.
###
# loess with parametric offsets - toy data demo
#
x <- seq(0, 9, .1); x.N <- length(x);
o <- c(0.4, -0.8, 1.2 #,-0.2  # works for three but not four
); # these are the (unknown) offsets
o.N <- length(o);
f <- sapply(seq(o.N), function(n) {
  ifelse(seq(x.N) <= n*x.N/(o.N+1) & seq(x.N) > (n-1)*x.N/(o.N+1), 1, 0);
});
f <- f[sample(NROW(f)), ];
y <- sin(x) + rnorm(length(x), 0, .1) + f %*% o;
s.fs <- sapply(seq(NCOL(f)), function(i) { paste('f', i, sep='') });
s <- paste(c('y~x', s.fs), collapse='+');
d <- data.frame(x, y, f);
names(d) <- c('x', 'y', s.fs);
l <- loess(formula(s), parametric=s.fs, drop.square=s.fs, normalize=FALSE,
           data=d, span=0.4);
yp <- predict(l, newdata=d);
plot(x, y, pch='+', ylim=c(-3, 3), col='red');  # input data
points(x, yp, pch='o', col='blue');             # fit of that
d0 <- d; d0$f1 <- d0$f2 <- d0$f3 <- 0;
yp0 <- predict(l, newdata=d0);
points(x, y - f %*% o);   # spatial distortion
lines(x, yp0, pch='+');   # estimate of that
op <- sapply(seq(NCOL(f)), function(i) { (yp - yp0)[!!f[,i]][1] });
cat("Demo offsets:", o, "\n");
cat("Estimated offsets:", format(op, digits=1), "\n");
Re: [R] Amazon AWS, RGenoud, Parallel Computing
Date: Sat, 11 Jun 2011 19:57:47 +0200 Subject: Re: [R] Amazon AWS, RGenoud, Parallel Computing From: lui.r.proj...@googlemail.com To: marchy...@hotmail.com CC: r-help@r-project.org Hello Mike, [[elided Hotmail spam]] To the best of my knowledge, the sort algorithm implemented in R is already backed by C++ code and not natively written in R. Writing the code in C++ is not really an option either (I think rGenoud is also written in C++). I am not sure whether there really is a bottleneck with respect to the computer: I/O is pretty low, plenty of RAM left, etc. It really seems to me as if parallelizing is not easily possible, or only at high costs, so that the benefits diminish through all the coordination and handling needed... Did anybody use rGenoud in cluster mode and experience something similar? Are quicksort packages available that use multiple processors efficiently? (I didn't find any... :-( ) I'm no expert, but these don't seem to be terribly subtle problems in most cases. Sure, if the task is not suited to parallelism and you force it to be parallel and it spends all its time syncing up, that can be a problem. Just making more tasks to fight over the bottleneck (memory, CPU, locks) can easily make things worse. I think I posted my link earlier on an IEEE blurb showing how easy it is for many cores to make things worse on non-contrived benchmarks. I am by no means an expert on parallel processing, but is it possible that benefits from parallelizing a process greatly diminish if a large set of variables/functions needs to be made available and the actual function (in this case sorting a few hundred entries) is quite short, whereas the number of times the function is called is very high!? It was quite striking that the first run usually took several hours (instead of half an hour) and the subsequent runs were much, much faster. There is so much happening behind the scenes that it is a little hard for me to tell what might help, and what will not... Help appreciated :-) Thank you Lui On Sat, Jun 11, 2011 at 4:42 PM, Mike Marchywka wrote: [...]
[R] Factor Analysis/Inputting Correlation Matrix
Can someone please direct me to how to run a factor analysis in R by first inputting a correlation matrix? Does the function factanal allow one to read a correlation matrix instead of data vectors? Thanks, Matt.
Re: [R] Factor Analysis/Inputting Correlation Matrix
Hi Matt, Did you try reading the documentation for factanal()? You can pull it up by typing: help(factanal) These give basically identical results using the raw data, the covariance matrix, and the correlation matrix:
factanal(x = mtcars, factors = 3)
factanal(factors = 3, covmat = cov(mtcars))
factanal(factors = 3, covmat = cor(mtcars))
all of which is clearly explained if you read the details for the argument covmat in the help. Cheers, Josh On Sat, Jun 11, 2011 at 3:06 PM, Matt Stati mattst...@yahoo.com wrote: [...] -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/
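One detail worth adding for Matt's case of typing a correlation matrix in directly (rather than computing it from data, as in Josh's examples): when covmat is a plain matrix, factanal cannot know the sample size, so passing n.obs enables the goodness-of-fit test.

```r
## Sketch: factor analysis from a correlation matrix supplied directly.
## cor(mtcars) stands in for a correlation matrix entered by hand; n.obs
## (the number of observations behind the matrix) enables the fit test.
R <- cor(mtcars)
factanal(factors = 3, covmat = R, n.obs = nrow(mtcars))
```

Without n.obs the loadings are still estimated, but the chi-square test of the number of factors is omitted.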
[R] How to compute the P-value for a mixture of chi^2 distributions
Hello, The test I am working on has an asymptotic 0.5*chi2(1) + 0.5*chi2(2) distribution, where the numbers in parentheses stand for the degrees of freedom. Is it possible to compute the cumulative distribution function of that mixture quickly in R? Thanks in advance. Tiago
Re: [R] How to compute the P-value for a mixture of chi^2 distributions
On Sun, Jun 12, 2011 at 12:44 PM, Tiago Pereira wrote:

> The test I am working on has an asymptotic 0.5*chi2(1) + 0.5*chi2(2) distribution [...] Is it possible to compute quickly in R the cumulative distribution of that distribution?

There appear to be pchibar() functions in both the ibdreg and ic.infer packages that should do what you want. Simulation is also fairly efficient.

-thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland
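For this particular two-component mixture, no special package is strictly needed: the CDF is just the weighted sum of the two chi-squared CDFs, and the p-value is the corresponding weighted sum of upper tails. A minimal sketch:

```r
# CDF of the mixture 0.5*chisq(1) + 0.5*chisq(2)
pmix <- function(q) 0.5 * pchisq(q, df = 1) + 0.5 * pchisq(q, df = 2)

# Upper-tail p-value for an observed statistic q
pvalue_mix <- function(q) {
  0.5 * pchisq(q, df = 1, lower.tail = FALSE) +
  0.5 * pchisq(q, df = 2, lower.tail = FALSE)
}
```

For an observed statistic, say q = 4.3, the p-value is then simply pvalue_mix(4.3).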
[R] smoothScatter function (color density question) and adding a legend
Dear R experts,

I am resending my questions below one more time in case someone out there could help but missed my email. I don't think my questions are too hard. I am most concerned about the transformation function. See below. Thanks, Clayton

Hello, I have a few questions regarding the smoothScatter function. I have a scatter plot with more than 500,000 data points for two samples, so I want to display the density in colors to convince people that my good correlation coefficient is not due to an influential-point effect, and I also want to make my scatter plot look pretty. Anyway, I have been able to make the plot, but I have a couple of questions about it. I also want to make a legend for it, which I don't know how to do. I have only been playing around with R for a few weeks now, off and on, so I am definitely a beginner.

1. I have 10 colors for my plot: white representing zero density and dark red representing the maximum density (I presume). According to the R documentation, the transformation argument is the function mapping the density scale to the color scale. Note that in my R code below, transformation = function(x) x^0.14. I was wondering how exactly, mathematically, this function relates density to color. I believe colorRampPalette ramps colors between 0 and 1, but I am not sure whether x represents the color or the density. Since I have 10 colors, I suppose the colorRampPalette would assign values of 0, 0.11, 0.22, 0.33, 0.44, 0.55, 0.66, 0.77, 0.88, and 1 for white to dark red, but I am not sure. Does anyone know how this works? I am sure it's not too complicated.

2. In a related issue, I would also like to make a legend for this plot; then I would be able to see the transformation function's effect on the color-density relationship. Could someone help me make a legend for my smoothScatter plot?
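Regarding question 1: smoothScatter estimates a 2-D kernel density, applies transformation() to the density values, rescales the result to [0, 1], and uses those values to pick colors from the ramp. Here is a hedged sketch of that mapping (a simplification of what the function does internally, with made-up densities for illustration):

```r
# Illustrate how transformed densities are mapped onto a 10-color ramp.
pal <- colorRampPalette(c("white", "darkred"))(10)

dens <- c(0, 1, 10, 100, 1000)   # hypothetical bin densities
z <- dens^0.14                   # the transformation argument
z <- z / max(z)                  # rescale to [0, 1]
idx <- pmax(1L, ceiling(z * 10)) # index into the 10 colors
cols <- pal[idx]
```

So x in transformation is the density, not the color. The power x^0.14 compresses large densities, keeping low-density regions visible instead of letting a few dense bins swallow the whole color range.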
I would like to place it immediately to the right of my plot as a vertical bar, matching the vertical length of the plot, as is often the convention. I really like the smoothScatter function. It is easy to use, and I believe it's relatively new. Thanks in advance. -Clayton

clayton <- c("white", "lightblue", "blue", "darkgreen", "green",
             "yellow", "orange", "darkorange", "red", "darkred")
x <- read.table("c:/users/ccolling/log_u1m1.txt", sep = "\t")
smoothScatter(x$V8 ~ x$V7, nbin = 1000,
              colramp = colorRampPalette(clayton),
              nrpoints = Inf, pch = ".", cex = .7,
              transformation = function(x) x^.14,
              col = "black", main = "M1 vs. M3 (r = 0.92)",
              xlab = "Normalized M1", ylab = "Normalized M2",
              cex.axis = 1.75, cex.lab = 1.25, cex.main = 2)
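Regarding question 2: smoothScatter has no built-in key, but a vertical color bar can be drawn beside the plot with layout() and image(). A minimal sketch with simulated data (Clayton's input file is specific to his machine); the bar shows the transformed-density scale, which is what the colors actually encode:

```r
set.seed(1)
pal <- colorRampPalette(c("white", "lightblue", "blue", "darkgreen", "green",
                          "yellow", "orange", "darkorange", "red", "darkred"))

pdf(NULL)  # null device so the sketch runs non-interactively; drop this line normally
layout(matrix(1:2, ncol = 2), widths = c(5, 1))
smoothScatter(rnorm(1e4), rnorm(1e4), colramp = pal,
              transformation = function(x) x^0.14)

# Vertical color key, full plot height, on the right.
par(mar = c(5, 1, 4, 3))
z <- seq(0, 1, length.out = 100)
image(x = 1, y = z, z = matrix(z, nrow = 1), col = pal(100),
      axes = FALSE, xlab = "", ylab = "")
axis(4)  # labels run from low (white) to high (dark red) transformed density
dev.off()
```

Because the key is labeled on the transformed scale, raw density tick labels would require inverting the transformation (here, t^(1/0.14)).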
Re: [R] How to compute the P-value for a mixture of chi^2 distributions
thank you very much for your advice, Thomas! Extremely helpful! Cheers! Tiago
Re: [R] How to do Clustered Standard Errors for Regression in R?
Hi Bo, See the following blog post: http://thetarzan.wordpress.com/2011/06/11/clustered-standard-errors-in-r/ where I calculate clustered standard errors in R. The result is the same as the cluster command in Stata. I had to figure this out to do some homework assignments for my econometrics class. Best, Kevin
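For the archives, the usual cluster-robust ("sandwich") covariance for an lm fit can also be computed in a few lines of base R. This is a sketch, with the cluster variable chosen arbitrarily for illustration; the small-sample correction is the one commonly quoted for Stata's cluster option:

```r
# Cluster-robust covariance matrix for an lm fit.
cluster_vcov <- function(fit, cluster) {
  X <- model.matrix(fit)
  e <- residuals(fit)
  # Sum the score contributions X_i * e_i within each cluster.
  score_by_cluster <- rowsum(X * e, group = cluster)
  meat  <- crossprod(as.matrix(score_by_cluster))
  bread <- solve(crossprod(X))
  M <- length(unique(cluster)); N <- nrow(X); K <- ncol(X)
  dfc <- (M / (M - 1)) * ((N - 1) / (N - K))  # Stata-style finite-sample correction
  dfc * bread %*% meat %*% bread
}

fit <- lm(mpg ~ wt + hp, data = mtcars)
V   <- cluster_vcov(fit, cluster = mtcars$cyl)
se  <- sqrt(diag(V))  # clustered standard errors
```

The same structure (bread, clustered meat, correction) underlies the approach in the blog post above.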
[R] Score Test Function
Greetings R Community, I'm trying to learn logistic regression on my own and am using "An Introduction to Logistic Regression Analysis and Reporting" (Peng, C., Lee, K., & Ingersoll, G., 2002). This article uses a score test statistic as a measure of overall fit for a logistic regression model. The author calculates this statistic using SAS. I am looking for an R function that can compute this statistic and a p-value (please don't critique the statistic itself). As far as I can tell, glm.scoretest() from library(statmod) is the only function for this, but it does not give a p-value and appears not to correspond with the author's example. Some chat on the discussion board a while back indicated that this is a well-known test but difficult to calculate in R. I'm wondering if there is a way to do this fairly easily in R at this time. I am working with the data set from the study and am looking to get close to the 9.5177 score test statistic and .0086 p-value with 2 degrees of freedom. Below is the code for the data set and some descriptives, so you can see the data set is the same as in the study. I highlighted my attempt to get a score test statistic and am curious whether this is it (minus the p-value).
#BEGINNING OF CODE
id <- factor(1:189)
# The gender labels lost their quotes in the archive; splitting a single
# string keeps the data verbatim while making the code runnable.
gender <- factor(strsplit(gsub("[[:space:]]", "",
"Boy,Boy,Girl,Girl,Girl,Boy,Girl,Boy,Girl,Girl,Boy,Boy,Boy,Boy,Girl,Girl,Boy,Girl,Boy,Girl,Boy,Girl,Girl,
Boy,Boy,Girl,Girl,Girl,Boy,Boy,Boy,Girl,Boy,Girl,Boy,Girl,Girl,Girl,Girl,Girl,Boy,Girl,Boy,Girl,Girl,Girl,
Girl,Boy,Girl,Boy,Girl,Boy,Girl,Girl,Boy,Boy,Boy,Boy,Boy,Boy,Boy,Boy,Boy,Girl,Boy,Boy,Boy,Boy,Girl,Boy,
Girl,Boy,Boy,Boy,Girl,Boy,Girl,Girl,Boy,Girl,Girl,Girl,Boy,Boy,Boy,Boy,Boy,Girl,Girl,Girl,Girl,Boy,Girl,
Girl,Girl,Girl,Girl,Girl,Girl,Girl,Girl,Girl,Boy,Girl,Boy,Boy,Girl,Girl,Girl,Boy,Girl,Boy,Girl,Girl,Girl,Boy,
Girl,Boy,Girl,Boy,Girl,Boy,Girl,Girl,Girl,Girl,Girl,Girl,Girl,Girl,Boy,Girl,Boy,Boy,Boy,Boy,Boy,Boy,Boy,Girl,
Girl,Girl,Boy,Boy,Girl,Girl,Boy,Girl,Boy,Boy,Boy,Girl,Girl,Girl,Girl,Boy,Boy,Girl,Boy,Boy,Girl,Boy,Boy,Boy,
Boy,Girl,Boy,Boy,Girl,Girl,Boy,Boy,Boy,Boy,Boy,Girl,Girl,Girl,Girl,Boy,Boy,Boy,Girl,Boy,Girl,Boy,Boy,Boy,Girl"),
",")[[1]])
reading.score <- c(91.0,77.5,52.5,54.0,53.5,62.0,59.0,51.5,61.5,56.5,47.5,75.0,47.5,53.5,50.0,50.0,49.0,59.0,60.0,60.0,
60.5,50.0,101.0,60.0,60.0,83.5,61.0,75.0,84.0,56.5,56.5,45.0,60.5,77.5,62.5,70.0,69.0,62.0,107.5,54.5,92.5,94.5,65.0,
80.0,45.0,45.0,66.0,66.0,57.5,42.5,60.0,64.0,65.0,47.5,57.5,55.0,55.0,76.5,51.5,59.5,59.5,59.5,55.0,70.0,66.5,84.5,
57.5,125.0,70.5,79.0,56.0,75.0,57.5,56.0,67.5,114.5,70.0,67.0,60.5,95.0,65.5,85.0,55.0,63.5,61.5,60.0,52.5,65.0,87.5,
62.5,66.5,67.0,117.5,47.5,67.5,67.5,77.0,73.5,73.5,68.5,55.0,92.0,55.0,55.0,60.0,120.5,56.0,84.5,60.0,85.0,93.0,60.0,
65.0,58.5,85.0,67.0,67.5,65.0,60.0,47.5,79.0,80.0,57.5,64.5,65.0,60.0,85.0,60.0,58.0,61.5,60.0,65.0,93.5,52.5,42.5,
75.0,48.5,64.0,66.0,82.5,52.5,45.5,57.5,65.0,46.0,75.0,100.0,77.5,51.5,62.5,44.5,51.0,56.0,58.5,69.0,65.0,60.0,65.0,
65.0,40.0,55.0,52.5,54.5,74.0,55.0,60.5,50.0,48.0,51.0,55.0,93.5,61.0,52.5,57.5,60.0,71.0,65.0,60.0,55.0,60.0,77.0,
52.5,95.0,50.0,47.5,50.0,47.0,71.0,65.0)
reading.recommendation <- as.factor(c(rep("no", 130), rep("yes", 59)))
DF <- data.frame(id, gender, reading.score, reading.recommendation)
head(DF)
#=
# DESCRIPTIVES
#=
table(DF[2:4])  # BREAKDOWN OF SCORES BY GENDER AND REMEDIAL READING RECOMMENDATIONS
table(DF[c(2)])  # TABLE OF GENDER
print(table(DF[c(2)]) / sum(table(DF[c(4)])) * 100, digits = 4)  # PERCENT GENDER BREAKDOWN
table(DF[c(4)])  # TABLE OF RECOMMENDED FOR REMEDIAL READING
print(table(DF[c(4)]) / sum(table(DF[c(4)])) * 100, digits = 4)  # Probability of recommended
table(DF[c(2, 4)])  # TABLE OF GIRLS AND BOYS RECOMMENDED FOR REMEDIAL READING
print(prop.table(table(DF[c(2, 4)]), 1) * 100, digits = 4)  # Probability of recommended
#=
# ANALYSIS
#=
(mod1 <- glm(reading.recommendation ~ reading.score + gender, family = binomial, data = DF))
library(statmod)
with(DF, glm.scoretest(mod1, c(0, 2, 3), dispersion = NULL))^2
# If I move the decimal over 2 to the right I get close to the 9.5177 from the study
(with(DF, glm.scoretest(mod1, c(0, 2, 3), dispersion = NULL))^2) * 100  # is this it?
#END OF CODE

I am running R 2.13.0 on a Windows 7 machine.

Peng, C., Lee, K., & Ingersoll, G. (2002). An Introduction to Logistic Regression Analysis and Reporting. The Journal of Educational Research, 96(1), 3-14. doi:10.1080/00220670209598786
Re: [R] Score Test Function
The score test looks at the effect of adding extra columns to the model matrix. The function glm.scoretest takes the fitted model object as the first argument and the extra column, or columns, as the second argument. Your x2 argument has length only 3. Is this really what you want? I would have expected you need to specify a vector of length nrow(DF), [as in the help information for glm.scoretest itself].

Bill Venables.
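A later alternative worth recording in this thread: anova.glm gained a test = "Rao" option (in R 2.14.0, after this exchange), which computes Rao's score test directly. The overall score test the article reports is the comparison of the intercept-only model against the full model. A sketch, illustrated on the built-in mtcars data rather than the reading-recommendation data above:

```r
# Overall score (Rao) test for a logistic regression:
# intercept-only model vs. model with two predictors (2 df).
null_fit <- glm(am ~ 1,       family = binomial, data = mtcars)
full_fit <- glm(am ~ hp + wt, family = binomial, data = mtcars)

tab <- anova(null_fit, full_fit, test = "Rao")
score_stat <- tab$Rao[2]          # the score test statistic
df_test    <- tab$Df[2]           # degrees of freedom (here 2)
p_value    <- tab$`Pr(>Chi)`[2]   # its p-value
```

Applied to the study's DF with mod1 against an intercept-only fit, this is the construction that should reproduce a 2-df overall score statistic and p-value in one call.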