Re: [R] how to get inflection point in binomial glm
René,

Yes, to fit a re-parameterized logistic model I think you would have to code the whole enchilada yourself rather than rely on glm(). nls() is not suitable either: it does least-squares minimization, whereas here we want to minimize the negative binomial log-likelihood. I did that, and the re-parameterized logistic model is in a package I wrote for a colleague (the logistic fit in that package is fully functional and documented). My code only considers one continuous predictor, though. If you want, I can email you the package and you can figure out how to deal with the categorical predictor. One thing I believe at this point is that you would have to do the inference on the continuous predictor _conditional_ on certain level(s) of the categorical predictor.

Rubén

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of René Mayer
Sent: Thursday, 1 December 2011 20:34
To: David Winsemius
CC: r-help Help
Subject: Re: [R] how to get inflection point in binomial glm

Thanks David and Rubén!

@David: indeed 15 betas; I forgot the interaction terms, thanks for correcting!

@Rubén: would the re-parameterization be done within nls()? How would one do this in practice, including the factor predictor?

And yes, we can solve within each group for Y=0:

0 = b0 + b1*X      | -b0
-b0 = b1*X         | /b1
-b0/b1 = X

but I was hoping there might be a more general solution for the case of multiple logistic regression.

HTH
René

Quoting David Winsemius <dwinsem...@comcast.net>:

> On Dec 1, 2011, at 8:24 AM, René Mayer wrote:
>
>> Dear All,
>> I have a binomial response with one continuous predictor (d) and one factor (g) (8 levels, dummy-coded).
>>
>> glm(resp~d*g, data, family=binomial)
>>
>> Y=b0+b1*X1+b2*X2 ... b7*X7
>
> Dear Dr Mayer;
> I think it might be a bit more complex than that. I think you should get 15 betas rather than 8. Have you done it?
>
>> how can I get the inflection point per group, e.g., P(d)=.5
>
> Wouldn't that just be at d=1/beta in each group?
>
> (Thinking, perhaps naively, in the case of X=X1 that
>
> (Pr[y==1])/(1-Pr[y==1]) = 1 = exp( beta*d*(X==X1) )   # all other terms = 0
>
> And taking the log of both sides, and then use middle school math to solve.)
>
> Oh, wait. Muffed my first try on that for sure. Need to add back both the constant intercept and the baseline d coefficient for the non-b0 levels.
>
> (Pr[y==1])/(1-Pr[y==1]) = 1 = exp( beta_0 + beta_d_0*d + beta_n + beta_d_n*d*(X==Xn) )
>
> And just
>
> (Pr[y==1])/(1-Pr[y==1]) = 1 = exp( beta_0 + beta_d_0*d )   # for the reference level.
>
> This felt like an exam question in my categorical analysis course 25 years ago. (Might have gotten partial credit for my first stab, depending on how forgiving the TA was that night.)
>
>> I would be grateful for any help.
>> Thanks in advance,
>> René
>
> --
> David Winsemius, MD
> West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
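The algebra in this thread can be sketched in R. This is only an illustration, not code from any of the posters: the simulated data, the three-level factor and every object name are assumptions. Under R's default treatment contrasts, setting the linear predictor to zero within level n gives d50 = -(beta_0 + beta_n) / (beta_d_0 + beta_d_n), with both shifts equal to zero for the reference level.

```r
# Simulated data (assumption): continuous predictor d, 3-level factor g
set.seed(1)
n <- 3000
g <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
d <- runif(n, -4, 8)
true_int   <- c(a = -1, b = -2, c = -3)
true_slope <- c(a = 1,  b = 1,  c = 1.5)
eta  <- true_int[as.character(g)] + true_slope[as.character(g)] * d
resp <- rbinom(n, 1, plogis(eta))

fit <- glm(resp ~ d * g, family = binomial)
b   <- coef(fit)

# Per-level intercept and slope shifts; the reference level has shift 0
lev         <- levels(g)
int_shift   <- c(0, b[paste0("g", lev[-1])])
slope_shift <- c(0, b[paste0("d:g", lev[-1])])

# Value of d at which P(resp = 1) = 0.5 in each group
d50 <- setNames(-(b["(Intercept)"] + int_shift) / (b["d"] + slope_shift), lev)
print(d50)
```

The simulated groups have true midpoints 1, 2 and 2, so the estimates should land near those values. Standard errors for d50 are a separate matter (a delta-method or re-parameterized fit would be needed) and are not covered by this sketch.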
Re: [R] R2Cuba package, failed with message ‘Dimension out of range’
Hi Sachin,

This mail does not contain enough context for us to give you advice. Please read the posting guide, http://www.R-project.org/posting-guide.html, and provide commented, minimal, self-contained, reproducible code.

regards,
Paul

On 12/02/2011 05:09 AM, Sachinthaka Abeywardana wrote:
> Hi All,
> I get the message "failed with message 'Dimension out of range'" when using cuhre in package R2Cuba. Does anyone know what this means? Or would I need to email the package author? The funny thing is that it does give a result, and comparing it to adaptIntegrate in package cubature, the two numbers are very close.
> Thanks,
> Sachin

--
Paul Hiemstra, Ph.D.
Global Climate Division
Royal Netherlands Meteorological Institute (KNMI)
Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
P.O. Box 201 | 3730 AE | De Bilt
tel: +31 30 2206 494
http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
Re: [R] help in dbWriteTable
Hello,

you can fetch the column names of a table with dbListFields and then reorder or rename the columns of the data frame to match. If you want more specific help, please provide an example (RSQLite would be a good choice of database engine, since it makes the example easy for others to reproduce).

Best regards,
Andreas

arunkumar wrote:
> hi
> I need some help with dbWriteTable. I am not able to insert rows into the table if the column order in the database is not the same as in the data frame I am inserting. I also run into problems if the table was created externally and I then insert into it through dbWriteTable. Is there some way to specify the row names in dbWriteTable, or any other method that would solve my problem?
> --
> View this message in context: http://r.789695.n4.nabble.com/help-in-dbWriteTable-tp4145110p4145110.html
> Sent from the R help mailing list archive at Nabble.com.

--
Andreas Borg
Abteilung Medizinische Informatik
Universitätsmedizin der Johannes Gutenberg-Universität Mainz
Institut für Med. Biometrie, Epidemiologie und Informatik (IMBEI)
Obere Zahlbacher Straße 69, 55131 Mainz
Tel: +49 (0) 6131 17-5062
E-Mail: andreas.b...@uni-mainz.de
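A minimal sketch of the suggestion above, assuming the table already exists and the data frame has the same column names in a different order. The helper align_to_fields(), the table name "demo" and the sample data are made up for illustration; the database part only runs if the DBI and RSQLite packages happen to be installed.

```r
# Reorder a data frame so its columns follow a table's field order.
# `fields` stands for what DBI::dbListFields(con, "tablename") returns.
align_to_fields <- function(df, fields) {
  missing_cols <- setdiff(fields, names(df))
  if (length(missing_cols) > 0) {
    stop("data frame lacks column(s): ", paste(missing_cols, collapse = ", "))
  }
  df[, fields, drop = FALSE]
}

df <- data.frame(b = c("x", "y"), a = 1:2, stringsAsFactors = FALSE)
aligned <- align_to_fields(df, c("a", "b"))
print(aligned)

# Optional end-to-end run against an in-memory SQLite database
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  DBI::dbExecute(con, "CREATE TABLE demo (a INTEGER, b TEXT)")
  fields <- DBI::dbListFields(con, "demo")
  DBI::dbWriteTable(con, "demo", align_to_fields(df, fields),
                    append = TRUE, row.names = FALSE)
  print(DBI::dbReadTable(con, "demo"))
  DBI::dbDisconnect(con)
}
```

Stopping on a missing column, rather than silently dropping it, is a design choice of this sketch; renaming columns first would be the alternative Andreas mentions.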
[R] Problem subsetting: undefined columns
Dear R-users,

I am new to R, and I am struggling with the following problem.

I am repeating the following operations hundreds of times within a loop: I want to subset a data frame by columns. The columns I am interested in are named by the rows of another data frame that was built in parallel. The solution I have so far works well as long as the elements of the second data frame are all column names of the first data frame, but if an element of the second object is not a column name of the first one, it fails.

More concretely, I have the following data frames d and v:

mmdd <- c(19720601, 19720602, 19720605)
sret.10006 <- c(1,2,3)
sret.10014 <- c(5,9,7)
sret.10065 <- c(10,2,11)
d <- data.frame(mmdd=mmdd, sret.10006=sret.10006, sret.10014=sret.10014, sret.10065=sret.10065)
v <- data.frame(V1="sret.10006", V2="sret.10090")
v <- sapply(v, function(x) levels(x)[x])

I want to do the following subsetting:

sub <- subset(d, select=c(v))

and I get the following error message:

Error in `[.data.frame`(x, r, vars, drop = drop) : undefined columns selected

Any help would be very much appreciated,
Best,
Aurelien
Re: [R] What's the baseline model when using coxph with factor variables?
William and David,

thanks for your help. The contrasts option was indeed what I was looking for but didn't find.

andi

On 01.12.2011 20:56, David Winsemius wrote:
> On Dec 1, 2011, at 1:00 PM, William Dunlap wrote:
>
>> Terry will correct me if I'm wrong, but I don't think the answer to this question is specific to the coxph function.
>
> It depends on our interpretation of the questioner's intent. My answer was predicated on the assumption that the phrase "baseline model" meant "baseline survival function", ... S_0(t) in survival analysis notation.
>
>> For all the [well-written] formula-based modelling functions (essentially, those that call model.frame and model.matrix to interpret the formula) the option "contrasts" controls how factor variables are parameterized in the model matrix. contr.treatment makes the baseline the first factor level, contr.SAS makes the baseline the last, contr.sum makes the baseline the mean, etc. E.g.,
>>
>> df <- data.frame(time=sin(1:20)+2, cens=rep(c(0,0,1), len=20), var1=factor(rep(0:1, each=10)), var2=factor(rep(0:1, 10)))
>>
>> options(contrasts=c("contr.treatment", "contr.treatment"))
>> coxph(Surv(time, cens) ~ var1 + var2, data=df)
>> Call:
>> coxph(formula = Surv(time, cens) ~ var1 + var2, data = df)
>>
>>         coef exp(coef) se(coef)      z    p
>> var11 0.1640      1.18    0.822 0.1995 0.84
>> var21 0.0806      1.08    0.830 0.0971 0.92
>>
>> Likelihood ratio test=0.05 on 2 df, p=0.974  n= 20, number of events= 6
>>
>> options(contrasts=c("contr.SAS", "contr.SAS"))
>> coxph(Surv(time, cens) ~ var1 + var2, data=df)
>> Call:
>> coxph(formula = Surv(time, cens) ~ var1 + var2, data = df)
>>
>>          coef exp(coef) se(coef)       z    p
>> var10 -0.1640     0.849    0.822 -0.1995 0.84
>> var20 -0.0806     0.923    0.830 -0.0971 0.92
>>
>> Likelihood ratio test=0.05 on 2 df, p=0.974  n= 20, number of events= 6
>>
>> options(contrasts=c("contr.sum", "contr.sum"))
>> coxph(Surv(time, cens) ~ var1 + var2, data=df)
>> Call:
>> coxph(formula = Surv(time, cens) ~ var1 + var2, data = df)
>>
>>          coef exp(coef) se(coef)       z    p
>> var11 -0.0820     0.921    0.411 -0.1995 0.84
>> var21 -0.0403     0.960    0.415 -0.0971 0.92
>>
>> Likelihood ratio test=0.05 on 2 df, p=0.974  n= 20, number of events= 6
>>
>> (lm() has a contrasts argument that can override getOption("contrasts") and set different contrasts for each variable, but coxph() does not have that argument.)
>>
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
>>
>> -----Original Message-----
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David Winsemius
>> Sent: Thursday, December 01, 2011 9:36 AM
>> To: a.schlic...@nki.nl
>> Cc: r-help@r-project.org
>> Subject: Re: [R] What's the baseline model when using coxph with factor variables?
>>
>> On Dec 1, 2011, at 12:00 PM, Andreas Schlicker wrote:
>>
>>> Hi all,
>>> I'm trying to fit a Cox regression model with two factor variables but have some problems with the interpretation of the results. Considering the following model, where var1 and var2 can assume value 0 and 1:
>>>
>>> coxph(Surv(time, cens) ~ factor(var1) * factor(var2), data=temp)
>>>
>>> What is the baseline model? Is that considering the whole population or the case when both var1 and var2 = 0?
>>
>> This has been discussed several times in the past on rhelp. My suggestion would be to search your favorite rhelp archive using "baseline hazard Therneau", since Terry Therneau is the author of survival. (The answer is closer to the first than to the second.)
>>
>>> Kind regards,
>>> andi
>>
>> David Winsemius, MD
>> West Hartford, CT
>
> David Winsemius, MD
> West Hartford, CT
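To see what the contrasts option does to the coding of a factor, independently of coxph(), one can look at the model matrix directly. The following is a small base-R sketch; the toy data frame is an assumption, and "contr.poly" is simply R's default for ordered factors, which plays no role here.

```r
# Toy two-level factor (assumption), levels "0" and "1"
df <- data.frame(var1 = factor(rep(0:1, each = 4)))

old <- options(contrasts = c("contr.treatment", "contr.poly"))
mm_trt <- model.matrix(~ var1, df)   # baseline = first level

options(contrasts = c("contr.SAS", "contr.poly"))
mm_sas <- model.matrix(~ var1, df)   # baseline = last level

options(contrasts = c("contr.sum", "contr.poly"))
mm_sum <- model.matrix(~ var1, df)   # coded +1 / -1 around the mean

options(old)                         # restore the previous setting

print(colnames(mm_trt))
print(colnames(mm_sas))
print(unique(mm_sum))
```

The column names var11 and var10 are the same labels that appear in the coxph() output quoted above, which is why the sign of the coefficient flips between the treatment and SAS codings while z and p stay the same.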
Re: [R] Resampling with replacement on a binary (0, 1) dataset to get CIs
Thanks. Anyway, it is not homework and I was not told to do that. My question has not been answered yet, so I'll try to reformulate it: does it make (statistical) sense to resample with replacement in this situation to get an estimate of the CIs? If it does, how could I do it in R?

Some further details on my real case study: 10 independent samples from a population, taken in ten sessions. Each sample consists of a (somewhat variable) number of random individuals that are classified as 0 or 1 depending on one specific state (presence or absence of a disease). I can calculate, for each session, the percentage of diseased individuals, but I have nothing on the CIs. Any suggestions?

--
View this message in context: http://r.789695.n4.nabble.com/Resampling-with-replacement-on-a-binary-0-1-variable-to-get-CIs-tp4127990p4145733.html
Sent from the R help mailing list archive at Nabble.com.
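For one session, resampling with replacement could look like the sketch below. It is not an answer given in the thread, and whether a bootstrap is the right tool here is exactly the open statistical question; the simulated session (40 individuals, prevalence 0.3) and all names are assumptions. An exact binomial interval from base R is computed alongside for comparison.

```r
set.seed(42)
# Hypothetical session: 40 individuals, 1 = diseased, 0 = not diseased
session <- rbinom(40, size = 1, prob = 0.3)
p_hat   <- mean(session)

# Percentile bootstrap: resample individuals with replacement
boot_props <- replicate(2000, mean(sample(session, replace = TRUE)))
boot_ci    <- quantile(boot_props, c(0.025, 0.975))

# Exact (Clopper-Pearson) interval for comparison
exact_ci <- binom.test(sum(session), length(session))$conf.int

print(p_hat)
print(boot_ci)
print(exact_ci)
```

With ten sessions, the same steps would be repeated per session, for example by keeping the 0/1 vectors in a list and using lapply().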
[R] export array
What is the best way to export an array?

The array I am trying to export has 3 dimensions (long, lat, observations). How can I export each dimension independently, e.g. one csv file with only the long?
Re: [R] export array
What do you want to do with it after you export it? That will probably determine what the data format should look like. Why would you want each dimension separately? How would you correlate them later? Is it really 3 dimensions, or is your data just three columns where each row is long, lat and observation? A small subset of the data would be helpful. Are you going to read it back into R, or send it somewhere else? More information would be useful, because you can create almost any output format that you want.

Sent from my iPad

On Dec 2, 2011, at 4:27, Ana <rrast...@gmail.com> wrote:
> What is the best way to export 1 array?? the array i am trying to export has 3 dimensions (long,lat,observations) how can i export each dimension independently? e.g. one csv file with only the long
Re: [R] Problem subsetting: undefined columns
?try

If you know that you might have a problem with undefined columns, or whatever, then trap the error with 'try' so your program can recover. You could also validate the data that you are going to use before entering the loop. This is standard defensive programming: errors are always going to happen, so guard against them.

Sent from my iPad

On Dec 2, 2011, at 2:20, Aurélien PHILIPPOT <aurelien.philip...@gmail.com> wrote:
> The solution I have so far works well as long as the elements of the second data frame are included in the column names of the first data frame but if an element from the second object is not a column name of the first one, then it bugs.
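Both suggestions, validating the names first and trapping the error, can be sketched with the data from the original question. The vector `wanted` stands in for the names taken from the second data frame and is an assumption, as are the other new object names; tryCatch() is used here as a close relative of try().

```r
d <- data.frame(mmdd = c(19720601, 19720602, 19720605),
                sret.10006 = c(1, 2, 3),
                sret.10014 = c(5, 9, 7),
                sret.10065 = c(10, 2, 11))
wanted <- c("sret.10006", "sret.10090")   # the second name is not a column of d

# Option 1: validate first and keep only the names that exist
present <- intersect(wanted, names(d))
absent  <- setdiff(wanted, names(d))
sub_ok  <- d[, present, drop = FALSE]

# Option 2: trap the error so that a loop can carry on
sub_try <- tryCatch(d[, wanted, drop = FALSE],
                    error = function(e) NULL)

print(sub_ok)
print(absent)
print(is.null(sub_try))
```

Option 1 silently drops unknown names, so reporting `absent` (for example with warning()) is probably wise inside a long loop.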
Re: [R] Problem subsetting: undefined columns
On 12/02/2011 07:20 AM, Aurélien PHILIPPOT wrote:
> The solution I have so far works well as long as the elements of the second data frame are included in the column names of the first data frame but if an element from the second object is not a column name of the first one, then it bugs.

Hi Aurelien,

I would call this a feature, not a bug. I think R does what it should do: you request a non-existent column and it throws an error. What kind of behavior are you looking for instead of this error?

regards,
Paul

> I want to do the following subsetting:
>
> sub <- subset(d, select=c(v))
>
> and I get the following error message:
>
> Error in `[.data.frame`(x, r, vars, drop = drop) : undefined columns selected

--
Paul Hiemstra, Ph.D.
Global Climate Division
Royal Netherlands Meteorological Institute (KNMI)
Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
P.O. Box 201 | 3730 AE | De Bilt
tel: +31 30 2206 494
http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
Re: [R] export array
Hi!

I would just like to have a way to check whether my functions are working correctly, i.e. whether the subset I am extracting is right (both the coordinates and the data). The files are in netCDF format, and I import only a small geographic subset into R. Is there other software that would let me do this, just to check that my code is OK?

On Fri, Dec 2, 2011 at 11:00 AM, Jim Holtman <jholt...@gmail.com> wrote:
> What do you want to do with it after you export; that will probably define what the data format would look like. Why would you want each dimension separately? How would you correlate them later?
Re: [R] export array
Depends on how you want to 'check'. I usually use 'View' to see if the data looks OK. You could write some more code to check the 'reasonableness' of the data. It sounds like you have to learn some ways of 'debugging' your code. Checking your data depends on what the criteria are for determining correctness. I will also export to Excel to let other people see if it is reasonable, but again it depends on the problem you are trying to solve.

Sent from my iPad

On Dec 2, 2011, at 5:07, Ana <rrast...@gmail.com> wrote:
> Hi! I would just like to have a way to check if my functions are working ok. If the subset I am extracting is ok (both coordinates and dataset).
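As one possible reading of the export question, the sketch below writes the coordinate labels of each dimension to their own CSV file and also writes the whole array as one long table that can be opened in a spreadsheet for checking. The small array, its dimnames and the file names are invented for illustration; files go to a temporary directory.

```r
# Hypothetical 3-D array: 2 longitudes x 3 latitudes x 4 observations
arr <- array(seq_len(24), dim = c(2, 3, 4),
             dimnames = list(long = c("10.5", "11.0"),
                             lat  = c("45", "46", "47"),
                             obs  = paste0("t", 1:4)))

out_dir <- tempdir()

# One CSV per dimension, holding just that dimension's coordinate labels
for (nm in names(dimnames(arr))) {
  write.csv(data.frame(value = dimnames(arr)[[nm]]),
            file.path(out_dir, paste0(nm, ".csv")), row.names = FALSE)
}

# The whole array as a long table: one row per (long, lat, obs) cell
long_table <- as.data.frame(as.table(arr), responseName = "value")
write.csv(long_table, file.path(out_dir, "array_long.csv"), row.names = FALSE)

print(head(long_table))
```

The long table keeps coordinates and values on the same row, which addresses the concern about how separately exported dimensions would be correlated later.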
Re: [R] question about plsr() results
Vytautas Rakeviius <vytautas1...@yahoo.com> writes:

> But still I have question about results interpretation. In the end I want to construct prediction function in form: Y=a1x1+a2x2

The predict() function does the prediction for you. If you want to construct the prediction _equation_, you can extract the coefficients from the model with

coef(yourmodel, ncomp = thenumberofcomponents, intercept = TRUE)

See ?coef.mvr for details.

> Documentation do not describe this.

The pls package is designed to work as much as possible like the lm() function and its methods and helpers. So read any introduction to linear models in R, and you will come a long way. There is also a paper in JSS about the pls package: http://www.jstatsoft.org/v18/i02/

--
Cheers,
Bjørn-Helge Mevik
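To make the "prediction equation" concrete, the sketch below applies extracted coefficients by hand. The helper predict_by_hand() is made up for this illustration, and the check uses lm() on the built-in mtcars data because plsr() follows the same conventions. The PLSR part runs only if the pls package is installed and assumes its bundled yarn data set.

```r
# Apply a prediction equation by hand: b[1] is the intercept, b[-1] the slopes
predict_by_hand <- function(b, X) drop(cbind(1, X) %*% b)

# Base-R check with lm(), whose interface plsr() imitates
fit_lm  <- lm(mpg ~ wt + hp, data = mtcars)
by_hand <- predict_by_hand(coef(fit_lm), as.matrix(mtcars[, c("wt", "hp")]))
print(all.equal(unname(by_hand), unname(fitted(fit_lm))))

# The same idea for a PLSR fit, using the coef() call quoted above
if (requireNamespace("pls", quietly = TRUE)) {
  data(yarn, package = "pls")
  fit_pls <- pls::plsr(density ~ NIR, ncomp = 3, data = yarn)
  b <- drop(coef(fit_pls, ncomp = 3, intercept = TRUE))
  print(head(predict_by_hand(b, yarn$NIR)))
}
```

In everyday use predict() remains the simpler route; writing the equation out is mainly useful when the model has to be applied outside R.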
[R] find and replace string
Dear all,

I would like to search a string for the second occurrence of a symbol and replace the character that follows it. For example, my strings look like

sta_+1+0_field2ndtry_$01.cfg

I want to find the digit that comes after the second +, which in this case is zero, and then create the strings below in a loop:

sta_+1+0_field2ndtry_$01.cfg
sta_+1+1_field2ndtry_$01.cfg
sta_+1+2_field2ndtry_$01.cfg
sta_+1+3_field2ndtry_$01.cfg

and so on. I have already tried strsplit, but that makes things more complex.

Could you please help me with that?

B.R
Alex
Re: [R] find and replace string
try this:

x <- c('sta_+1+0_field2ndtry_$01.cfg'
     , 'sta_+1+0_field2ndtry_$01.cfg'
     , 'sta_+1-0_field2ndtry_$01.cfg'
     , 'sta_+1+0_field2ndtry_$01.cfg'
     )
# find matching fields (the third element has only one '+', so it is skipped)
values <- grep("[^+]*\\+[^+]*\\+0", x, value = TRUE)
# split into two pieces, marking the digit's position with '^'
splitValues <- sub("([^+]*\\+[^+]*\\+)0(.*)", "\\1^\\2", values)
for (i in splitValues){
    for (j in 0:3){
        print(sub("\\^", j, i))
    }
}
[1] "sta_+1+0_field2ndtry_$01.cfg"
[1] "sta_+1+1_field2ndtry_$01.cfg"
[1] "sta_+1+2_field2ndtry_$01.cfg"
[1] "sta_+1+3_field2ndtry_$01.cfg"
[1] "sta_+1+0_field2ndtry_$01.cfg"
[1] "sta_+1+1_field2ndtry_$01.cfg"
[1] "sta_+1+2_field2ndtry_$01.cfg"
[1] "sta_+1+3_field2ndtry_$01.cfg"
[1] "sta_+1+0_field2ndtry_$01.cfg"
[1] "sta_+1+1_field2ndtry_$01.cfg"
[1] "sta_+1+2_field2ndtry_$01.cfg"
[1] "sta_+1+3_field2ndtry_$01.cfg"

On Fri, Dec 2, 2011 at 6:30 AM, Alaios <ala...@yahoo.com> wrote:
> I would like to search in a string for the second occurrence of a symbol and replace the symbol after it

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
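A loop-free variant of the same idea, offered only as a sketch: capture everything up to and including the second "+" as a prefix, everything after the digit as a suffix, and paste the new digits in between. The object names are illustrative, and the pattern assumes a single digit follows the second "+".

```r
template <- "sta_+1+0_field2ndtry_$01.cfg"

# Group 1: text up to and including the second '+'; group 2: text after the digit
pattern <- "^([^+]*\\+[^+]*\\+)[0-9](.*)$"
prefix  <- sub(pattern, "\\1", template)
suffix  <- sub(pattern, "\\2", template)

new_names <- paste0(prefix, 0:3, suffix)
print(new_names)
```

Because paste0() is vectorized, replacing 0:3 with any other range produces the whole series in one call.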
[R] Intersection of 2 matrices
Hi all,

I have a matrix A of 67420 by 2 and another matrix B of 59199 by 2. I would like to find the number of rows of matrix B that also occur in matrix A (rows that are common to both matrices, with or without sorting).

I have tried the intersect and is.element functions in R, i.e. intersect(A,B) and is.element(A,B), but they only work for vectors and not for matrices.

Any suggestions please.

Oluwole
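One way to reuse the vector tools on matrix rows is to collapse each row into a single key string first. The sketch below uses two tiny made-up matrices and a hypothetical helper row_key(); pasting numbers relies on their character representation, so it suits integer-like or exactly stored values better than noisy floating-point data.

```r
A <- matrix(c(1, 2,
              3, 4,
              5, 6), ncol = 2, byrow = TRUE)
B <- matrix(c(3, 4,
              7, 8,
              1, 2,
              3, 4), ncol = 2, byrow = TRUE)

# Turn each row into one key so that %in% and intersect() apply
row_key <- function(m) paste(m[, 1], m[, 2], sep = "\r")

in_A              <- row_key(B) %in% row_key(A)
n_rows_of_B_in_A  <- sum(in_A)          # repeated rows of B are each counted
n_distinct_common <- length(intersect(row_key(A), row_key(B)))

print(n_rows_of_B_in_A)
print(n_distinct_common)
```

B[in_A, , drop = FALSE] then gives the matching rows themselves; with matrices of around 60000 rows this approach should still be quick, since %in% is hash-based.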
[R] Rmpi installation problems
Hi all,

I am trying to install the Rmpi package in R and, while the installation itself works, it breaks down when trying to load the library. I think it has something to do with shared vs static loading of helper libraries, or the order in which shared libraries are loaded, but I am not sure.

R version: 2.14.0
Linux version: Gentoo, i686-pc-linux-gnu (32-bit)
GCC version: 4.5.3 (Gentoo 4.5.3-r1 p1.0)
OpenMPI version: 1.5.4

Output from "R CMD INSTALL ." in the Rmpi source directory:

* installing to library ‘/home/jos/R/i686-pc-linux-gnu-library/2.14’
* installing *source* package ‘Rmpi’ ...
checking for gcc... i686-pc-linux-gnu-gcc -std=gnu99
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether i686-pc-linux-gnu-gcc -std=gnu99 accepts -g... yes
checking for i686-pc-linux-gnu-gcc -std=gnu99 option to accept ISO C89... none needed
I am here /usr and it is OpenMPI
Trying to find mpi.h ...
Found in /usr/include
Trying to find libmpi.so or libmpich.a ...
Found libmpi in /usr/lib
checking for openpty in -lutil... yes
checking for main in -lpthread... yes
configure: creating ./config.status
config.status: creating src/Makevars
** Creating default NAMESPACE file
** libs
make: Nothing to be done for `all'.
installing to /home/jos/R/i686-pc-linux-gnu-library/2.14/Rmpi/libs
** R
** demo
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices ...
** testing if installed package can be loaded
/usr/lib/R/bin/exec/R: symbol lookup error: /usr/lib/openmpi/mca_paffinity_hwloc.so: undefined symbol: mca_base_param_reg_int
ERROR: loading failed
* removing ‘/home/jos/R/i686-pc-linux-gnu-library/2.14/Rmpi’

Any help would be greatly appreciated!

Jos
[R] about quantreg() package loading
Hi all,

I have installed the quantreg package using the "Install package(s) from local zip files" option, and I got the following message:

> utils:::menuInstallLocal()
package ‘quantreg’ successfully unpacked and MD5 sums checked

Does that mean the quantreg package was installed on my machine? If so, why do I get the following error when loading quantreg with the library(quantreg) command?

Loading required package: SparseM
Error: package ‘SparseM’ could not be loaded
In addition: Warning message:
In library(pkg, character.only = TRUE, logical.return = TRUE, lib.loc = lib.loc) :
  there is no package called ‘SparseM’

What is the reason for this? Please reply as early as possible.

--
View this message in context: http://r.789695.n4.nabble.com/about-quantreg-package-loading-tp4146366p4146366.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] about quantreg() package loading
Hi all,

my OS is Windows 7 and my R version is 2.14. I used the quantreg zip file (binary version for Windows) to install.

--
View this message in context: http://r.789695.n4.nabble.com/about-quantreg-package-loading-tp4146366p4146390.html
Sent from the R help mailing list archive at Nabble.com.
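The error message itself says that quantreg needs SparseM and that SparseM is not installed; installing from a local zip file does not fetch dependencies. As a small sketch (the helper name missing_packages() is invented here), one can list which required packages are absent before calling library():

```r
# Return the packages from `pkgs` that are not found in any library on .libPaths()
missing_packages <- function(pkgs) {
  found <- nzchar(vapply(pkgs, function(p) system.file(package = p),
                         character(1)))
  pkgs[!found]
}

print(missing_packages(c("stats", "utils")))

# With a working CRAN connection, the missing ones could then be installed, e.g.:
# install.packages(missing_packages(c("SparseM", "quantreg")))
```

Installing from a CRAN mirror rather than from a local zip file usually pulls in the required dependencies automatically.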
Re: [R] find and replace string
If the length of the first part is constant (the sta_+1+ part), then you can use substr().

On 2 December 2011 13:30, Alaios <ala...@yahoo.com> wrote:
> I would like to search in a string for the second occurrence of a symbol and replace the symbol after it For example my strings look like sta_+1+0_field2ndtry_$01.cfg

--
Christiaan Pauw
Nova Institute
www.nova.org.za
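A short sketch of the substr() route, assuming (as the reply says) that the prefix "sta_+1+" always has the same length, so the digit sits at a fixed position; the variable names are illustrative.

```r
x   <- "sta_+1+0_field2ndtry_$01.cfg"
pos <- nchar("sta_+1+") + 1      # position of the digit to overwrite

variants <- vapply(0:3, function(j) {
  y <- x
  substr(y, pos, pos) <- as.character(j)   # replace exactly one character
  y
}, character(1))

print(variants)
```

Note that the replacement form of substr() overwrites characters in place, so it only fits single-digit replacements; for 10 and above, a paste-based approach would be needed.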
[R] Error message in Genetic Matching
Dear R Users, I am a novice learner of R. I am working with genetic matching, GenMatch(), but I am getting the following error message:

Error in GenMatch(Tr = Tr, X = X.binarynp, BalanceMatrix = BalanceMatrix.binarynp, : GenMatch(): input includes NAs

Could you please suggest how to correct the above problem? My GenMatch command is:

gen1 <- GenMatch(Tr = Tr, X = X.binarynp, BalanceMatrix = BalanceMatrix.binarynp, popsize = 1000)

Thanking you, Sincerely Yours, Shyam Kumar Basnet SLU, Uppsala Sweden
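The error message says the inputs contain missing values. A minimal sketch of how one might locate them, using made-up stand-ins for Tr and X.binarynp (the poster's real objects are not shown), might look like this. Whether dropping incomplete rows is appropriate for the analysis is a separate decision:

```r
# Hypothetical stand-ins for the poster's objects (the real data are not shown).
Tr <- c(1, 0, 1, 0)
X.binarynp <- cbind(age = c(30, NA, 45, 52), educ = c(12, 16, 10, NA))

# Count missing values per column to see where the NAs come from.
na_per_column <- colSums(is.na(X.binarynp))
print(na_per_column)

# Keep only rows that are complete in both the treatment vector and the covariates.
keep <- complete.cases(Tr, X.binarynp)
Tr_clean <- Tr[keep]
X_clean <- X.binarynp[keep, , drop = FALSE]
```

The same keep index would have to be applied to the balance matrix so that all inputs stay aligned.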
Re: [R] find and replace string
You are too good :) Thanks a lot, have a nice weekend. B.R Alex

From: jim holtman jholt...@gmail.com Cc: R-help@r-project.org Sent: Friday, December 2, 2011 1:51 PM Subject: Re: [R] find and replace string

try this:

> x <- c('sta_+1+0_field2ndtry_$01.cfg'
+ , 'sta_+1+0_field2ndtry_$01.cfg'
+ , 'sta_+1-0_field2ndtry_$01.cfg'
+ , 'sta_+1+0_field2ndtry_$01.cfg'
+ )
> # find matching fields
> values <- grep("[^+]*\\+[^+]*\\+0", x, value = TRUE)
> # split into two pieces
> splitValues <- sub("([^+]*\\+[^+]*\\+)0(.*)", "\\1^\\2", values)
> for (i in splitValues){
+ for (j in 0:3){
+ print(sub("\\^", j, i))
+ }
+ }
[1] "sta_+1+0_field2ndtry_$01.cfg"
[1] "sta_+1+1_field2ndtry_$01.cfg"
[1] "sta_+1+2_field2ndtry_$01.cfg"
[1] "sta_+1+3_field2ndtry_$01.cfg"
[1] "sta_+1+0_field2ndtry_$01.cfg"
[1] "sta_+1+1_field2ndtry_$01.cfg"
[1] "sta_+1+2_field2ndtry_$01.cfg"
[1] "sta_+1+3_field2ndtry_$01.cfg"
[1] "sta_+1+0_field2ndtry_$01.cfg"
[1] "sta_+1+1_field2ndtry_$01.cfg"
[1] "sta_+1+2_field2ndtry_$01.cfg"
[1] "sta_+1+3_field2ndtry_$01.cfg"

Dear all, I would like to search in a string for the second occurrence of a symbol and replace the symbol after it. For example, my strings look like sta_+1+0_field2ndtry_$01.cfg. I want to find the digit that comes after the second +, which in this case is zero, and then in a loop create the strings below:

sta_+1+0_field2ndtry_$01.cfg
sta_+1+1_field2ndtry_$01.cfg
sta_+1+2_field2ndtry_$01.cfg
sta_+1+3_field2ndtry_$01.cfg

and so on. I have already tried strsplit, but this makes things more complex. Could you please help me with that? B.R Alex

-- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it.
Re: [R] about quantreg() package loading
It means you also need to install SparseM, on which quantreg depends. This can be done in exactly the same way, either by direct download using install.packages() or by a local install. Michael

On Dec 2, 2011, at 6:30 AM, narendarreddy kalam narendarcse...@gmail.com wrote: Hi all, my OS is Windows 7 and my R version is 2.14, and I used the quantreg zip file (binary version for Windows) to install.
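A small sketch of how one could check which of the two packages are still missing before installing. The install calls are left commented out because they need internet access and change the local library; the variable names are illustrative:

```r
# Packages needed to load quantreg in this thread: quantreg itself and SparseM.
needed <- c("quantreg", "SparseM")

# Which of them are not yet present in any library on this machine?
missing_pkgs <- setdiff(needed, rownames(installed.packages()))
print(missing_pkgs)

# With internet access, the missing ones could then be fetched from CRAN:
# install.packages(missing_pkgs)
# Or, to pull in the dependencies together with the package:
# install.packages("quantreg", dependencies = TRUE)
```

Installing from CRAN rather than from a local zip file has the advantage that dependencies such as SparseM are resolved automatically.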
Re: [R] Intersection of 2 matrices
On Dec 2, 2011, at 4:20 AM, oluwole oyebamiji wrote: Hi all, I have a matrix A of 67420 by 2 and another matrix B of 59199 by 2. I would like to find the number of rows of matrix B that also occur in matrix A (rows that are common to both matrices, with or without sorting). I have tried the intersect and is.element functions in R, i.e. intersect(A,B) and is.element(A,B), but they only work on vectors, not on the rows of a matrix.

Have you considered the 'duplicated' function?

-- David Winsemius, MD West Hartford, CT
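One way the duplicated() hint could be applied, as a sketch on two tiny made-up matrices rather than the poster's data: stack the unique rows of A on top of the unique rows of B, and any row in the B part that duplicated() flags must also occur in A.

```r
# Small made-up matrices standing in for the poster's A and B.
A <- matrix(c(1, 2,
              3, 4,
              5, 6,
              1, 2), ncol = 2, byrow = TRUE)
B <- matrix(c(3, 4,
              7, 8,
              1, 2), ncol = 2, byrow = TRUE)

# Remove repeats within each matrix first, so that a flagged row in the
# B part can only be a repeat of a row from A.
uA <- unique(A)
uB <- unique(B)

# duplicated() on a matrix works row-wise; drop the entries belonging to uA.
in_A <- duplicated(rbind(uA, uB))[-seq_len(nrow(uA))]

n_common <- sum(in_A)                  # number of distinct rows of B found in A
common_rows <- uB[in_A, , drop = FALSE]
print(n_common)
print(common_rows)
```

Calling unique() first is a design choice: without it, a row repeated inside B would be flagged even if it never appears in A.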
Re: [R] logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred
On 12/01/2011 08:00 PM, Ben quant wrote: The data I am using is the last file called l_yx.RData at this link (the second file contains the plots from earlier): http://scientia.crescat.net/static/ben/

The logistic regression model you are fitting assumes a linear relationship between x and the log odds of y; that does not seem to be the case for your data. To illustrate:

x <- l_yx[,"x"]
y <- l_yx[,"y"]
ind1 <- x <= .002
ind2 <- (x > .002 & x <= .0065)
ind3 <- (x > .0065 & x <= .13)
ind4 <- (x > .13)

summary(glm(y[ind1]~x[ind1],family=binomial))
...
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  -2.79174    0.02633 -106.03   <2e-16 ***
x[ind1]     354.98852   22.78190   15.58   <2e-16 ***

summary(glm(y[ind2]~x[ind2],family=binomial))
...
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  -2.15805    0.02966 -72.766   <2e-16 ***
x[ind2]     -59.92934    6.51650  -9.197   <2e-16 ***

summary(glm(y[ind3]~x[ind3],family=binomial))
...
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.367206   0.007781 -304.22   <2e-16 ***
x[ind3]     18.104314   0.346562   52.24   <2e-16 ***

summary(glm(y[ind4]~x[ind4],family=binomial))
...
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.31511    0.08549 -15.383   <2e-16 ***
x[ind4]      0.06261    0.08784   0.713    0.476

To summarize, the relationship between x and the log odds of y appears to vary dramatically in both magnitude and direction depending on which interval of x's range we're looking at. Trying to summarize this complicated pattern with a single line is what leads to the fitted probabilities near 0 and 1 that you are observing (note that only 0.1% of the data is in region 4 above, although region 4 accounts for 99.1% of the range of x).

-- Patrick Breheny Assistant Professor Department of Biostatistics Department of Statistics University of Kentucky
Re: [R] Resampling with replacement on a binary (0, 1) dataset to get Cis
On Dec 2, 2011, at 3:55 AM, lincoln wrote: Thanks. Anyway, it is not homework and I was not told to do that. My question has not been answered yet, I'll try to reformulate it: Does it make (statistical) sense to resample with replacement in this situation to get an estimate of the CIs? In case it does, how could I do it in R? Some further details on my real case study: 10 independent samples from a population in ten sessions. Each sample consists of a number (somehow variable) of random individuals that are classified as 0 or 1 depending on one specific state (presence or absence of a disease). I can calculate, for each session, the percentage of individuals diseased but I have nothing about the CIs, any suggestion?

I do not see much advantage to using resampling in this instance. The variance of a proportion is not theoretically complicated and you have introduced no further complicating factors that would call into question the validity of the estimates you would get from prop.test.

-- David Winsemius, MD West Hartford, CT
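A minimal sketch of the prop.test() route, assuming one simply has the number of diseased individuals and the sample size for each session. The counts below are invented for illustration; they are not the poster's data:

```r
# Made-up counts for three sessions: diseased individuals and individuals sampled.
diseased <- c(12, 8, 15)
sampled  <- c(40, 35, 50)

# One confidence interval per session (95% by default in prop.test).
ci <- t(mapply(function(x, n) prop.test(x, n)$conf.int, diseased, sampled))
colnames(ci) <- c("lower", "upper")

result <- cbind(proportion = diseased / sampled, ci)
print(round(result, 3))
```

Each row of result then holds the observed proportion for a session together with its interval.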
Re: [R] find and replace string
You've been given a workable solution already, but here's a one-liner:

x <- c('sta_+1+0_field2ndtry_$01.cfg', 'sta_+B+0_field2ndtry_$01.cfg', 'sta_+1+0_field2ndtry_$01.cfg', 'sta_+9+0_field2ndtry_$01.cfg')
sapply(1:length(x), function(i) gsub("\\+(.*)\\+.", paste("\\+\\1\\+", i, sep=""), x[i]))
[1] "sta_+1+1_field2ndtry_$01.cfg" "sta_+B+2_field2ndtry_$01.cfg"
[3] "sta_+1+3_field2ndtry_$01.cfg" "sta_+9+4_field2ndtry_$01.cfg"

Sarah, fan of regular expressions

On Fri, Dec 2, 2011 at 6:30 AM, Alaios ala...@yahoo.com wrote: Dear all, I would like to search in a string for the second occurrence of a symbol and replace the symbol after it. For example, my strings look like sta_+1+0_field2ndtry_$01.cfg. I want to find the digit that comes after the second +, which in this case is zero, and then in a loop create the strings below:

sta_+1+0_field2ndtry_$01.cfg
sta_+1+1_field2ndtry_$01.cfg
sta_+1+2_field2ndtry_$01.cfg
sta_+1+3_field2ndtry_$01.cfg

and so on. I have already tried strsplit, but this makes things more complex. Could you please help me with that? B.R Alex

-- Sarah Goslee http://www.functionaldiversity.org
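For the original request (one input string, digits 0 to 3 after the second +), a variation on the regular-expression idea could look like the sketch below. It is not the code posted in this thread: it splits the string into the part before and after the digit, then pastes the new digits in between, which avoids putting a digit right after a backreference in the replacement string.

```r
x <- "sta_+1+0_field2ndtry_$01.cfg"

# Group 1: everything up to and including the second "+".
# "."    : the single character to be replaced.
# Group 2: the rest of the string.
pattern <- "^([^+]*\\+[^+]*\\+).(.*)$"

prefix <- sub(pattern, "\\1", x)
suffix <- sub(pattern, "\\2", x)

out <- paste0(prefix, 0:3, suffix)
print(out)
```

Because paste0() is vectorised, no explicit loop is needed to produce the four file names.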
Re: [R] Fw: calculate mean of multiple rows in a data frame
It's easier for folks to help you if you put your example data in a format that can be readily read into R. See, for example, the dput() function, which you can use to provide us with something like this:

DF <- structure(list(NAME = c("Control_1", "Control_2", "Control_1", "Control_3", "MM0289~RFU:11810.15", "MM0289~RFU:9238.41", "MM16597~RFU:36765.38", "MM16597~RFU:41258.94"), ID = c("probe~B01R01C01", "probe~B01R01C02", "probe~B01R09C01", "probe~B01R09C02", "probe~B29R13C06", "probe~B29R13C05", "probe~B44R15C20", "probe~B44R15C19"), a = c(3L, 712L, 937L, 464L, 99L, 605L, 700L, 132L), b = c(22L, 13L, 824L, 836L, 544L, 603L, 923L, 777L), c = c(926L, 32L, 898L, 508L, 607L, 862L, 219L, 497L), d = c(774L, 179L, 668L, 53L, 984L, 575L, 582L, 995L)), .Names = c("NAME", "ID", "a", "b", "c", "d"), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8"))

If I understand what you're after, you want to summarize data within groups, but your NAME variable is not as general as you would like. You can get around this by creating a new variable which is a shorter and more general version of the NAME variable. I did this by saving just the part of the NAME before the colon, ":".
shortname <- sapply(strsplit(DF$NAME, ":"), "[", 1)
aggregate(DF[, -(1:2)], by=list(shortname=shortname), mean)
    shortname   a     b     c     d
1   Control_1 470 423.0 912.0 721.0
2   Control_2 712  13.0  32.0 179.0
3   Control_3 464 836.0 508.0  53.0
4  MM0289~RFU 352 573.5 734.5 779.5
5 MM16597~RFU 416 850.0 358.0 788.5

Jean

Jabez Wilson wrote on 12/01/2011 03:15:39 PM:

                  NAME              ID   a   b   c   d
1            Control_1 probe~B01R01C01 381 213 345 653
2            Control_2 probe~B01R01C02 574 629 563 783
3            Control_1 probe~B01R09C01 673 511 521 967
4            Control_3 probe~B01R09C02  53 809 999  50
5  MM0289~RFU:11810.15 probe~B29R13C06 681  34 115 587
6   MM0289~RFU:9238.41 probe~B29R13C05 784 443  20 784
7 MM16597~RFU:36765.38 probe~B44R15C20 719 251 790 445
8 MM16597~RFU:41258.94 probe~B44R15C19 677 363 268 686

Sorry, that should look like this:

                  NAME              ID   a   b   c   d
1            Control_1 probe~B01R01C01 381 213 345 653
2            Control_2 probe~B01R01C02 574 629 563 783
3            Control_1 probe~B01R09C01 673 511 521 967
4            Control_3 probe~B01R09C02  53 809 999  50
5  MM0289~RFU:11810.15 probe~B29R13C06 681  34 115 587
6   MM0289~RFU:9238.41 probe~B29R13C05 784 443  20 784
7 MM16597~RFU:36765.38 probe~B44R15C20 719 251 790 445
8 MM16597~RFU:41258.94 probe~B44R15C19 677 363 268 686

                  NAME              ID   a   b   c   d
1            Control_1 probe~B01R01C01   3  22 926 774
2            Control_2 probe~B01R01C02 712  13  32 179
3            Control_1 probe~B01R09C01 937 824 898 668
4            Control_3 probe~B01R09C02 464 836 508  53
5  MM0289~RFU:11810.15 probe~B29R13C06  99 544 607 984
6   MM0289~RFU:9238.41 probe~B29R13C05 605 603 862 575
7 MM16597~RFU:36765.38 probe~B44R15C20 700 923 219 582
8 MM16597~RFU:41258.94 probe~B44R15C19 132 777 497 995
--- On Thu, 1/12/11, Jabez Wilson jabez...@yahoo.co.uk wrote: From: Jabez Wilson jabez...@yahoo.co.uk Subject: calculate mean of multiple rows in a data frame To: R-Help r-h...@stat.math.ethz.ch Date: Thursday, 1 December, 2011, 20:45

Dear all, I have a data frame (DF) in the following format:

                  NAME              ID   a   b   c   d
1            Control_1 probe~B01R01C01 381 213 345 653
2            Control_2 probe~B01R01C02 574 629 563 783
3            Control_1 probe~B01R09C01 673 511 521 967
4            Control_3 probe~B01R09C02  53 809 999  50
5  MM0289~RFU:11810.15 probe~B29R13C06 681  34 115 587
6   MM0289~RFU:9238.41 probe~B29R13C05 784 443  20 784
7 MM16597~RFU:36765.38 probe~B44R15C20 719 251 790 445
8 MM16597~RFU:41258.94 probe~B44R15C19 677 363 268 686

I would like to consolidate the data frame by parsing through the rows and, where the NAME is identical, consolidating into one row and returning the mean. I can do this for the first lines (Control_1 etc.) by using aggregate():

aggregate(DF[,-c(1:2)], by=list(DF$NAME), mean)

but since aggregate looks for unique lines it won't consolidate e.g. lines 5/6 and 7/8. Is there a way of telling aggregate to grep just the first part of the name (i.e. up to ~) and consolidate those? I could pre-grep the file before importing into R, but I'd like to do it within R if possible. Thanks for any suggestions
[R] Unexplained behavior of level names when using ordered factors in lm?
Hello dear all, I am unable to understand why, when I run the following three lines:

set.seed(4254)
a <- data.frame(y = rnorm(40), x=ordered(sample(1:5, 40, T)))
summary(lm(y ~ x, a))

the output I get includes factor levels which are not relevant to what I am actually using:

Call: lm(formula = y ~ x, data = a)
Residuals:
    Min      1Q  Median      3Q     Max
-1.4096 -0.6400 -0.1244  0.5886  2.1891
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.03276    0.15169  -0.216    0.830
x.L         -0.28968    0.33866  -0.855    0.398
x.Q         -0.38813    0.33851  -1.147    0.259
x.C         -0.27183    0.34027  -0.799    0.430
x^4          0.25993    0.33935   0.766    0.449
Residual standard error: 0.9564 on 35 degrees of freedom
Multiple R-squared: 0.08571, Adjusted R-squared: -0.01878
F-statistic: 0.8202 on 4 and 35 DF, p-value: 0.5211

I am guessing that this has something to do with the contrast matrix that is used, but this is not clear to me. Can anyone suggest a good read, or an explanation? Thanks.

Contact Details: Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
[R] what is used as height in hclust for ward linkage?
Dear R community, I am trying to understand how the ward linkage works from a quantitative point of view. To test it I have devised a simple 3-member set:

G = c(0,2,10)

The distances between all pairs are:

d(0,2) = 2
d(0,10) = 10
d(2,10) = 8

The smallest distance corresponds to merging 0 and 2. The corresponding ESS are:

ESS(0,2) = 2*var(c(0,2)) = 4
ESS(0,10) = 2*var(c(0,10)) = 100
ESS(2,10) = 2*var(c(2,10)) = 64

and, indeed, the smallest ESS corresponds to merging 0 and 2. The next element that should be added to 0 and 2 is obviously 10. This is where I don't understand how the hclust algorithm in R works. We have:

G <- c(0,2,10)
G.dist <- dist(G)
G.hc <- hclust(G.dist,method="ward")
G.hc$merge
     [,1] [,2]
[1,]   -1   -2
[2,]   -3    1
G.hc$height
[1] 2.0 11.3

Now, according to standard definitions, the distance between two clusters with Nr and Ns elements is:

d(Rs,Rr) = sqrt(2*Nr*Ns/(Nr+Ns))*||<Rs> - <Rr>||

where <.> in the last expression indicates averages (centroids). If I carry out this operation to merge cluster c(0,2) with 10, I get:

d(c(0,2),10) = sqrt(2*2*1/(2+1))*|1-9| = 9.237604

This is different from the 11.3 in the R output. Does anyone know what the exact value for the ward linkage is, as displayed in the hclust height output? Thanks in advance for any help! J

-- This e-mail and any attachments may contain confidential...{{dropped:8}}
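One possible reading of the 11.3, offered as a sketch rather than a confirmed answer from this thread: it is the value the Lance-Williams update for Ward's method produces when applied directly to the unsquared distances. The code below assumes a newer R version in which the method used in the post is available under the name "ward.D". For comparison it also evaluates the centroid formula from the question, using the centroid of c(0,2), which is 1, and the point 10.

```r
G <- c(0, 2, 10)
d <- as.matrix(dist(G))

# Lance-Williams update for Ward's method after merging clusters i and j,
# giving the dissimilarity between the merged cluster and a third cluster k:
#   ((n_i + n_k) * d(k,i) + (n_j + n_k) * d(k,j) - n_k * d(i,j)) / (n_i + n_j + n_k)
# Here i = {0}, j = {2}, k = {10}, all of size 1, and the inputs are unsquared distances.
n_i <- 1; n_j <- 1; n_k <- 1
lw <- ((n_i + n_k) * d[3, 1] + (n_j + n_k) * d[3, 2] - n_k * d[1, 2]) /
  (n_i + n_j + n_k)

# Heights reported by hclust (assumption: "ward.D" is the current name of this method).
h <- hclust(dist(G), method = "ward.D")$height

# Centroid-based formula from the question: centroid of {0, 2} is 1, the other point is 10.
centroid_form <- sqrt(2 * 2 * 1 / (2 + 1)) * abs(mean(c(0, 2)) - 10)

print(lw)
print(h)
print(centroid_form)
```

If the two printed quantities lw and the second element of h agree, the height is simply the updated dissimilarity at the final merge and not the centroid-based expression.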
Re: [R] Unexplained behavior of level names when using ordered factors in lm?
?ordered ?C ?contr.poly

If you don't know what polynomial contrasts are, consult any good linear models text. MASS has a good, though a bit terse, section on this. -- Bert

On Fri, Dec 2, 2011 at 6:51 AM, Tal Galili tal.gal...@gmail.com wrote: Hello dear all, I am unable to understand why, when I run the following three lines:

set.seed(4254)
a <- data.frame(y = rnorm(40), x=ordered(sample(1:5, 40, T)))
summary(lm(y ~ x, a))

the output I get includes factor levels which are not relevant to what I am actually using:

Call: lm(formula = y ~ x, data = a)
Residuals:
    Min      1Q  Median      3Q     Max
-1.4096 -0.6400 -0.1244  0.5886  2.1891
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.03276    0.15169  -0.216    0.830
x.L         -0.28968    0.33866  -0.855    0.398
x.Q         -0.38813    0.33851  -1.147    0.259
x.C         -0.27183    0.34027  -0.799    0.430
x^4          0.25993    0.33935   0.766    0.449
Residual standard error: 0.9564 on 35 degrees of freedom
Multiple R-squared: 0.08571, Adjusted R-squared: -0.01878
F-statistic: 0.8202 on 4 and 35 DF, p-value: 0.5211

I am guessing that this has something to do with the contrast matrix that is used, but this is not clear to me. Can anyone suggest a good read, or an explanation? Thanks.

-- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
Re: [R] Unexplained behavior of level names when using ordered factors in lm?
Maybe I should have explicitly said:

> C(ordered(1:5))
[1] 1 2 3 4 5
attr(,"contrasts")
   ordered
contr.poly
Levels: 1 < 2 < 3 < 4 < 5

-- Bert

On Fri, Dec 2, 2011 at 7:06 AM, Bert Gunter bgun...@gene.com wrote: ?ordered ?C ?contr.poly If you don't know what polynomial contrasts are, consult any good linear models text. MASS has a good, though a bit terse, section on this. -- Bert

On Fri, Dec 2, 2011 at 6:51 AM, Tal Galili tal.gal...@gmail.com wrote: Hello dear all, I am unable to understand why, when I run the following three lines:

set.seed(4254)
a <- data.frame(y = rnorm(40), x=ordered(sample(1:5, 40, T)))
summary(lm(y ~ x, a))

the output I get includes factor levels which are not relevant to what I am actually using:

Call: lm(formula = y ~ x, data = a)
Residuals:
    Min      1Q  Median      3Q     Max
-1.4096 -0.6400 -0.1244  0.5886  2.1891
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.03276    0.15169  -0.216    0.830
x.L         -0.28968    0.33866  -0.855    0.398
x.Q         -0.38813    0.33851  -1.147    0.259
x.C         -0.27183    0.34027  -0.799    0.430
x^4          0.25993    0.33935   0.766    0.449
Residual standard error: 0.9564 on 35 degrees of freedom
Multiple R-squared: 0.08571, Adjusted R-squared: -0.01878
F-statistic: 0.8202 on 4 and 35 DF, p-value: 0.5211

I am guessing that this has something to do with the contrast matrix that is used, but this is not clear to me. Can anyone suggest a good read, or an explanation? Thanks.

-- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
[R] Save Venn-diagram (Vennerable) together with table and plot in single pdf page
Dear R-users, I want to save a list of character strings, a point plot and a Venn diagram on a single pdf page. I can do this successfully when I use a character list and two point plots. However, when I try to replace the first point plot with my Venn diagram (built with the Vennerable package, compute.Venn() and plot.Venn()), the Venn plot is not positioned at the right place in the pdf. I guess there are some parameters in the plot.Venn function that I need to change, but I did not find out which ones.

Here is an example of the pdf with two point plots as I want it:

library(gplots)
library(Vennerable)
Varnames<-c("A","B","C")
x <- Venn(SetNames = Varnames,Weight = c(`100`=2,`010`=6,`001`=10,`110`=1, `011`=0.2, `101`=0.5,`111`=1))
cx<-compute.Venn(x,doWeights = TRUE)
tabletext<-paste("Variable: ",letters[1:8],sep="")
pdf("path/plot_test.pdf", fillOddEven=TRUE,paper="a4", onefile=TRUE,width=7,height=10)
layout(matrix(c(1,2,2,1,2,2,3,3,3), 3, 3, byrow = TRUE),heights=c(1,1,2))
par(mar=c(6,2,2,4))
textplot(tabletext,valign="top",halign="left",cex=2)
plot(rnorm(100),main="Random 1")
plot(rnorm(100),col="red",main="Random2")
dev.off()

And here is the example of the pdf where I try to replace the Random 1 point plot with a Venn diagram (wrong size and position of the Venn diagram):

pdf("path/venn_test.pdf", fillOddEven=TRUE,paper="a4", onefile=TRUE,width=7,height=10)
layout(matrix(c(1,2,2,1,2,2,3,3,3), 3, 3, byrow = TRUE),heights=c(1,1,2))
par(mar=c(6,2,2,4))
textplot(tabletext,valign="top",halign="left",cex=2)
plot(cx)
plot(rnorm(100),col="red",main="Random2")
dev.off()

I would be thankful for any hints. Sonja

-- Sonja Braaker Swiss Federal Research Institute WSL Community Ecology Zürcherstrasse 111 CH-8903 Birmensdorf Tel. +41 44 7392 230 Fax +41 44 7392 215 sonja.braa...@wsl.ch http://www.wsl.ch/info/mitarbeitende/braaker/index_EN
Re: [R] R, PostgresSQL and poor performance
On 01/12/2011 17:01, Gabor Grothendieck ggrothendi...@gmail.com wrote:

On Thu, Dec 1, 2011 at 10:02 AM, Berry, David I. d...@noc.ac.uk wrote: Hi List, apologies if this isn't the correct place for this query (I've tried a search of the mail archives but not had much joy). I'm running R (2.14.0) on a Mac (OSX v 10.5.8, 2.66GHz, 4GB memory) and am having a few performance issues with reading data in from a Postgres database (using RPostgreSQL). My query / code are as below:

library('RPostgreSQL')
drv <- dbDriver("PostgreSQL")
dbh <- dbConnect(drv,user="…",password="…",dbname="…",host="…")
sql <- "select id, date, lon, lat, date_trunc('day' , date) as jday, extract('hour' from date) as hour, extract('year' from date) as year from observations where pt = 6 and date >= '1990-01-01' and date < '1995-01-01' and lon > 180 and lon < 290 and lat > -30 and lat < 30 and sst is not null"
dataIn <- dbGetQuery(dbh,sql)

If this is a large table of which the desired rows are a small fraction of all rows, then be sure there are indexes on the variables in your where clause. You can also try it with the RpgSQL driver, although there is no reason to think that that would be faster. -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com

Thanks for the reply and suggestions. I've tried the RpgSQL drivers and the results are pretty similar in terms of performance. The ~1.5M records I'm trying to read into R are being extracted from a table with ~300M rows (and ~60 columns) that has been indexed on the relevant columns and horizontally partitioned (with constraint checking on). I do need to try and optimize the database a bit more, but I don't think this is the cause of the performance issues.

As an example, when I run the query purely in R it takes 273s to run (using system.time() to time it). When I extract the data via psql and system() and then import it into R using read.table() it takes 32s. The code I've used for both is below. The second way of doing it (psql and read.table()) is less than ideal but does seem to have a big performance advantage at the moment. The only difference in the results is that the date variables are stored as strings in the second example.

# Query purely in R
dbh <- dbConnect(drv,user="…",password="…", dbname="…",host="…")
sql <- "select id, date, lon, lat, date_trunc('day' , date) as jday, extract('hour' from date) as hour, extract('year' from date) as year from observations where pt = 6 and date >= '1990-01-01' and date < '1995-01-01' and lon > 180 and lon < 290 and lat > -30 and lat < 30 and sst is not null;"
dataIn <- dbGetQuery(dbh,sql)

# Query via command line
system('psql -h myhost -d mydb -U myuid -f getData.sql')
system('cat tmp.csv | sed 's/^,//g;s/^[0-9a-zA-Z]\+//g' tmp2.csv') # This just ensures the first column is quoted
dataIn <- read.table('tmp2.csv',sep=',' ,col.names=c("id","date","lon","lat","jday","hour","year") )

# Contents of getData.sql
\o ./tmp.csv
\pset format unaligned
\pset fieldsep ','
\pset tuples_only
select id, date, lon, lat, date_trunc('day' , date) as jday, extract('hour' from date) as hour, extract('year' from date) as year from observations where pt = 6 and date >= '1990-01-01' and date < '1995-01-01' and lon > 180 and lon < 290 and lat > -30 and lat < 30 and sst is not null;
\q

-- David Berry National Oceanography Centre, UK

-- This message (and any attachments) is for the recipient only. NERC is subject to the Freedom of Information Act 2000 and the contents of this email and any reply you make may be disclosed by NERC unless it is exempt from release under the Act. Any material supplied to NERC may be stored in an electronic records management system.
[R] How to test for Poisson?
Hi! I am working on a school assignment, but I got stuck on this one. I am supposed to test whether my data is Poisson-distributed. The data I'm using is the Bids data set, found in the Ecdat package, and the variable of interest is the dependent variable numbids. How do I practically perform a test for this? Kind regards/ Richard
[R] breaking up n object into random groups
Say n = 100. I want to partition this into 4 random groups where n1 + n2 + n3 + n4 = n and ni is the number of elements in group i. Thank you for your help
Re: [R] Unexplained behavior of level names when using ordered factors in lm?
On Dec 2, 2011, at 9:51 AM, Tal Galili wrote: Hello dear all, I am unable to understand why, when I run the following three lines:

set.seed(4254)
a <- data.frame(y = rnorm(40), x=ordered(sample(1:5, 40, T)))
summary(lm(y ~ x, a))

the output I get includes factor levels which are not relevant to what I am actually using:

Call: lm(formula = y ~ x, data = a)
Residuals:
    Min      1Q  Median      3Q     Max
-1.4096 -0.6400 -0.1244  0.5886  2.1891
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.03276    0.15169  -0.216    0.830
x.L         -0.28968    0.33866  -0.855    0.398
x.Q         -0.38813    0.33851  -1.147    0.259
x.C         -0.27183    0.34027  -0.799    0.430
x^4          0.25993    0.33935   0.766    0.449

Those are polynomial contrasts: linear, quadratic, cubic and quartic. If you don't want contrasts based on ordered factors then just use regular factors. You should probably be looking at: ?C (...yet another function whose name should be avoided in naming data objects.) -- David.

Residual standard error: 0.9564 on 35 degrees of freedom
Multiple R-squared: 0.08571, Adjusted R-squared: -0.01878
F-statistic: 0.8202 on 4 and 35 DF, p-value: 0.5211

I am guessing that this has something to do with the contrast matrix that is used, but this is not clear to me. Can anyone suggest a good read, or an explanation? Thanks.

David Winsemius, MD West Hartford, CT
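To make the point about contrasts concrete, here is a small sketch. It uses a balanced made-up x (each level 1 to 5 repeated eight times) instead of the random sample from the post, so the numbers will differ from the output quoted above; the column name xf is an illustrative choice.

```r
set.seed(4254)
a <- data.frame(y = rnorm(40), x = ordered(rep(1:5, each = 8)))

# The suffixes .L, .Q, .C and ^4 come from the column names of the
# polynomial contrast matrix used for ordered factors.
print(colnames(contr.poly(5)))

# Ordered factor: coefficients are linear, quadratic, cubic and quartic trends.
fit_poly <- lm(y ~ x, data = a)
print(names(coef(fit_poly)))

# Unordered copy of the same factor: treatment contrasts, one coefficient per level.
a$xf <- factor(a$x, ordered = FALSE)
fit_trt <- lm(y ~ xf, data = a)
print(names(coef(fit_trt)))

# Both parameterizations describe the same model, so the fitted values agree.
print(all.equal(fitted(fit_poly), fitted(fit_trt)))
```

Only the labelling and interpretation of the coefficients change between the two fits, not the fit itself.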
Re: [R] How to test for Poisson?
A simple way to determine that it is NOT Poisson is to check whether the mean (the single parameter of a Poisson distribution, lambda) and the variance are the same. This really has nothing to do with R (other than the data source), and since it is homework, you will likely get no further help here. Good luck.

RToss wrote:

> Hi! I am working on a school assignment, but I got stuck on this one. I am supposed to test whether my data are Poisson-distributed. The data I'm using is the Bids data set, found in the Ecdat package, and the variable of interest is the dependent variable numbids. How do I practically perform a test for this?
>
> Kind regards / Richard
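The mean/variance comparison suggested above can be sketched as follows. This is an illustration on simulated counts, not on the Bids data; the variable names and the choice of lambda are assumptions. The second part uses the classical index-of-dispersion statistic, which is one common way to formalize the comparison.

```r
set.seed(1)
counts <- rpois(200, lambda = 1.7)   # simulated stand-in, NOT the Bids data

m <- mean(counts)
v <- var(counts)
c(mean = m, variance = v, ratio = v / m)   # ratio near 1 is consistent with Poisson

# Index-of-dispersion test: under a Poisson model, (n - 1) * var / mean
# is approximately chi-squared with n - 1 degrees of freedom
n <- length(counts)
stat <- (n - 1) * v / m
p_two_sided <- 2 * min(pchisq(stat, df = n - 1),
                       pchisq(stat, df = n - 1, lower.tail = FALSE))
p_two_sided
```

A small p-value points to over- or under-dispersion relative to the Poisson; a large one only means this particular check found no evidence against it.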
Re: [R] breaking up n object into random groups
On Dec 2, 2011, at 10:09 AM, statfan wrote:

> say n = 100. I want to partition this into 4 random groups, where n1 + n2 + n3 + n4 = n and ni is the number of elements in group i.

Try assigning with a sample() from:

unlist(mapply(rep, c(1:4), each=c(n1,n2,n3,n4)))

--
David Winsemius, MD
West Hartford, CT
Re: [R] breaking up n object into random groups
There are a million ways to do this, probably.

brks <- c(1, sort(sample(seq_len(99), 3)), 100) ## 4 random groups

and then use brks as the breaks parameter in cut() with include.lowest = TRUE.

?cut

-- Bert

On Fri, Dec 2, 2011 at 7:09 AM, statfan irene_vr...@hotmail.com wrote:

> say n = 100. I want to partition this into 4 random groups, where n1 + n2 + n3 + n4 = n and ni is the number of elements in group i.
>
> Thank you for your help

--
Bert Gunter
Genentech Nonclinical Biostatistics
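The two suggestions in this thread (random break points, then random assignment of labels) can be combined. The following is a minimal sketch under the assumption that every group should be non-empty; the names `cuts`, `sizes` and `group` are made up. Drawing the cut points from 1..(n-1) and converting them to sizes with diff() also avoids the case where a sampled break coincides with an end point.

```r
set.seed(42)
n <- 100

# Three distinct cut points in 1..(n-1) give four non-empty group sizes
cuts  <- sort(sample(seq_len(n - 1), 3))
sizes <- diff(c(0, cuts, n))          # n1, n2, n3, n4

# Shuffle the group labels so that membership is random as well
group <- sample(rep(1:4, times = sizes))

table(group)
```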
[R] CART with rpart
Dear all,

I want to keep in my data file the terminal-node (group) assignments produced by a CART analysis, so that I can perform other statistical analyses by these groups. Can you help me, please?

Thanks.

Jan.
Re: [R] Unexplained behavior of level names when using ordered factors in lm?
Thank you both, Bert and David, for the quick reply. I will look further into this.

With regards,
Tal

On Fri, Dec 2, 2011 at 5:08 PM, Bert Gunter gunter.ber...@gene.com wrote:

> Maybe should have explicitly said:
>
> > C(ordered(1:5))
> [1] 1 2 3 4 5
> attr(,"contrasts")
>    ordered
> contr.poly
> Levels: 1 < 2 < 3 < 4 < 5
>
> -- Bert
>
> On Fri, Dec 2, 2011 at 7:06 AM, Bert Gunter bgun...@gene.com wrote:
>
>> ?ordered
>> ?C
>> ?contr.poly
>>
>> If you don't know what polynomial contrasts are, consult any good linear models text. MASS has a good, though a bit terse, section on this.
>>
>> -- Bert
>>
>> On Fri, Dec 2, 2011 at 6:51 AM, Tal Galili tal.gal...@gmail.com wrote:
>>
>>> Hello dear all,
>>>
>>> I am unable to understand why, when I run the following three lines:
>>>
>>> set.seed(4254)
>>> a <- data.frame(y = rnorm(40), x = ordered(sample(1:5, 40, T)))
>>> summary(lm(y ~ x, a))
>>>
>>> the output I get includes factor levels which are not relevant to what I am actually using:
>>>
>>> Call:
>>> lm(formula = y ~ x, data = a)
>>>
>>> Residuals:
>>>     Min      1Q  Median      3Q     Max
>>> -1.4096 -0.6400 -0.1244  0.5886  2.1891
>>>
>>> Coefficients:
>>>             Estimate Std. Error t value Pr(>|t|)
>>> (Intercept) -0.03276    0.15169  -0.216    0.830
>>> x.L         -0.28968    0.33866  -0.855    0.398
>>> x.Q         -0.38813    0.33851  -1.147    0.259
>>> x.C         -0.27183    0.34027  -0.799    0.430
>>> x^4          0.25993    0.33935   0.766    0.449
>>>
>>> Residual standard error: 0.9564 on 35 degrees of freedom
>>> Multiple R-squared: 0.08571, Adjusted R-squared: -0.01878
>>> F-statistic: 0.8202 on 4 and 35 DF, p-value: 0.5211
>>>
>>> I am guessing that this has something to do with the contrast matrix that is used, but this is not clear to me. Can anyone suggest a good read, or an explanation?
>>>
>>> Thanks.
Re: [R] CART with rpart
Hi Jan,

You can most likely simply use ?predict (e.g. predict.rpart).

Are you using a classification or a regression tree?

Tal

On Fri, Dec 2, 2011 at 6:15 PM, kende jan kende...@yahoo.fr wrote:

> dear all, i want to keep in my data file the results of terminal nodes (groups) after CART analysis for performing other statisticals analysis by this groups. can you help me please? thanks. jan.
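For the training data itself, the fitted rpart object already records which terminal node each observation fell into. The following is a minimal sketch, assuming the kyphosis example data that ships with rpart stands in for the poster's data; the names `dat` and `leaf` are made up for illustration.

```r
library(rpart)   # recommended package shipped with R

dat <- kyphosis  # example data from rpart, standing in for the real data
fit <- rpart(Kyphosis ~ Age + Number + Start, data = dat)

# fit$where holds, for each training row, the row index in fit$frame
# of the terminal node (leaf) that the observation falls into;
# the row names of fit$frame are the node numbers shown by print(fit)
dat$leaf <- factor(rownames(fit$frame)[fit$where])

# Further analyses by terminal node, e.g. mean age per leaf
aggregate(Age ~ leaf, data = dat, FUN = mean)
```

The new column can then be used like any other grouping factor in later analyses.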
Re: [R] Willkommen bei der R-help Mailingliste
Hello everybody,

I am new to this mailing list and hope to find some help. I'm trying to get into the spatstat package and have encountered two problems.

First, a graphical one: there is an example dataset called finpines which has several marks (http://www.oga-lab.net/RGM2/func.php?rd_id=spatstat:finpines). When I pass the given code from the website to R

data(finpines)
plot(unmark(finpines), main="Finnish pines: locations")
plot(finpines, which.marks="height", main="heights")
plot(finpines, which.marks="diameter", main="diameters")

I get the warning

Warnmeldung:
In symbols(c(-1.993875, -1.019901, -4.914071, -4.469962, -4.303847, :
  which.marks ist kein Grafikparameter

(something like "which.marks is not a graphical parameter"), and the plots for height and diameter show no differences.

Furthermore, I want to create a ppp with several marks, but I did not figure out how this works. Trying

X <- as.ppp(mydata, owin(c(174, 178), c(29, 33)))

just gives the error

Error in as.ppp(mydata, owin(c(174, 178), c(29, 33))) :
  X must be a two-column or three-column data frame

The data set looks something like

Date  X  Y  Mar1  Mar2  Mar3
1.1.  4  3  50    6     A
2.1.  2  1  40    9     A
3.1.  5  8  35    12    B

But how can I integrate two or more marks in a three-column data frame, when two columns are already needed for the X and Y coordinates?

I hope you can help me with this.

Cheers,
sina
[R] R + memory of objects
Dear R community,

I am still struggling a bit with how R does memory allocation and how to optimize my code to minimize working-memory load. Simon (thanks!) and others gave me a hint to use the command gc() to clean up memory, which works quite nicely but appears to me to be more like a fix to a problem.

To give you an impression of what I am talking about, here is a short code example, together with a rough measure (from a system tracking app) of the working memory needed for each computational step (64-bit R, latest version, on a Windows 7 64-bit system, 2 cores, approx. 4 GB RAM):

##
# example 1:
y = matrix(rep(1,5000), nrow = 5000/2, ncol = 2)
# used working memory increases from 1044 --> 1808 MB

# (same command again, i.e.)
y = matrix(rep(1,5000), nrow = 5000/2, ncol = 2)
# 1808 MB --> 2178 MB
# Why does memory increase?

# (give the matrix column names)
colnames(y) = c("col1", "col2")
# 2178 MB --> 1781 MB
# Why does the size of an object decrease if I assign column labels?

###
# example 2:
y = matrix(rep(1,5000), nrow = 5000/2, ncol = 2)
# 1016 --> 1780 MB
y = data.frame(y)
# increase from 1780 MB --> 3315 MB
##

Why does it take so much extra memory to store this matrix as a data.frame? It is not the object per se (i.e. that data.frames need more memory), because if I use gc() memory use drops to 1387 MB. Does this mean that it may be more memory-efficient not to use any data.frames but matrices only? This puzzles me a lot. In my experience these effects are also accentuated for larger objects.

As an anecdotal comparison: I also used Stata in my last project due to these memory problems, and I could do a lot of variable manipulations of the same (!) data with significantly (I am talking about GB) less memory.

Best,
Marc
[R] Graphics - Axis Labels overlap window edges
Hi,

I am trying to put larger axis labels on my graphs (using cex.axis and cex.lab), but when I do this the top of the text on the Y axis goes outside of the window, which you can see in this picture: http://twitter.com/#!/robgriffin247/status/142642881436450816/photo/1 (if you click on the picture it opens a larger version, so it is easier to see the problem).

Is there any way I can get R to not cut the top off the letters?

Thanks,
Rob
Re: [R] Problem subsetting: undefined columns
Hi Paul and Jim,

Thanks for your messages. I just wanted R to give me the columns of my data frame d whose names appear in v. I do not care about the names of v that are not in d. In addition, every time, there will be at least one element of v that has a corresponding column in d, for sure, so I know there is at least one match between the two.

Initially, I tried something in the spirit of:

sub <- subset(d, colnames(d) %in% v)

but I could not make it work properly.

Best,
Aurelien

2011/12/2 Paul Hiemstra paul.hiems...@knmi.nl

> On 12/02/2011 07:20 AM, Aurélien PHILIPPOT wrote:
>
>> Dear R-users,
>>
>> -I am new to R, and I am struggling with the following problem.
>>
>> -I am repeating the following operations hundreds of times, within a loop: I want to subset a data frame by columns. I am interested in the column names that are given by the rows of another data frame that was built in parallel. The solution I have so far works well as long as the elements of the second data frame are included in the column names of the first data frame, but if an element from the second object is not a column name of the first one, then it bugs.
>
> Hi Aurelien,
>
> I would call this a feature, not a bug. I think R does what it should do: you request a non-existent column and it throws an error. What kind of behavior are you looking for instead of this error?
>
> regards,
> Paul
>
>> -More concretely, I have the following data frames d and v:
>>
>> mmdd <- c(19720601, 19720602, 19720605)
>> sret.10006 <- c(1,2,3)
>> sret.10014 <- c(5,9,7)
>> sret.10065 <- c(10,2,11)
>> d <- data.frame(mmdd=mmdd, sret.10006=sret.10006, sret.10014=sret.10014, sret.10065=sret.10065)
>> v <- data.frame(V1="sret.10006", V2="sret.10090")
>> v <- sapply(v, function(x) levels(x)[x])
>>
>> -I want to do the following subsetting:
>>
>> sub <- subset(d, select=c(v))
>>
>> and I get the following error message:
>>
>> Error in `[.data.frame`(x, r, vars, drop = drop) : undefined columns selected
>>
>> Any help would be very much appreciated,
>>
>> Best,
>> Aurelien
>
> --
> Paul Hiemstra, Ph.D.
> Global Climate Division
> Royal Netherlands Meteorological Institute (KNMI)
> Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
> P.O. Box 201 | 3730 AE | De Bilt
> tel: +31 30 2206 494
> http://intamap.geo.uu.nl/~paul
> http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
Re: [R] R + memory of objects
I guess the numbers you report are what your OS shows you? R runs garbage collection (which can be triggered manually with gc()) according to certain fuzzy rules. So what you report below is not always the currently required memory, but what was allocated and not yet garbage collected.

See ?object.size to get the memory consumption of objects.

Uwe Ligges

On 02.12.2011 16:17, Marc Jekel wrote:

> Dear R community,
>
> I am still struggling a bit on how R does memory allocation and how to optimize my code to minimize working memory load. Simon (thanks!) and others gave me a hint to use the command gc() to clean up memory which works quite nice but appears to me to be more like a fix to a problem.
>
> To give you an impression of what I am talking, here is a short code example + I will give rough measure (system track app) of my working memory needed for each computational step (R64bit latest version on WIN 7 64 bit system, 2 Cores, approx 4 GB Ram):
>
> ##
> # example 1:
> y = matrix(rep(1,5000), nrow = 5000/2, ncol = 2)
> # used working memory increases from 1044 --> 1808 MB
>
> # (same command again, i.e.)
> y = matrix(rep(1,5000), nrow = 5000/2, ncol = 2)
> # 1808 MB --> 2178 MB
> # Why does memory increase?
>
> # (give the matrix column names)
> colnames(y) = c("col1", "col2")
> # 2178 MB --> 1781 MB
> # Why does the size of an object decrease if I assign column labels?
>
> ###
> # example 2:
> y = matrix(rep(1,5000), nrow = 5000/2, ncol = 2)
> # 1016 --> 1780 MB
> y = data.frame(y)
> # increase from 1780 MB --> 3315 MB
> ##
>
> Why does it take so much extra memory to store this matrix as a data.frame? It is not the object per se (i.e. that data.frames need more memory) because if I use gc() memory size drops to 1387 MB. Does this mean that it may be more memory-efficient not to use any data.frames but matrices only? etc. This puzzles me a lot. From my experience these effects are also accentuated for larger objects.
>
> As an anecdotal comparison: I also used Stata in my last project due to these memory problems and I could do a lot of variable manipulations of the same (!) data with significant (I am talking about GB) less memory needed.
>
> Best,
> Marc
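The distinction drawn above, between what the operating system reports and what the objects themselves occupy, can be inspected from within R. The following is a minimal sketch; the matrix is deliberately much smaller than the one in the post, and the names `sz_mat` and `sz_df` are made up for illustration.

```r
y <- matrix(rep(1, 1e6), nrow = 1e6 / 2, ncol = 2)   # smaller than in the post
sz_mat <- as.numeric(object.size(y))                 # size in bytes
print(object.size(y), units = "MB")

y_df <- data.frame(y)
sz_df <- as.numeric(object.size(y_df))
print(object.size(y_df), units = "MB")

# Both objects hold the same 1e6 doubles (about 8 MB of data); the
# temporary copies made while converting are what inflate the OS figure
# until the next garbage collection
rm(y)
invisible(gc())   # trigger a collection; gc() also reports memory in use
```

Comparing `sz_mat` and `sz_df` shows how much of a difference is due to the object itself rather than to memory that has not been collected yet.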
Re: [R] Graphics - Axis Labels overlap window edges
On 02.12.2011 17:41, robgriffin247 wrote:

> Hi, I am trying to put larger axis labels on my graphs (using cex.axis and cex.label) but when I do this the top of the text on the Y axis goes outside of the window which you can see in this picture -http://twitter.com/#!/robgriffin247/status/142642881436450816/photo/1 - (if you click on the picture it opens a larger version so it is easier to see the problem) is there anyway I can get R to not cut the top off the letters?

Increase the margins. See ?par and its mar argument.

Uwe Ligges

> Thanks, Rob
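The margin adjustment suggested above can be sketched as follows. This is a minimal illustration with made-up data and margin values; it widens the left margin (the second element of mar) so that an enlarged y-axis label has room. The null PDF device is only there so the sketch runs without opening a window.

```r
pdf(NULL)   # null device, so the sketch also runs without a display

# mar = c(bottom, left, top, right), in lines of text;
# the default is c(5, 4, 4, 2) + 0.1
old <- par(mar = c(5, 6, 4, 2) + 0.1)
mar_used <- par("mar")

plot(1:10, (1:10)^2, xlab = "x", ylab = "y squared",
     cex.axis = 1.5, cex.lab = 2)

par(old)    # restore the previous settings
dev.off()
```

How many extra lines are needed depends on the chosen cex.lab and on the device size, so the value 6 is only a starting point.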
Re: [R] Random Forests in R
Thanks for this!

Axel.

On Thu, Dec 1, 2011 at 11:29 AM, Liaw, Andy andy_l...@merck.com wrote:

> The first version of the package was created by re-writing the main program in the original Fortran as C, and calls other Fortran subroutines that were mostly untouched, so dynamic memory allocation can be done. Later versions have most of the Fortran code translated/re-written in C. Currently the only Fortran part is the node splitting in classification trees.
>
> Andy
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Peter Langfelder
> Sent: Thursday, December 01, 2011 12:33 AM
> To: Axel Urbiz
> Cc: R-help@r-project.org
> Subject: Re: [R] Random Forests in R
>
> On Wed, Nov 30, 2011 at 7:48 PM, Axel Urbiz axel.ur...@gmail.com wrote:
>
>> I understand the original implementation of Random Forest was done in Fortran code. In the source files of the R implementation there is a note "C wrapper for random forests: get input from R and drive the Fortran routines.". I'm far from an expert on this... does that mean that the implementation in R is through calls to C functions only (not Fortran)? So, would knowing C be enough to understand this code, or is Fortran also necessary?
>
> I haven't seen the C and Fortran code for Random Forest, but I understand the note to say that R code calls some C functions that pre-process (possibly re-format etc.) the data, then call the actual Random Forest method that's written in Fortran, then possibly post-process the output and return it to R. It would imply that to understand the actual Random Forest code, you will have to read the Fortran source code.
>
> Best,
> Peter
Re: [R] Unexplained behavior of level names when using ordered factors in lm?
Hi Bert,

Since you opened the door ...

On Fri, Dec 2, 2011 at 10:06 AM, Bert Gunter gunter.ber...@gene.com wrote:

> ?ordered
> ?C
> ?contr.poly
>
> If you don't know what polynomial contrasts are, consult any good linear models text. MASS has a good, though a bit terse, section on this.

Do you have a favorite linear model text with a bit more exposition than MASS?

Even though this list isn't for teaching stats, whenever I can catch some of the tried and true statisticians talking about texts on specific subject matter, I like to take advantage of it to see what I need to add to my amazon wish list to help sharpen the old saw :-)

Thanks,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
Re: [R] Problem subsetting: undefined columns
How about this?

d[, v[v %in% colnames(d)]]

Michael

On Dec 2, 2011, at 12:01 PM, Aurélien PHILIPPOT aurelien.philip...@gmail.com wrote:

> Hi Paul and Jim,
>
> Thanks for your messages. I just wanted R to give me the columns of my data frame d, whose names appear in v. I do not care about the names of v that are not in d. In addition, every time, there will be at least one element of v that has a corresponding column in d, for sure, so I know there is at least one match between the 2.
>
> Initially, I tried something in the spirit:
>
> sub <- subset(d, colnames(d) %in% v)
>
> but I could not make it work properly.
>
> Best,
> Aurelien
>
> 2011/12/2 Paul Hiemstra paul.hiems...@knmi.nl
>
>> On 12/02/2011 07:20 AM, Aurélien PHILIPPOT wrote:
>>
>>> Dear R-users,
>>>
>>> -I am new to R, and I am struggling with the following problem.
>>>
>>> -I am repeating the following operations hundreds of times, within a loop: I want to subset a data frame by columns. I am interested in the columns names that are given by the rows of another data frame that was built in parallel. The solution I have so far works well as long as the elements of the second data frame are included in the column names of the first data frame but if an element from the second object is not a column name of the first one, then it bugs.
>>
>> Hi Aurelien,
>>
>> I would call this a feature, not a bug. I think R does what it should do, you request a non-existent column and it throws an error. What kind of behavior are you looking for instead of this error?
>>
>> regards,
>> Paul
>>
>>> -More concretely, I have the following data frames d and v:
>>>
>>> mmdd <- c(19720601, 19720602, 19720605)
>>> sret.10006 <- c(1,2,3)
>>> sret.10014 <- c(5,9,7)
>>> sret.10065 <- c(10,2,11)
>>> d <- data.frame(mmdd=mmdd, sret.10006=sret.10006, sret.10014=sret.10014, sret.10065=sret.10065)
>>> v <- data.frame(V1="sret.10006", V2="sret.10090")
>>> v <- sapply(v, function(x) levels(x)[x])
>>>
>>> -I want to do the following subsetting:
>>>
>>> sub <- subset(d, select=c(v))
>>>
>>> and I get the following error message:
>>>
>>> Error in `[.data.frame`(x, r, vars, drop = drop) : undefined columns selected
>>>
>>> Any help would be very much appreciated,
>>>
>>> Best,
>>> Aurelien
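The one-liner above can be tried on the data from the original question. The following is a minimal sketch in which v is written directly as a character vector (an assumption; in the thread it is derived from a data frame); the names `keep` and `sub` are for illustration. intersect() is equivalent here to v[v %in% colnames(d)] apart from dropping duplicates.

```r
d <- data.frame(mmdd = c(19720601, 19720602, 19720605),
                sret.10006 = c(1, 2, 3),
                sret.10014 = c(5, 9, 7),
                sret.10065 = c(10, 2, 11))
v <- c("sret.10006", "sret.10090")   # the second name is not a column of d

# Keep only the requested names that actually exist in d
keep <- intersect(v, colnames(d))

# drop = FALSE keeps the result a data frame even for a single column
sub <- d[, keep, drop = FALSE]
sub
```

Without drop = FALSE, a single matching column would come back as a plain vector, which can break code further down in a loop.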
[R] Partitioning Around Medoids then rpart to follow - is this sensible
The problem: There are no a priori groupings to run a classification on.

My solution: This is a non-R code question, so I appreciate any thoughts. I have used pam in the cluster package, followed by silhouette, to find the optimum number of clusters on scaled and centered data. I have followed this with a classification tree analysis with rpart to discern which variables drive the clustering on the original data.

Is this a sensible approach?

many thanks,
Stephen Sefick
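For readers who want to see the workflow described above in code, here is a minimal sketch. It is an assumed reconstruction, not the poster's script: the iris measurements stand in for the real data, the range of k is arbitrary, and the names `asw`, `best_k`, `cl` and `tree` are made up. It does not answer whether the approach is sensible; it only shows the steps.

```r
library(cluster)   # pam(); recommended package shipped with R
library(rpart)

x  <- iris[, 1:4]  # stand-in data, assumed to be all numeric
xs <- scale(x)     # centre and scale before clustering

# Average silhouette width for k = 2..6; larger values indicate
# better-separated clusters
ks  <- 2:6
asw <- sapply(ks, function(k) pam(xs, k)$silinfo$avg.width)
best_k <- ks[which.max(asw)]

# Cluster the scaled data, then describe the clusters with a
# classification tree grown on the original (unscaled) variables
cl   <- factor(pam(xs, best_k)$clustering)
tree <- rpart(cluster ~ ., data = data.frame(x, cluster = cl),
              method = "class")
tree
```

The splits in the printed tree indicate which original variables separate the clusters most cleanly.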
[R] Problem with loop
Hi,

I am having difficulty building a loop. In a folder called Matrices I have several files (.csv) called Mat2002273, Mat2002274, and so on up to Mat2002361. For each file, I want to calculate the mean of the column called Pixelvalues. I tried the code below, but as a result I get the message: Mat2002273 not found

essai <- read.table("C:\\Users\\Desktop\\Matrices\\Mat2002273.csv", sep=";", dec=",", header=TRUE)
essai
a <- NULL
for(i in Mat2002273:Mat2002361){
  paste(mean(essai$Pixelvalues))
  a[i] <- paste(mean(essai$Pixelvalues))
  print(a[i])
}

Thank you for your help
Re: [R] Intersection of 2 matrices
On 2/12/2011 2:48 p.m., David Winsemius wrote:

> On Dec 2, 2011, at 4:20 AM, oluwole oyebamiji wrote:
>
>> Hi all, I have matrix A of 67420 by 2 and another matrix B of 59199 by 2. I would like to find the number of rows of matrix B that I can find in matrix A (rows that are common to both matrices, with or without sorting). I have tried the intersection and is.element functions in R, but they only work for vectors and not for matrices, i.e. intersection(A,B) and is.element(A,B).
>
> Have you considered the 'duplicated' function?

Here is an example based on the duplicated function:

test.mat1 <- matrix(1:20, nc = 5)
# test.mat1 has 4 rows, so sample two of rows 1:4
test.mat2 <- rbind(test.mat1[sample(1:4, 2), ], matrix(101:120, nc = 5))

compMat <- function(mat1, mat2){
  nr1 <- nrow(mat1)
  nr2 <- nrow(mat2)
  # rows of mat2 that already occur in mat1
  mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ]
}

compMat(test.mat1, test.mat2)
Re: [R] Problem with loop
You never create a variable called Mat2002273 or Mat2002361, so you can't ask R to loop over all the values between them. If I were you, I'd code something like this:

lf <- list.files()
# PUT IN SOME CODE TO REMOVE FILES YOU DON'T WANT TO USE
pv <- vector("numeric", length(lf))
# read.csv2 matches the sep=";" and dec="," used in your read.table call
for(i in seq_along(lf)) pv[i] <- mean(read.csv2(lf[i], header = TRUE)[, "Pixelvalues"])
print(pv)

Michael

On Fri, Dec 2, 2011 at 12:15 PM, Komine moma...@yahoo.fr wrote:

> Hi, I try to build a loop difficultly. I have in a folder called Matrices several files (.csv) called Mat2002273, Mat2002274 to Mat2002361. I want to calculate for each file the mean of the column called Pixelvalues. I try this code but as result, I have this message: Mat2002273 not found
>
> essai <- read.table("C:\\Users\\Desktop\\Matrices\\Mat2002273.csv", sep=";", dec=",", header=TRUE)
> essai
> a <- NULL
> for(i in Mat2002273:Mat2002361){
>   paste(mean(essai$Pixelvalues))
>   a[i] <- paste(mean(essai$Pixelvalues))
>   print(a[i])
> }
>
> Thank you for your help
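The approach above (list the files, then loop over the file names rather than over undefined variables) can be tried end to end with throwaway files. The following is a minimal sketch: the temporary folder, the three small files and their contents are made up, and read.csv2 is used on the assumption that the real files are semicolon-separated with decimal commas, as the poster's read.table call suggests.

```r
# Create three small semicolon-separated files in a temporary folder
# (hypothetical stand-ins for Mat2002273.csv etc.)
dir_path <- file.path(tempdir(), "Matrices")
dir.create(dir_path, showWarnings = FALSE)
for (id in 2002273:2002275) {
  write.csv2(data.frame(Pixelvalues = c(1.5, 2.5, id %% 10)),
             file.path(dir_path, paste0("Mat", id, ".csv")),
             row.names = FALSE)
}

# Select only the files of interest, then compute one mean per file
files <- list.files(dir_path, pattern = "^Mat[0-9]+\\.csv$", full.names = TRUE)
means <- sapply(files, function(f) mean(read.csv2(f)$Pixelvalues))
names(means) <- basename(files)
means
```

Using the pattern argument of list.files() takes care of the "remove files you don't want" step, and sapply() returns a named vector with one mean per file.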
Re: [R] References for book R In Action by Kabacoff
The references are here: http://manning.com/kabacoff/excerpt_references.pdf (they will be included in the next printing too; they were omitted by mistake).

Regards,
Pablo
Re: [R] help in dbWriteTable
Hi,

The following code should work:

fields <- dbListFields(con, db.table.name)
reordered.names <- names(df)[match(fields, names(df))]
df <- df[ ,reordered.names]

But you might want to try using the function 'dbWriteTable2' in the 'caroline' package. (In fact, the three lines above have been copied verbatim out of said function.) It works much like the original dbWriteTable but also addresses the column reordering frustration you mention and more: NAs in NOT NULL columns, length mismatches, adding NA columns for missing fields, type checking, as well as primary key support for PostgreSQL. I use it mainly with Postgres, so I can't say for sure if it'll work for you. But let me know if it doesn't!

-Dave Schruth

On 12/1/2011 8:53 PM, arunkumar wrote:

> hi I need some help in dbWriteTable. I'm not able to insert the rows in the table if the column order are not same in the database and in the dataframe which i'm inserting. Also facing issue if the table is already created externally and inserting it thru dbWrite. is there some way that we can sepecify the rownames in the dbwrite..or any method which will solve my problem
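The reordering step above does not need a database connection to be understood. The following is a minimal sketch in which the vector `fields` is a hypothetical stand-in for what dbListFields() would return, and the data frame is made up for illustration.

```r
# Hypothetical column order, as it might be returned by dbListFields()
fields <- c("id", "name", "score")

# Data frame whose columns are in a different order than the table's
df <- data.frame(score = c(9.5, 7.0), id = 1:2, name = c("a", "b"))

# Reorder the data frame columns to match the table's field order
reordered.names <- names(df)[match(fields, names(df))]
df <- df[, reordered.names]
names(df)
```

If the table has a field that is missing from the data frame, match() returns NA for it and the indexing step fails, so that case has to be handled separately.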
[R] Compiling R using Solaris Studio
I am trying to compile R using Solaris Studio, but it keeps trying to use the GNU compiler! I've tried editing all the Makeconf files I can find, but configure keeps changing them back! I tried to rename the GNU directory so it could not find gcc, but then I got a missing lib error. How does one change the compiler used to compile R?

Thanks!
Roger
Re: [R] Error in Genetic Matching
The error message is pretty explicit: your problem is that one of your inputs has an NA (missing value) in it, and the GenMatch() function is not prepared to handle that. You can find which one by running:

    any(is.na(Tr))
    any(is.na(X.binarynp))
    any(is.na(BalanceMatrix.binarynp))

and then use View() on the object with NAs to take a look and see where they are coming from.

Michael

On Fri, Dec 2, 2011 at 9:16 AM, shyam basnet shyamabc2...@yahoo.com wrote:
> Dear R Users,
>
> I am a novice learner of R software. I am working with Genetic Matching, GenMatch(), but I am getting an error message as follows:
>
>     Error in GenMatch(Tr = Tr, X = X.binarynp, BalanceMatrix = BalanceMatrix.binarynp, :
>       GenMatch(): input includes NAs
>
> Could you please suggest how to correct the above problem? My GenMatch command is:
>
>     gen1 <- GenMatch(Tr = Tr, X = X.binarynp, BalanceMatrix = BalanceMatrix.binarynp, popsize = 1000)
>
> Thanking you,
> Sincerely Yours,
> Shyam Kumar Basnet
> SLU, Uppsala, Sweden
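Beyond any(is.na(...)), base R can also report where the missing values sit. A small sketch, using a made-up matrix `X` as a stand-in for X.binarynp (the column names are invented for illustration):

```r
# Made-up covariate matrix standing in for X.binarynp, with two NAs.
X <- matrix(c(1, 0, NA, 1,
              0, 1, 1, NA,
              1, 1, 0, 0),
            nrow = 4,
            dimnames = list(NULL, c("age", "income", "treated")))

# Does the object contain any NA at all?
has.na <- any(is.na(X))

# Number of NAs in each column.
na.per.column <- colSums(is.na(X))

# Row and column index of every NA.
na.positions <- which(is.na(X), arr.ind = TRUE)

# TRUE for rows without any NA.
complete.rows <- complete.cases(X)
```

Whether the incomplete rows should then be dropped or imputed before matching is a modelling decision that this sketch does not address.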
[R] Project local libraries (reproducible research)
Hi all,

I was wondering if anyone had scripts that they could share for capturing the current version of R packages used for a project. I'm interested in creating a project-local library so that you're safe if someone (e.g. the ggplot2 author) updates a package you're relying on and breaks your code. I could fairly easily hack something together, but I was wondering if anyone had any neat scripts they'd care to share.

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
Re: [R] Intersection of 2 matrices
Michael Kao mkao006rmail at gmail.com writes:

Your solution is fast, but not completely correct, because you are also counting possible duplicates within the second matrix. The 'refitted' function could look as follows:

    compMat2 <- function(A, B) {  # rows of B present in A
        B0 <- B[!duplicated(B), ]
        na <- nrow(A); nb <- nrow(B0)
        AB <- rbind(A, B0)
        ab <- duplicated(AB)[(na+1):(na+nb)]
        return(sum(ab))
    }

and testing an example of the size the OP was asking for:

    set.seed(8237)
    A <- matrix(sample(1:1000, 2*67420, replace=TRUE), 67420, 2)
    B <- matrix(sample(1:1000, 2*59199, replace=TRUE), 59199, 2)
    system.time(n <- compMat2(A, B))  # n = 3790

while compMat() will return 5522 rows, with 1732 duplicates within B! A 3.06 GHz iMac needs about 2 -- 2.5 seconds.

Hans Werner

> On 2/12/2011 2:48 p.m., David Winsemius wrote:
>> On Dec 2, 2011, at 4:20 AM, oluwole oyebamiji wrote:
>>> Hi all, I have matrix A of 67420 by 2 and another matrix B of 59199 by 2. I would like to find the number of rows of matrix B that I can find in matrix A (rows that are common to both matrices, with or without sorting). I have tried the intersection and is.element functions in R, but they only work for vectors and not for matrices, i.e. intersection(A,B) and is.element(A,B).
>>
>> Have you considered the 'duplicated' function?
>
> Here is an example based on the duplicated function:
>
>     test.mat1 <- matrix(1:20, nc = 5)
>     test.mat2 <- rbind(test.mat1[sample(1:4, 2), ], matrix(101:120, nc = 5))
>     compMat <- function(mat1, mat2){
>         nr1 <- nrow(mat1)
>         nr2 <- nrow(mat2)
>         mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ]
>     }
>     compMat(test.mat1, test.mat2)
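The difference between the two functions is easiest to see on a tiny case. The sketch below repeats both functions with one assumed change, `drop = FALSE`, so that a single-row result stays a matrix; the matrices A and B are made up for illustration.

```r
# Rows of mat2 flagged as duplicates in rbind(mat1, mat2). This also flags
# rows of mat2 that merely repeat an earlier row of mat2.
compMat <- function(mat1, mat2) {
  nr1 <- nrow(mat1)
  nr2 <- nrow(mat2)
  mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], , drop = FALSE]
}

# Number of distinct rows of B that also occur in A.
compMat2 <- function(A, B) {
  B0 <- B[!duplicated(B), , drop = FALSE]
  na <- nrow(A); nb <- nrow(B0)
  AB <- rbind(A, B0)
  sum(duplicated(AB)[(na + 1):(na + nb)])
}

A <- rbind(c(1, 2), c(3, 4), c(5, 6))
B <- rbind(c(3, 4),   # in A
           c(7, 8),   # not in A
           c(7, 8),   # not in A, repeats the previous row of B
           c(3, 4))   # in A, repeats the first row of B

n.flagged  <- nrow(compMat(A, B))   # counts the repeated (7, 8) as well
n.distinct <- compMat2(A, B)        # counts (3, 4) once
```

Here compMat() flags three rows of B although only one distinct row of B occurs in A, which is the over-counting described in the message.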
[R] RFE: vectorized behavior for as.POSIXct tz argument
    x <- 1472562988 + 1:10; tz <- rep("EST", 10)

    # Case 1: Works as documented
    ct <- as.POSIXct(x, tz=tz[1], origin="1960-01-01")

    # Case 2: Fails
    ct <- as.POSIXct(x, tz=tz, origin="1960-01-01")

If case 2 worked, it'd be a little easier to process paired (time, time zone) vectors from different time zones.
Re: [R] Intersection of 2 matrices
Michael Kao mkao006rmail at gmail.com writes:

Well, taking a second look, I'd say it depends on the exact formulation. In the applications I have in mind, I would like to count each occurrence in B only once. Perhaps the OP never thought about duplicates in B.

Hans Werner

> Here is an example based on the duplicated function:
>
>     test.mat1 <- matrix(1:20, nc = 5)
>     test.mat2 <- rbind(test.mat1[sample(1:4, 2), ], matrix(101:120, nc = 5))
>     compMat <- function(mat1, mat2){
>         nr1 <- nrow(mat1)
>         nr2 <- nrow(mat2)
>         mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ]
>     }
>     compMat(test.mat1, test.mat2)
Re: [R] Problem subsetting: undefined columns
Thanks, Michael. I played with your suggestion to get the output in the format I wanted, and I found that the following works fine:

    sub <- d[, which(colnames(d) %in% v)]

Aurelien

2011/12/2 R. Michael Weylandt michael.weyla...@gmail.com:
> How about this?
>
>     d[, v[v %in% colnames(d)]]
>
> Michael
>
> On Dec 2, 2011, at 12:01 PM, Aurélien PHILIPPOT aurelien.philip...@gmail.com wrote:
>> Hi Paul and Jim,
>>
>> Thanks for your messages. I just wanted R to give me the columns of my data frame d whose names appear in v. I do not care about the names in v that are not in d. In addition, every time there will be at least one element of v that has a corresponding column in d, so I know there is at least one match between the two.
>>
>> Initially, I tried something in the spirit of:
>>
>>     sub <- subset(d, colnames(d) %in% v)
>>
>> but I could not make it work properly.
>>
>> Best, Aurelien
>>
>> 2011/12/2 Paul Hiemstra paul.hiems...@knmi.nl:
>>> On 12/02/2011 07:20 AM, Aurélien PHILIPPOT wrote:
>>>> Dear R-users,
>>>>
>>>> I am new to R, and I am struggling with the following problem. I am repeating the following operations hundreds of times, within a loop: I want to subset a data frame by columns. I am interested in the column names that are given by the rows of another data frame that was built in parallel. The solution I have so far works well as long as the elements of the second data frame are included in the column names of the first data frame, but if an element from the second object is not a column name of the first one, then it fails.
>>>
>>> Hi Aurelien,
>>>
>>> I would call this a feature, not a bug. I think R does what it should do: you request a non-existent column and it throws an error. What kind of behavior are you looking for instead of this error?
>>>
>>> regards, Paul
>>>
>>>> More concretely, I have the following data frames d and v:
>>>>
>>>>     mmdd <- c(19720601, 19720602, 19720605)
>>>>     sret.10006 <- c(1,2,3)
>>>>     sret.10014 <- c(5,9,7)
>>>>     sret.10065 <- c(10,2,11)
>>>>     d <- data.frame(mmdd=mmdd, sret.10006=sret.10006, sret.10014=sret.10014, sret.10065=sret.10065)
>>>>     v <- data.frame(V1="sret.10006", V2="sret.10090")
>>>>     v <- sapply(v, function(x) levels(x)[x])
>>>>
>>>> I want to do the following subsetting:
>>>>
>>>>     sub <- subset(d, select=c(v))
>>>>
>>>> and I get the following error message:
>>>>
>>>>     Error in `[.data.frame`(x, r, vars, drop = drop) : undefined columns selected
>>>>
>>>> Any help would be very much appreciated,
>>>> Best, Aurelien
>>>
>>> --
>>> Paul Hiemstra, Ph.D.
>>> Global Climate Division
>>> Royal Netherlands Meteorological Institute (KNMI)
>>> Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
>>> P.O. Box 201 | 3730 AE | De Bilt
>>> tel: +31 30 2206 494
>>> http://intamap.geo.uu.nl/~paul
>>> http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
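The subsetting discussed in this thread can be written so that names missing from d are silently skipped. A minimal sketch using the data from the original question; the use of intersect() and `drop = FALSE` is a suggestion here, not something proposed in the thread:

```r
d <- data.frame(mmdd       = c(19720601, 19720602, 19720605),
                sret.10006 = c(1, 2, 3),
                sret.10014 = c(5, 9, 7),
                sret.10065 = c(10, 2, 11))

# Requested column names; "sret.10090" does not exist in d.
v <- c("sret.10006", "sret.10090")

# Keep only the requested names that are actual columns of d.
keep <- intersect(v, colnames(d))

# drop = FALSE keeps the result a data frame even if only one column matches.
sub <- d[, keep, drop = FALSE]
```

The variants differ in column order: `which(colnames(d) %in% v)` returns the matches in the order they appear in d, while `v[v %in% colnames(d)]` and intersect(v, colnames(d)) follow the order of v. Without `drop = FALSE`, a single matching column comes back as a plain vector.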
Re: [R] Summarizing elements of a list
Thank you for the help. I knew it could be done with a member of the apply family. I struggle with the apply functions though; they are not always intuitive for me.

Cheers,
JR

From: Sarah Goslee [via R] [mailto:ml-node+s789695n414453...@n4.nabble.com]
Sent: Thursday, December 01, 2011 6:44 PM
To: ROLL Josh F
Subject: Re: Summarizing elements of a list

> How about:
>
>     lapply(Version1_, subset, subset=c(TRUE, FALSE))
>
> or sapply(), depending on what you want the result to look like.
>
> Thanks for the reproducible example.
>
> Sarah
>
> On Thu, Dec 1, 2011 at 5:17 PM, LCOG1 [hidden email] wrote:
>> Hi everyone,
>>
>> I looked around the list for a while but couldn't find a solution to my problem. I am storing some simulation results in a list, and for each element I have two separate vectors (is that what they are called? Correct my vocabulary if necessary). See below:
>>
>>     Version1_ <- list()
>>     for(i in 1:5){
>>         Version1_[[i]] <- list(First=rnorm(1), Second=rnorm(1))
>>     }
>>
>> What I want is to put all of the elements' 'First' vectors into a single list to box plot. But what's a more elegant solution than the one below?
>>
>>     c(Version1_[[1]]$First, Version1_[[2]]$First, Version1_[[3]]$First, Version1_[[4]]$First, Version1_[[5]]$First)
>>
>> Since I have 50 or more simulations, this is impractical and sloppy. Do I need to store my data differently, or is there a solution on the back end? Thanks all.
>>
>> Josh
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org
Re: [R] Summarizing elements of a list
Great, this worked the fastest of all the suggestions.

Cheers,
Josh

From: Michael Weylandt [via R] [mailto:ml-node+s789695n414494...@n4.nabble.com]
Sent: Thursday, December 01, 2011 8:11 PM
To: ROLL Josh F
Subject: Re: Summarizing elements of a list

> Similarly, this might work:
>
>     unlist(lapply(Version1_, `[`, "First"))
>
> Michael
>
> On Thu, Dec 1, 2011 at 9:41 PM, Sarah Goslee [hidden email] wrote:
>> How about:
>>
>>     lapply(Version1_, subset, subset=c(TRUE, FALSE))
>>
>> or sapply(), depending on what you want the result to look like.
>>
>> Thanks for the reproducible example.
>>
>> Sarah
>>
>> On Thu, Dec 1, 2011 at 5:17 PM, LCOG1 [hidden email] wrote:
>>> Hi everyone,
>>>
>>> I looked around the list for a while but couldn't find a solution to my problem. I am storing some simulation results in a list, and for each element I have two separate vectors (is that what they are called? Correct my vocabulary if necessary). See below:
>>>
>>>     Version1_ <- list()
>>>     for(i in 1:5){
>>>         Version1_[[i]] <- list(First=rnorm(1), Second=rnorm(1))
>>>     }
>>>
>>> What I want is to put all of the elements' 'First' vectors into a single list to box plot. But what's a more elegant solution than the one below?
>>>
>>>     c(Version1_[[1]]$First, Version1_[[2]]$First, Version1_[[3]]$First, Version1_[[4]]$First, Version1_[[5]]$First)
>>>
>>> Since I have 50 or more simulations, this is impractical and sloppy. Do I need to store my data differently, or is there a solution on the back end? Thanks all.
>>>
>>> Josh
>>
>> --
>> Sarah Goslee
>> http://www.functionaldiversity.org
[R] order function give back row name
Hello,

I have a matrix results with dimension 1x9 (double matrix):

          XLB      XLE      XLF      XLI
    1 53.3089 55.77923 37.64458 83.08646

I'm trying to order this matrix:

    print(order(results))
    [1] 3 1 2 4

How can the function order return the column names XLF XLB XLE XLI instead of 3 1 2 4? Any idea?

Thank you in advance
Re: [R] Intersection of 2 matrices
Here is one way of doing it:

    > compMat2 <- function(A, B) {  # rows of B present in A
    +     B0 <- B[!duplicated(B), ]
    +     na <- nrow(A); nb <- nrow(B0)
    +     AB <- rbind(A, B0)
    +     ab <- duplicated(AB)[(na+1):(na+nb)]
    +     return(sum(ab))
    + }
    > set.seed(8237)
    > A <- matrix(sample(1:1000, 2*67420, replace=TRUE), 67420, 2)
    > B <- matrix(sample(1:1000, 2*59199, replace=TRUE), 59199, 2)
    > system.time({
    +     # convert for comparison
    +     A.1 <- apply(A, 1, function(x) paste(x, collapse = ' '))
    +     B.1 <- apply(B, 1, function(x) paste(x, collapse = ' '))
    +     count <- sum(B.1 %in% A.1)
    + })
       user  system elapsed
       1.77    0.00    1.79
    > count
    [1] 3905

On Fri, Dec 2, 2011 at 2:46 PM, Hans W Borchers hwborch...@googlemail.com wrote:
> Michael Kao mkao006rmail at gmail.com writes:
>
> Well, taking a second look, I'd say it depends on the exact formulation. In the applications I have in mind, I would like to count each occurrence in B only once. Perhaps the OP never thought about duplicates in B.
>
> Hans Werner
>
>> Here is an example based on the duplicated function:
>>
>>     test.mat1 <- matrix(1:20, nc = 5)
>>     test.mat2 <- rbind(test.mat1[sample(1:4, 2), ], matrix(101:120, nc = 5))
>>     compMat <- function(mat1, mat2){
>>         nr1 <- nrow(mat1)
>>         nr2 <- nrow(mat2)
>>         mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ]
>>     }
>>     compMat(test.mat1, test.mat2)

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
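The string-key approach counts every row of B that occurs in A, so a row repeated within B is counted each time it appears. A small sketch on made-up matrices showing that count next to the distinct-row count; the helper `key()` and the use of unique() are assumptions added here, not part of the message:

```r
A <- rbind(c(1, 2), c(3, 4), c(5, 6))
B <- rbind(c(3, 4), c(7, 8), c(3, 4), c(5, 6))

# Hypothetical helper: one string key per row, e.g. "3 4".
key <- function(m) apply(m, 1, paste, collapse = " ")
A.key <- key(A)
B.key <- key(B)

# Rows of B found in A, counting a repeated row of B every time it appears.
n.with.repeats <- sum(B.key %in% A.key)

# Distinct rows of B found in A.
n.distinct <- sum(unique(B.key) %in% A.key)
```

Which of the two numbers is wanted depends on whether repeated rows of B should count more than once, the point raised in the quoted reply.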
Re: [R] order function give back row name
    names(results)[order(results)]

Michael

On Fri, Dec 2, 2011 at 2:45 PM, Martin Bauer bauermar...@gmx.at wrote:
> Hello,
>
> I have a matrix results with dimension 1x9 (double matrix):
>
>           XLB      XLE      XLF      XLI
>     1 53.3089 55.77923 37.64458 83.08646
>
> I'm trying to order this matrix:
>
>     print(order(results))
>     [1] 3 1 2 4
>
> How can the function order return the column names XLF XLB XLE XLI instead of 3 1 2 4? Any idea?
>
> Thank you in advance
Re: [R] order function give back row name
With similar data, since you didn't include a reproducible example of your own:

    > results <- matrix(c(53, 55, 37, 83), nrow=1)
    > colnames(results) <- letters[1:4]
    > results
          a  b  c  d
    [1,] 53 55 37 83
    > order(results)
    [1] 3 1 2 4
    > colnames(results)[order(results)]
    [1] "c" "a" "b" "d"

On Fri, Dec 2, 2011 at 2:45 PM, Martin Bauer bauermar...@gmx.at wrote:
> Hello,
>
> I have a matrix results with dimension 1x9 (double matrix):
>
>           XLB      XLE      XLF      XLI
>     1 53.3089 55.77923 37.64458 83.08646
>
> I'm trying to order this matrix:
>
>     print(order(results))
>     [1] 3 1 2 4
>
> How can the function order return the column names XLF XLB XLE XLI instead of 3 1 2 4? Any idea?
>
> Thank you in advance

--
Sarah Goslee
http://www.functionaldiversity.org
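The same idea applied to the four values and column names shown in the question, as a sketch that assumes `results` really is a one-row numeric matrix:

```r
results <- matrix(c(53.3089, 55.77923, 37.64458, 83.08646), nrow = 1,
                  dimnames = list(NULL, c("XLB", "XLE", "XLF", "XLI")))

# Positions of the values from smallest to largest.
ord <- order(results)

# Column names in that order.
sorted.names <- colnames(results)[ord]

# The values themselves, reordered; the one-row result drops to a named vector.
sorted <- results[, ord]
```

For a matrix, names() returns NULL, so colnames() is the accessor to use; for a data frame, names() and colnames() give the same result.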
Re: [R] Summarizing elements of a list
Here's a slight modification that is even faster if speed is a consideration:

    sapply(Version1_, `[[`, "First")

The thought process is to go through the list Version1_ and apply the operation `[[` to each element individually. This requires a second argument (here the element name "First"), which we pass through the ... of sapply(). I hope that helps you get a sense of the mechanics. We use sapply() instead of lapply() because it does some internal simplification for us to get one big vector back, effectively cutting out the unlist() of the first solution I gave you.

Michael

On Fri, Dec 2, 2011 at 2:04 PM, LCOG1 jr...@lcog.org wrote:
> Great, this worked the fastest of all the suggestions.
>
> Cheers,
> Josh
>
> From: Michael Weylandt [via R] [mailto:ml-node+s789695n414494...@n4.nabble.com]
> Sent: Thursday, December 01, 2011 8:11 PM
> To: ROLL Josh F
> Subject: Re: Summarizing elements of a list
>
>> Similarly, this might work:
>>
>>     unlist(lapply(Version1_, `[`, "First"))
>>
>> Michael
>>
>> On Thu, Dec 1, 2011 at 9:41 PM, Sarah Goslee [hidden email] wrote:
>>> How about:
>>>
>>>     lapply(Version1_, subset, subset=c(TRUE, FALSE))
>>>
>>> or sapply(), depending on what you want the result to look like.
>>>
>>> Thanks for the reproducible example.
>>>
>>> Sarah
>>>
>>> On Thu, Dec 1, 2011 at 5:17 PM, LCOG1 [hidden email] wrote:
>>>> Hi everyone,
>>>>
>>>> I looked around the list for a while but couldn't find a solution to my problem. I am storing some simulation results in a list, and for each element I have two separate vectors (is that what they are called? Correct my vocabulary if necessary). See below:
>>>>
>>>>     Version1_ <- list()
>>>>     for(i in 1:5){
>>>>         Version1_[[i]] <- list(First=rnorm(1), Second=rnorm(1))
>>>>     }
>>>>
>>>> What I want is to put all of the elements' 'First' vectors into a single list to box plot. But what's a more elegant solution than the one below?
>>>>
>>>>     c(Version1_[[1]]$First, Version1_[[2]]$First, Version1_[[3]]$First, Version1_[[4]]$First, Version1_[[5]]$First)
>>>>
>>>> Since I have 50 or more simulations, this is impractical and sloppy. Do I need to store my data differently, or is there a solution on the back end? Thanks all.
>>>>
>>>> Josh
>>>
>>> --
>>> Sarah Goslee
>>> http://www.functionaldiversity.org
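The extraction variants from this thread can be compared side by side. A minimal sketch built on the list from the original question; the seed and the vapply() variant are additions for illustration only:

```r
set.seed(1)  # arbitrary seed so the sketch is repeatable

Version1_ <- list()
for (i in 1:5) {
  Version1_[[i]] <- list(First = rnorm(1), Second = rnorm(1))
}

# `[[` pulls the element named "First" out of each sub-list;
# sapply() simplifies the five results to one numeric vector.
first.sapply <- sapply(Version1_, `[[`, "First")

# `[` keeps each result as a one-element list, so unlist() is needed;
# the resulting vector carries the name "First" on every element.
first.unlist <- unlist(lapply(Version1_, `[`, "First"))

# vapply() additionally checks that each result is a single number.
first.vapply <- vapply(Version1_, function(el) el$First, numeric(1))
```

Any of the three vectors can then be handed to boxplot().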
Re: [R] RFE: vectorized behavior for as.POSIXct tz argument
On Dec 2, 2011, at 2:28 PM, Jack Tanner wrote:
>     x <- 1472562988 + 1:10; tz <- rep("EST", 10)
>
>     # Case 1: Works as documented
>     ct <- as.POSIXct(x, tz=tz[1], origin="1960-01-01")
>
>     # Case 2: Fails
>     ct <- as.POSIXct(x, tz=tz, origin="1960-01-01")

    sapply(tz, function(ttt) as.POSIXct(x=x, tz=ttt, origin="1960-01-01"), simplify=FALSE)

> If case 2 worked, it'd be a little easier to process paired (time, time zone) vectors from different time zones.

David Winsemius, MD
West Hartford, CT
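The sapply() call above converts the whole vector x once per time zone. For paired (time, time zone) vectors, one possible sketch walks over both vectors in parallel with mapply(); the three zone names below are made up for illustration and assume the usual time zone database is installed:

```r
x  <- 1472562988 + 1:3
tz <- c("EST", "UTC", "Asia/Tokyo")   # illustrative zones, one per element of x

# Convert element i of x using element i of tz. SIMPLIFY = FALSE keeps the
# results as a list, because one POSIXct vector can carry only one time zone.
ct.list <- mapply(function(xi, tzi) as.POSIXct(xi, tz = tzi, origin = "1960-01-01"),
                  x, tz, SIMPLIFY = FALSE)

# If a flat vector is needed, format each time in its own zone as text.
ct.text <- mapply(function(ct, tzi) format(ct, tz = tzi, usetz = TRUE),
                  ct.list, tz)
```

This remains a workaround in the same spirit as the one above, not the vectorized behavior requested in the original post.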
[R] Plot and polygon in log scale
Dear Experts,

When using plot and polygon, I can change the density and angle of the shading lines when plotting is done in regular scale. It does not seem to work in 'log' scale. Any suggestions would be highly appreciated! Below is an example:

    plot(1:10, c(1:10)^2*20, log="y")
    polygon(c(3:7,7:3), c((3:7)^2*20, c(7:3)^2*10), col='grey', angle=45, dens=30)

    Warning message:
    In polygon.fullhatch(xy$x[start:(end - 1)], xy$y[start:(end - 1)], :
      cannot hatch with logarithmic scale active

Regards,
Santosh
Re: [R] RFE: vectorized behavior for as.POSIXct tz argument
David Winsemius dwinsemius at comcast.net writes:
>     sapply(tz, function(ttt) as.POSIXct(x=x, tz=ttt, origin="1960-01-01"), simplify=FALSE)

Sure, there's no end of workarounds. It would just be consistent to treat both the x and the tz arguments as vectors.
Re: [R] RFE: vectorized behavior for as.POSIXct tz argument
On Dec 2, 2011, at 4:06 PM, Jack Tanner wrote:
> David Winsemius dwinsemius at comcast.net writes:
>>     sapply(tz, function(ttt) as.POSIXct(x=x, tz=ttt, origin="1960-01-01"), simplify=FALSE)
>
> Sure, there's no end of workarounds. It would just be consistent to treat both the x and the tz arguments as vectors.

I've wondered about that too. The function where I would like to see a dual vectorized application is 'rep'. In cases where the x argument is the same length as the 'times' or 'each' arguments, I would like to see it produce a vector that is sum(each) or sum(times) long. The problem is most likely in the ambiguity of how to apply the arguments:

    > unlist(sapply(1:5, function(tt) rep(1:5, each=tt)))
     [1] 1 2 3 4 5 1 1 2 2 3 3 4 4 5 5 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 1 1 1 1 2 2 2 2 3
    [40] 3 3 3 4 4 4 4 5 5 5 5 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 5 5 5 5 5

    > mapply(rep, x=1:5, each=1:5)
    [[1]]
    [1] 1

    [[2]]
    [1] 2 2

    [[3]]
    [1] 3 3 3

    [[4]]
    [1] 4 4 4 4

    [[5]]
    [1] 5 5 5 5 5

David Winsemius, MD
West Hartford, CT
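For the `times` argument specifically, base R's rep() already accepts a vector of the same length as x and repeats each element the corresponding number of times, whereas `each` uses only a single value. A short sketch comparing that with the mapply() result shown above:

```r
# Element-wise repetition: x[i] is repeated times[i] times.
by.times <- rep(1:5, times = 1:5)

# The mapply() construction from the message, flattened to one vector.
by.mapply <- unlist(mapply(rep, x = 1:5, each = 1:5))

same <- identical(by.times, by.mapply)
```

Both vectors have length sum(1:5), so for `times` the paired behavior is available without a workaround.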
[R] Sweave problem on Mac OS when using umlauts and summary()
I have the following Sweave file, which gets sweaved correctly.

    <<>>=
    m <- lm(y1 ~ x1, anscombe)
    summary(m)
    @

I include the sweaved .tex file into another .tex file via include. When I use a single umlaut in the .snw file, a warning occurs. As a result, part of the summary output is not contained in the .tex file.

    ä
    <<>>=
    m <- lm(y1 ~ x1, anscombe)
    summary(m)
    @

    You can now run (pdf)latex on 'ch1.tex'
    Warnmeldungen:
    1: ch1.Snw has unknown encoding: assuming Latin-1
    2: ungültige Zeichenkette in Konvertierung der Ausgabe

(The second warning means: wrong character in conversion of output.)

Interestingly, this error does NOT occur when I omit the summary(m) statement.

    ä
    <<>>=
    m <- lm(y1 ~ x1, anscombe)
    #summary(m)
    @

    You can now run (pdf)latex on 'ch1.tex'
    Warnmeldung:
    ch1.Snw has unknown encoding: assuming Latin-1

I know that I can prevent this by adding a line at the beginning of the .snw file:

    \usepackage[utf8]{inputenc}
    ä
    <<>>=
    m <- lm(y1 ~ x1, anscombe)
    summary(m)
    @

This gets sweaved correctly without warnings. But this solution is not good, because the place where I add the usepackage line is not the preamble of the .tex document. This will cause an error when processing the entire document with tex. How can I achieve the last result in another way? I tried:

    Sweave('/Users/markheckmann/Desktop/test_sweave/ch1.Snw', encoding="UFT-8")

But this does not work either when the usepackage line is omitted. I am stuck here. Can anyone help?

TIA
Mark

Mark Heckmann
Blog: www.markheckmann.de
R-Blog: http://ryouready.wordpress.com
Re: [R] Fw: calculate mean of multiple rows in a data frame
Thank you, I copied the data from the R environment, but it came out wrong. You understood exactly what I wanted, and your solution is admirable: I clearly need to address the naming convention. Thanks for your help. --- On Fri, 2/12/11, Jean V Adams jvad...@usgs.gov wrote: From: Jean V Adams jvad...@usgs.gov Subject: Re: [R] Fw: calculate mean of multiple rows in a data frame To: Jabez Wilson jabez...@yahoo.co.uk Cc: R-Help r-h...@stat.math.ethz.ch Date: Friday, 2 December, 2011, 14:29 It's easier for folks to help you if you put your example data in a format that can be readily read in R. See, for example, the dput() function, which you can use to provide us with something like this: DF - structure(list(NAME = c(Control_1, Control_2, Control_1, Control_3, MM0289~RFU:11810.15, MM0289~RFU:9238.41, MM16597~RFU:36765.38, MM16597~RFU:41258.94), ID = c(probe~B01R01C01, probe~B01R01C02, probe~B01R09C01, probe~B01R09C02, probe~B29R13C06, probe~B29R13C05, probe~B44R15C20, probe~B44R15C19), a = c(3L, 712L, 937L, 464L, 99L, 605L, 700L, 132L), b = c(22L, 13L, 824L, 836L, 544L, 603L, 923L, 777L), c = c(926L, 32L, 898L, 508L, 607L, 862L, 219L, 497L), d = c(774L, 179L, 668L, 53L, 984L, 575L, 582L, 995L)), .Names = c(NAME, ID, a, b, c, d), class = data.frame, row.names = c(1, 2, 3, 4, 5, 6, 7, 8)) If I understand what you're after, you want to summarize data within groups, but your NAME variable is not as general as you would like. You can get around this by creating a new variable which is a shorter and more general version of the NAME variable. I did this by saving just the part of the NAME before the colon, :. 
shortname - sapply(strsplit(DF$NAME, :), [, 1) aggregate(DF[, -(1:2)], by=list(shortname=shortname), mean) shortname a b c d 1 Control_1 470 423.0 912.0 721.0 2 Control_2 712 13.0 32.0 179.0 3 Control_3 464 836.0 508.0 53.0 4 MM0289~RFU 352 573.5 734.5 779.5 5 MM16597~RFU 416 850.0 358.0 788.5 Jean Jabez Wilson wrote on 12/01/2011 03:15:39 PM: NAME ID a b c d 1 Control_1 probe~B01R01C01 381 213 345 653 2 Control_2 probe~B01R01C02 574 629 563 783 3 Control_1 probe~B01R09C01 673 511 521 967 4 Control_3 probe~B01R09C02 53 809 999 50 5 MM0289~RFU:11810.15 probe~B29R13C06 681 34 115 587 6 MM0289~RFU:9238.41 probe~B29R13C05 784 443 20 784 7 MM16597~RFU:36765.38 probe~B44R15C20 719 251 790 445 8 MM16597~RFU:41258.94 probe~B44R15C19 677 363 268 686 NAME ID a b c d 1 Control_1 probe~B01R01C01 381 213 345 653 2 Control_2 probe~B01R01C02 574 629 563 783 3 Control_1 probe~B01R09C01 673 511 521 967 4 Control_3 probe~B01R09C02 53 809 999 50 5 MM0289~RFU:11810.15 probe~B29R13C06 681 34 115 587 6 MM0289~RFU:9238.41 probe~B29R13C05 784 443 20 784 7 MM16597~RFU:36765.38 probe~B44R15C20 719 251 790 445 8 MM16597~RFU:41258.94 probe~B44R15C19 677 363 268 686 Sorry, that should look like this: NAME ID a b c d 1 Control_1 probe~B01R01C01 381 213 345 653 2 Control_2 probe~B01R01C02 574 629 563 783 3 Control_1 probe~B01R09C01 673 511 521 967 4 Control_3 probe~B01R09C02 53 809 999 50 5 MM0289~RFU:11810.15 probe~B29R13C06 681 34 115 587 6 MM0289~RFU:9238.41 probe~B29R13C05 784 443 20 784 7 MM16597~RFU:36765.38 probe~B44R15C20 719 251 790 445 8 MM16597~RFU:41258.94 probe~B44R15C19 677 363 268 686 NAME ID a b c d 1 Control_1 probe~B01R01C01 3 22 926 774 2 Control_2 probe~B01R01C02 712 13 32 179 3 Control_1 probe~B01R09C01 937 824 898 668 4 Control_3 probe~B01R09C02 464 836 508 53 5 MM0289~RFU:11810.15 probe~B29R13C06 99 544 607 984 6 MM0289~RFU:9238.41 probe~B29R13C05 605 603 862 575 7 MM16597~RFU:36765.38 probe~B44R15C20 700 923 219 582 8 MM16597~RFU:41258.94 probe~B44R15C19 132 777 497 995 
--- On Thu, 1/12/11, Jabez Wilson jabez...@yahoo.co.uk wrote:

From: Jabez Wilson jabez...@yahoo.co.uk
Subject: calculate mean of multiple rows in a data frame
To: R-Help r-h...@stat.math.ethz.ch
Date: Thursday, 1 December, 2011, 20:45

Dear all, I have a data frame (DF) in the following format:

                      NAME              ID   a   b   c   d
    1            Control_1 probe~B01R01C01 381 213 345 653
    2            Control_2 probe~B01R01C02 574 629 563 783
    3            Control_1 probe~B01R09C01 673 511 521 967
    4            Control_3 probe~B01R09C02  53 809 999  50
    5  MM0289~RFU:11810.15 probe~B29R13C06 681  34 115 587
    6   MM0289~RFU:9238.41 probe~B29R13C05 784 443  20 784
    7 MM16597~RFU:36765.38 probe~B44R15C20 719 251 790 445
    8 MM16597~RFU:41258.94 probe~B44R15C19 677 363 268 686

I would like to consolidate the data frame by parsing through the rows, and where the NAME is identical, consolidate into one row and return the mean. I can do this for the first lines (Control_1 etc) by using aggregate()
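For readers following this thread, the grouping idea from Jean's reply can be sketched in a self-contained form. This is only an illustration: it uses sub() with a regular expression instead of strsplit() (a swapped-in variant, not what Jean posted), keeps just columns a and b to stay short, and takes its values from the dput() data quoted in the thread.

```r
# Data taken from the dput() output quoted in this thread (columns a and b only).
DF <- data.frame(
  NAME = c("Control_1", "Control_2", "Control_1", "Control_3",
           "MM0289~RFU:11810.15", "MM0289~RFU:9238.41",
           "MM16597~RFU:36765.38", "MM16597~RFU:41258.94"),
  a = c(3, 712, 937, 464, 99, 605, 700, 132),
  b = c(22, 13, 824, 836, 544, 603, 923, 777),
  stringsAsFactors = FALSE
)

# Drop everything from the first colon onward; names without a colon are unchanged.
DF$shortname <- sub(":.*$", "", DF$NAME)

# Mean of a and b within each shortened name.
res <- aggregate(cbind(a, b) ~ shortname, data = DF, FUN = mean)
print(res)
```

The formula interface is used here so the grouping column stays inside the data frame; the by= form from the reply should give the same means.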
[R] Imputing data
So I have a very big matrix of about 900 by 400 and there are a couple of NA in the list. I have used the following functions to impute the missing data

    data(pc)
    pc.na <- pc
    pc.roughfix <- na.roughfix(pc.na)
    pc.narf <- randomForest(pc.na, na.action=na.roughfix)

yet it does not replace the NA in the list. Presently I want to replace the NA with maybe the mean of the rows or columns or some type of correlation. Any help would be appreciated.

-- View this message in context: http://r.789695.n4.nabble.com/Imputing-data-tp4150041p4150041.html Sent from the R help mailing list archive at Nabble.com.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
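For the column-mean replacement asked about here, a minimal base R sketch might look like the following. The matrix m is made up for illustration (it is not the poster's pc data), and no extra packages are assumed.

```r
# Hypothetical small matrix standing in for the 900 x 400 data.
set.seed(1)
m <- matrix(rnorm(20), nrow = 5)
m[2, 3] <- NA
m[4, 1] <- NA

# Column means computed from the observed values only.
col_means <- colMeans(m, na.rm = TRUE)

# Row/column positions of every NA, then fill each with its column's mean.
idx <- which(is.na(m), arr.ind = TRUE)
m_imp <- m
m_imp[idx] <- col_means[idx[, "col"]]

print(m_imp)
```

Note that the result is assigned to a new object (m_imp); the original m still contains its NAs. Swapping colMeans for rowMeans and idx[, "col"] for idx[, "row"] would give row-mean imputation instead.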
[R] Writing data including NAs to access using RODBC
Hi,

I have run into a problem writing data using RODBC. The data frame I have read in from Access includes some NAs. I have put the data into an xts object, manipulated the data, and would now like to append two columns of the manipulated data to the original table in Access. I can neither append the data nor write a new table. After some fiddling about, I think the cause is that the vectors I wish to append to the original data frame (or write) include some NAs. Is there a workaround?

Thanks
Matt Johnson
Re: [R] Sweave problem on Mac OS when using umlauts and summary()
This problem comes up so frequently that I have made options(useFancyQuotes=FALSE) the default in my knitr package: http://yihui.github.com/knitr/ You can also use options(useFancyQuotes='TeX').

Regards,
Yihui
--
Yihui Xie xieyi...@gmail.com
Phone: 515-294-2465 Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA

On Fri, Dec 2, 2011 at 4:08 PM, Mark Heckmann mark.heckm...@gmx.de wrote:
> I have the following Sweave file which gets sweaved correctly.
>
> <<>>=
> m <- lm(y1 ~ x1, anscombe)
> summary(m)
> @
>
> I include the sweaved .tex file into another .tex file via include. When I use a single umlaut in the .snw file a warning occurs. As a result part of the summary output is not contained in the .tex file.
>
> ä
> <<>>=
> m <- lm(y1 ~ x1, anscombe)
> summary(m)
> @
>
> You can now run (pdf)latex on 'ch1.tex'
> Warnmeldungen:
> 1: ‘ch1.Snw’ has unknown encoding: assuming Latin-1
> 2: ungültige Zeichenkette in Konvertierung der Ausgabe (wrong character in conversion of output)
>
> Interestingly, this error does NOT occur when I omit the summary(m) statement.
>
> ä
> <<>>=
> m <- lm(y1 ~ x1, anscombe)
> #summary(m)
> @
>
> You can now run (pdf)latex on 'ch1.tex'
> Warnmeldung:
> ‘ch1.Snw’ has unknown encoding: assuming Latin-1
>
> I know that I can prevent this by adding a line at the beginning of the .snw file:
>
> \usepackage[utf8]{inputenc}
> ä
> <<>>=
> m <- lm(y1 ~ x1, anscombe)
> summary(m)
> @
>
> This gets sweaved correctly without warnings. But this solution is not good, as it is not the preamble of the .tex document where I add the usepackage line. This will cause an error when processing the entire document with tex. How can I achieve the last result in another way? I tried:
>
> Sweave('/Users/markheckmann/Desktop/test_sweave/ch1.Snw', encoding="UFT-8")
>
> But this does not work either when the usepackage line is omitted. I am stuck here. Can anyone help?
>
> TIA
> Mark
>
> Mark Heckmann
> Blog: www.markheckmann.de
> R-Blog: http://ryouready.wordpress.com
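As a small illustration of the option Yihui mentions: the significance legend printed by summary() appears to be built with sQuote(), so the directional quotes ‘ ’ are the likely source of the non-ASCII characters in the Sweave output. The sketch below only shows what the option does to sQuote(); it does not run Sweave.

```r
# Remember the current setting so it can be restored afterwards.
old <- options(useFancyQuotes = FALSE)

# With fancy quotes disabled, sQuote()/dQuote() fall back to plain ASCII quotes.
plain_single <- sQuote("***")
plain_double <- dQuote("x")
print(plain_single)
print(plain_double)

# Restore whatever the session had before.
options(old)
```

Setting the option once near the top of the .Snw file (in a hidden chunk) would be one way to apply it to every chunk that follows.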
Re: [R] Imputing data
On Fri, Dec 2, 2011 at 2:16 PM, khlam kh...@ucsc.edu wrote:
> So I have a very big matrix of about 900 by 400 and there are a couple of NA in the list. I have used the following functions to impute the missing data
>
> data(pc)
> pc.na <- pc
> pc.roughfix <- na.roughfix(pc.na)
> pc.narf <- randomForest(pc.na, na.action=na.roughfix)
>
> yet it does not replace the NA in the list. Presently I want to replace the NA with maybe the mean of the rows or columns or some type of correlation. Any help would be appreciated.

There are several imputation functions available in the various packages - for example, packages Hmisc and e1071 both contain a function called impute, and the package impute contains the function impute.knn for nearest neighbor imputation.

HTH,
Peter
Re: [R] R, PostgresSQL and poor performance
On 12/02/2011 09:46 PM, Berry, David I. wrote:
> Thanks for the reply and suggestions. I've tried the RpgSQL drivers and the results are pretty similar in terms of performance. The ~1.5M records I'm trying to read into R are being extracted from a table with ~300M rows (and ~60 columns) that has been indexed on the relevant columns and horizontally partitioned (with constraint checking on). I do need to try and optimize the database a bit more but I don't think this is the cause of the performance issues.

With that much data you might want to consider PL/R: http://www.joeconway.com/plr/

HTH,
Joe
--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source Training, Service, Consulting, 24x7 Support
Re: [R] Imputing data
Hi,

For imputation using the randomForest package, check ?rfImpute

Weidong

On Fri, Dec 2, 2011 at 6:00 PM, Peter Langfelder peter.langfel...@gmail.com wrote:
> On Fri, Dec 2, 2011 at 2:16 PM, khlam kh...@ucsc.edu wrote:
>> So I have a very big matrix of about 900 by 400 and there are a couple of NA in the list. I have used the following functions to impute the missing data
>>
>> data(pc)
>> pc.na <- pc
>> pc.roughfix <- na.roughfix(pc.na)
>> pc.narf <- randomForest(pc.na, na.action=na.roughfix)
>>
>> yet it does not replace the NA in the list. Presently I want to replace the NA with maybe the mean of the rows or columns or some type of correlation. Any help would be appreciated.
>
> There are several imputation functions available in the various packages - for example, packages Hmisc and e1071 both contain a function called impute, and the package impute contains the function impute.knn for nearest neighbor imputation.
>
> HTH,
> Peter
Re: [R] Plot and polygon in log scale
On 2011-12-02 13:03, Santosh wrote:
> Dear Experts,
>
> When using plot and polygon, I can change the density and angle of the shaded area lines when plotting is done in regular scale. It does not seem to work in 'log' scale. Any suggestions would be highly appreciated! Below is an example:
>
> plot(1:10,c(1:10)^2*20,log="y")
> polygon(c(3:7,7:3),c((3:7)^2*20,c(7:3)^2*10),col='grey',angle=45,dens=30)
>
> Warning message:
> In polygon.fullhatch(xy$x[start:(end - 1)], xy$y[start:(end - 1)], :
>   cannot hatch with logarithmic scale active
>
> Regards,
> Santosh

It looks like density is not implemented for log scales. (There is a comment in the source file.) Perhaps a note in the help file might be useful, but I would think that nowadays most users would want colour or shades of gray anyway.

Of course, you can always do the logging on the data before plotting. You'll just have to use the axis() function to print appropriate axis labels.

Peter Ehlers
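Peter's last suggestion (log the data yourself, then label the axis by hand) could be sketched roughly as follows. The tick positions in ticks are an arbitrary choice for this example, and log10 is assumed as the base.

```r
x <- 1:10
y <- x^2 * 20

# Plot log10(y) on an ordinary linear axis and suppress the default y axis.
plot(x, log10(y), yaxt = "n", ylab = "y (log scale)")

# Put tick marks at the logged positions but label them in original units.
ticks <- c(20, 50, 100, 200, 500, 1000, 2000)
axis(2, at = log10(ticks), labels = ticks)

# The polygon from the original post, with its y values logged the same way.
poly_x <- c(3:7, 7:3)
poly_y <- log10(c((3:7)^2 * 20, (7:3)^2 * 10))
polygon(poly_x, poly_y, col = "grey", angle = 45, density = 30)
```

Because the axis is linear as far as the graphics device is concerned, the hatching arguments angle and density are honoured and the warning from the original post should not appear.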
[R] Help! Big problem when using browser() to do R debugging?
Hi all,

Could you please help me? I am having the following weird problem when debugging R programs using browser():

In my function, I've inserted a browser() in front of Step 1. My function has 3 steps and at the end of each step, it will print out the message "Step i is done.."

However, after I hit <ENTER> when the program stopped before Step 1 and entered debugging mode, it not only executed the next line (i.e. Step 1), but also all the (many) remaining lines in that function, as shown below:

    Browse[1]>
    [1] "Step 1 is done.."
    [1] "Step 2 is done.."
    [1] "Step 3 is done.."

Then it automatically quit debugging mode, and when I tried to check the value of myobj, I got the following error message:

    > names(myobj)
    Error: object 'myobj' not found
    No suitable frames for recover()

So my question is: why did one keystroke, <ENTER>, lead it to execute all the remaining lines in that function, return from the function, and quit debugging mode?

Thanks a lot!
Re: [R] Help! Big problem when using browser() to do R debugging?
On 11-12-02 8:38 PM, Michael wrote:
> Hi all,
>
> Could you please help me? I am having the following weird problem when debugging R programs using browser():
>
> In my function, I've inserted a browser() in front of Step 1. My function has 3 steps and at the end of each step, it will print out the message "Step i is done.."
>
> However, after I hit <ENTER> when the program stopped before Step 1 and entered debugging mode, it not only executed the next line (i.e. Step 1), but also all the (many) remaining lines in that function, as shown below:
>
> Browse[1]>
> [1] "Step 1 is done.."
> [1] "Step 2 is done.."
> [1] "Step 3 is done.."
>
> Then it automatically quit debugging mode, and when I tried to check the value of myobj, I got the following error message:
>
> > names(myobj)
> Error: object 'myobj' not found
> No suitable frames for recover()
>
> So my question is: why did one keystroke, <ENTER>, lead it to execute all the remaining lines in that function, return from the function, and quit debugging mode?

See ?browser.

Duncan Murdoch
[R] side-by-side map with different geographies using spplot
Hello,

I want to create side-by-side maps of similar attribute data in two different cities using a single legend. To simply display side-by-side census block group boundary (non-thematic) maps for Minneapolis and Cleveland I do the following:

    library(rgdal)
    library(sp)
    Minneapolis=readOGR("../Minneapolis/Census/2010/Census_BlockGroup_GEO/","tl_2010_27053_bg10")
    Cleveland=readOGR("../Cleveland/Census/2010/Census_BlockGroup_GEO/","tl_2010_39035_bg10")
    par(mfrow=c(1,2))
    plot(Minneapolis)
    plot(Cleveland)

I can display a single thematic map for a city using spplot as follows:

    spplot(Minneapolis,"Thematic_Data_Column")

But calling the function again for Cleveland just overwrites the window. I am unsure how to use spplot's layout tools with two different geographies. Most examples use a single geography and multiple attribute columns. Alternatively, is there a way to use par together with spplot to allow for multiple spplot calls?

thank you,
-david
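One possible direction, offered as a sketch rather than a tested answer to this post: spplot() returns a lattice (trellis) object, and lattice objects are placed on a page with the split and more arguments of print() rather than with par(mfrow=). The example below uses plain xyplot() panels as stand-ins for the two maps, since the shapefiles are not available here; the data frames and titles are invented.

```r
library(lattice)

# Stand-in data; in the real case these would be the two spplot() results.
d1 <- data.frame(x = 1:10, y = (1:10)^2)
d2 <- data.frame(x = 1:10, y = sqrt(1:10))

p1 <- xyplot(y ~ x, data = d1, main = "City A (stand-in)")
p2 <- xyplot(y ~ x, data = d2, main = "City B (stand-in)")

# split = c(column, row, ncolumns, nrows); more = TRUE keeps the page open.
print(p1, split = c(1, 1, 2, 1), more = TRUE)
print(p2, split = c(2, 1, 2, 1))
```

A single shared legend would presumably also require both maps to use the same class breaks and colours, which this sketch does not attempt.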
Re: [R] simple lm question
> Use `lm` the way it is designed to be used, with a data argument:
>
> l2 <- lm(e~. , data=as.data.frame(M))
> summary(l2)
>
> Call:
> lm(formula = e ~ ., data = as.data.frame(M))

And what is the regression being done in this case? How are the independent variables used? It looks like M[,5]~M[,1]+M[,2]+M[,3]+M[,4] as those are the coefficients. But the results are different when I do that explicitly:

    M <- matrix(runif(5*20), nrow=20)
    colnames(M) <- c('a', 'b', 'c', 'd', 'e')
    l1 <- lm(df[,'e']~., data=df)
    summary(l1)

    Call:
    lm(formula = df[, "e"] ~ ., data = df)

    Residuals:
           Min         1Q     Median         3Q        Max
    -9.580e-17 -3.360e-17 -8.596e-18  9.114e-18  2.032e-16

    Coefficients:
                  Estimate Std. Error    t value Pr(>|t|)
    (Intercept) -7.505e-17  7.158e-17 -1.048e+00    0.312
    a           -1.653e-17  7.117e-17 -2.320e-01    0.820
    b           -5.042e-17  5.480e-17 -9.200e-01    0.373
    c            4.236e-17  5.774e-17  7.340e-01    0.475
    d           -3.878e-17  4.946e-17 -7.840e-01    0.446
    e            1.000e+00  6.083e-17  1.644e+16   <2e-16 ***
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 6.763e-17 on 14 degrees of freedom
    Multiple R-squared: 1, Adjusted R-squared: 1
    F-statistic: 6.435e+31 on 5 and 14 DF, p-value: < 2.2e-16

    l3 <- lm(M[,5]~M[,1]+M[,2]+M[,3]+M[,4])
    summary(l3)

    Call:
    lm(formula = M[, 5] ~ M[, 1] + M[, 2] + M[, 3] + M[, 4])

    Residuals:
         Min       1Q   Median       3Q      Max
    -0.49398 -0.14203  0.01588  0.14157  0.31335

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   0.6681     0.1859   3.594  0.00266 **
    M[, 1]       -0.1767     0.2419  -0.730  0.47644
    M[, 2]       -0.3874     0.2135  -1.814  0.08970 .
    M[, 3]        0.3695     0.2180   1.695  0.11078
    M[, 4]        0.1361     0.2366   0.575  0.57360
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.2449 on 15 degrees of freedom
    Multiple R-squared: 0.2988, Adjusted R-squared: 0.1119
    F-statistic: 1.598 on 4 and 15 DF, p-value: 0.2261

cheers
Worik
Re: [R] simple lm question
In your code by supplying a vector M[,"e"] you are regressing e against all the variables provided in the data argument, including e itself -- this gives the very strange regression coefficients you observe. R has no way to know that that's somehow related to the e it sees in the data argument.

In the suggested way, lm(formula = e ~ ., data = as.data.frame(M)), e is regressed against everything that is not e and sensible results are given.

Michael

On Fri, Dec 2, 2011 at 11:03 PM, Worik R wor...@gmail.com wrote:
>> Use `lm` the way it is designed to be used, with a data argument:
>>
>> l2 <- lm(e~. , data=as.data.frame(M))
>> summary(l2)
>>
>> Call:
>> lm(formula = e ~ ., data = as.data.frame(M))
>
> And what is the regression being done in this case? How are the independent variables used? It looks like M[,5]~M[,1]+M[,2]+M[,3]+M[,4] as those are the coefficients. But the results are different when I do that explicitly:
>
> M <- matrix(runif(5*20), nrow=20)
> colnames(M) <- c('a', 'b', 'c', 'd', 'e')
> l1 <- lm(df[,'e']~., data=df)
> summary(l1)
>
> Call:
> lm(formula = df[, "e"] ~ ., data = df)
>
> Residuals:
>        Min         1Q     Median         3Q        Max
> -9.580e-17 -3.360e-17 -8.596e-18  9.114e-18  2.032e-16
>
> Coefficients:
>               Estimate Std. Error    t value Pr(>|t|)
> (Intercept) -7.505e-17  7.158e-17 -1.048e+00    0.312
> a           -1.653e-17  7.117e-17 -2.320e-01    0.820
> b           -5.042e-17  5.480e-17 -9.200e-01    0.373
> c            4.236e-17  5.774e-17  7.340e-01    0.475
> d           -3.878e-17  4.946e-17 -7.840e-01    0.446
> e            1.000e+00  6.083e-17  1.644e+16   <2e-16 ***
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 6.763e-17 on 14 degrees of freedom
> Multiple R-squared: 1, Adjusted R-squared: 1
> F-statistic: 6.435e+31 on 5 and 14 DF, p-value: < 2.2e-16
>
> l3 <- lm(M[,5]~M[,1]+M[,2]+M[,3]+M[,4])
> summary(l3)
>
> Call:
> lm(formula = M[, 5] ~ M[, 1] + M[, 2] + M[, 3] + M[, 4])
>
> Residuals:
>      Min       1Q   Median       3Q      Max
> -0.49398 -0.14203  0.01588  0.14157  0.31335
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)   0.6681     0.1859   3.594  0.00266 **
> M[, 1]       -0.1767     0.2419  -0.730  0.47644
> M[, 2]       -0.3874     0.2135  -1.814  0.08970 .
> M[, 3]        0.3695     0.2180   1.695  0.11078
> M[, 4]        0.1361     0.2366   0.575  0.57360
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.2449 on 15 degrees of freedom
> Multiple R-squared: 0.2988, Adjusted R-squared: 0.1119
> F-statistic: 1.598 on 4 and 15 DF, p-value: 0.2261
>
> cheers
> Worik
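Michael's point can be reproduced in a few lines. This is a sketch with a fixed seed and a data frame named dat (both choices are made here for illustration; the thread's df is never shown being created).

```r
set.seed(123)
M <- matrix(runif(5 * 20), nrow = 20)
colnames(M) <- c("a", "b", "c", "d", "e")
dat <- as.data.frame(M)

# The response is written as an expression, so "." still expands to ALL
# columns of dat, including e: the model regresses e on itself.
l_self <- lm(dat[, "e"] ~ ., data = dat)
print(names(coef(l_self)))

# The response is named as a column, so "." expands to everything except e.
l_ok <- lm(e ~ ., data = dat)
print(names(coef(l_ok)))
```

In the first fit the coefficient on e comes out as 1 and the rest are numerically zero, which matches the odd 1e-17 estimates in the output quoted above.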
Re: [R] simple lm question
Duh! Silly me! But my confusion persists: What is the regression being done? See below

On Sat, Dec 3, 2011 at 5:10 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote:
> In your code by supplying a vector M[,"e"] you are regressing e against all the variables provided in the data argument, including e itself -- this gives the very strange regression coefficients you observe. R has no way to know that that's somehow related to the e it sees in the data argument.
>
> In the suggested way, lm(formula = e ~ ., data = as.data.frame(M)), e is regressed against everything that is not e and sensible results are given.

But still 'l1 <- lm(e~., data=df)' is not the same as 'l3 <- lm(M[,5]~M[,1]+M[,2]+M[,3]+M[,4])'

    M <- matrix(runif(5*20), nrow=20)
    colnames(M) <- c('a', 'b', 'c', 'd', 'e')
    l1 <- lm(e~., data=df)
    summary(l1)

    Call:
    lm(formula = e ~ ., data = df)

    Residuals:
         Min       1Q   Median       3Q      Max
    -0.38343 -0.21367  0.03067  0.13757  0.49080

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  0.28521    0.29477   0.968    0.349
    a            0.09283    0.30112   0.308    0.762
    b            0.23921    0.22425   1.067    0.303
    c           -0.16027    0.24154  -0.664    0.517
    d            0.24025    0.20054   1.198    0.250

    Residual standard error: 0.2871 on 15 degrees of freedom
    Multiple R-squared: 0.1602, Adjusted R-squared: -0.06375
    F-statistic: 0.7153 on 4 and 15 DF, p-value: 0.5943

    l3 <- lm(M[,5]~M[,1]+M[,2]+M[,3]+M[,4])
    summary(l3)

    Call:
    lm(formula = M[, 5] ~ M[, 1] + M[, 2] + M[, 3] + M[, 4])

    Residuals:
         Min       1Q   Median       3Q      Max
    -0.36355 -0.22679 -0.01202  0.18462  0.37377

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  0.76972    0.24501   3.142  0.00672 **
    M[, 1]      -0.23830    0.24123  -0.988  0.33890
    M[, 2]      -0.02046    0.21958  -0.093  0.92699
    M[, 3]      -0.29518    0.22559  -1.308  0.21040
    M[, 4]      -0.31545    0.24570  -1.284  0.21866
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.2668 on 15 degrees of freedom
    Multiple R-squared: 0.2762, Adjusted R-squared: 0.08317
    F-statistic: 1.431 on 4 and 15 DF, p-value: 0.272
Re: [R] simple lm question
On Dec 2, 2011, at 11:20 PM, Worik R wrote:
> Duh! Silly me! But my confusion persists: What is the regression being done? See below

Sigh

Please note that your df and M are undoubtedly different objects by now:

    M <- matrix(runif(5*20), nrow=20)
    colnames(M) <- c('a', 'b', 'c', 'd', 'e')
    l1 <- lm(e~., data=as.data.frame(M))
    l1

    Call:
    lm(formula = e ~ ., data = as.data.frame(M))

    Coefficients:
    (Intercept)            a            b            c            d
        0.40139     -0.15032     -0.06242      0.13139      0.23905

    l3 <- lm(M[,5]~M[,1]+M[,2]+M[,3]+M[,4])
    l3

    Call:
    lm(formula = M[, 5] ~ M[, 1] + M[, 2] + M[, 3] + M[, 4])

    Coefficients:
    (Intercept)       M[, 1]       M[, 2]       M[, 3]       M[, 4]
        0.40139     -0.15032     -0.06242      0.13139      0.23905

As expected.

-- David.

> On Sat, Dec 3, 2011 at 5:10 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote:
>> In your code by supplying a vector M[,"e"] you are regressing e against all the variables provided in the data argument, including e itself -- this gives the very strange regression coefficients you observe. R has no way to know that that's somehow related to the e it sees in the data argument.
>>
>> In the suggested way, lm(formula = e ~ ., data = as.data.frame(M)), e is regressed against everything that is not e and sensible results are given.
>
> But still 'l1 <- lm(e~., data=df)' is not the same as 'l3 <- lm(M[,5]~M[,1]+M[,2]+M[,3]+M[,4])'
>
> M <- matrix(runif(5*20), nrow=20)
> colnames(M) <- c('a', 'b', 'c', 'd', 'e')
> l1 <- lm(e~., data=df)
> summary(l1)
>
> Call:
> lm(formula = e ~ ., data = df)
>
> Residuals:
>      Min       1Q   Median       3Q      Max
> -0.38343 -0.21367  0.03067  0.13757  0.49080
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  0.28521    0.29477   0.968    0.349
> a            0.09283    0.30112   0.308    0.762
> b            0.23921    0.22425   1.067    0.303
> c           -0.16027    0.24154  -0.664    0.517
> d            0.24025    0.20054   1.198    0.250
>
> Residual standard error: 0.2871 on 15 degrees of freedom
> Multiple R-squared: 0.1602, Adjusted R-squared: -0.06375
> F-statistic: 0.7153 on 4 and 15 DF, p-value: 0.5943
>
> l3 <- lm(M[,5]~M[,1]+M[,2]+M[,3]+M[,4])
> summary(l3)
>
> Call:
> lm(formula = M[, 5] ~ M[, 1] + M[, 2] + M[, 3] + M[, 4])
>
> Residuals:
>      Min       1Q   Median       3Q      Max
> -0.36355 -0.22679 -0.01202  0.18462  0.37377
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  0.76972    0.24501   3.142  0.00672 **
> M[, 1]      -0.23830    0.24123  -0.988  0.33890
> M[, 2]      -0.02046    0.21958  -0.093  0.92699
> M[, 3]      -0.29518    0.22559  -1.308  0.21040
> M[, 4]      -0.31545    0.24570  -1.284  0.21866
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.2668 on 15 degrees of freedom
> Multiple R-squared: 0.2762, Adjusted R-squared: 0.08317
> F-statistic: 1.431 on 4 and 15 DF, p-value: 0.272

David Winsemius, MD
West Hartford, CT
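David's check is easy to repeat with a fixed seed, so that both fits are guaranteed to see the same numbers. The seed value is arbitrary and the comparison with all.equal() is an addition made for this sketch.

```r
set.seed(42)
M <- matrix(runif(5 * 20), nrow = 20)
colnames(M) <- c("a", "b", "c", "d", "e")

# Formula interface on a data frame built from the SAME matrix.
l1 <- lm(e ~ ., data = as.data.frame(M))

# Explicit column-by-column specification on the matrix itself.
l3 <- lm(M[, 5] ~ M[, 1] + M[, 2] + M[, 3] + M[, 4])

# The coefficient names differ, so compare the values only.
same <- all.equal(unname(coef(l1)), unname(coef(l3)))
print(same)
```

When the data frame passed to data= is a stale object created from an earlier runif() draw, the two fits describe different random data and the coefficients will differ, which is consistent with what David describes.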