Re: [R] selecting significant predictors from ANOVA result
Dear Sir,

Thanks for your message. My problem is in writing the code. I ran ANOVA for 75,000 response variables (let's say Y) with 243 predictors (let's say the X-matrix), one by one with a for loop in R. I stored the p-values of all predictors; however, I have a very large file, because I have p-values of 243 predictors for all 75,000 Y-variables. Now I want code that automatically selects only the significant X-predictors from the whole list. If you have ideas on that, it will be a great help.

Thanks in advance,
Sincerely,
Ram

--- On Wed, 1/27/10, Bert Gunter gunter.ber...@gene.com wrote:

From: Bert Gunter gunter.ber...@gene.com
Subject: RE: [R] selecting significant predictors from ANOVA result
To: 'ram basnet' basnet...@yahoo.com, 'R help' r-help@r-project.org
Date: Wednesday, January 27, 2010, 7:56 AM

Ram:

You do not say how many cases (rows in your dataset) you have, but I suspect it may be small (a few hundred, say). In any case, what you describe is probably just a complicated way to generate random numbers -- it is **highly** unlikely that any meaningful, replicable scientific results would come from your proposed approach. Not surprising -- this appears to be a very difficult data analysis issue. It is obvious that you have only a minimal statistical background, so I would strongly recommend that you find a competent local statistician to help you with your work. Remote help from this list is wholly inadequate.

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of ram basnet
Sent: Wednesday, January 27, 2010 2:52 AM
To: R help
Subject: [R] selecting significant predictors from ANOVA result

Dear all,

I did ANOVA for many response variables (Var1, Var2, ..., Var75000), and I got p-value results like those below. Now, I want to select those predictors which have a p-value less than or equal to 0.05 for each response variable.
For example, X1, X2, X3, X4, X5 and X6 in the case of Var1; similarly, X1, X2, ..., X5 in the case of Var2; only X1 in the case of Var3; and none of the predictors in the case of Var4.

predictors      Var1      Var2    Var3  Var4
X1          0.5       0.001   0.05  0.36
X2          0.0001    0.001   0.09  0.37
X3          0.0002    0.005   0.13  0.38
X4          0.0003    0.01    0.17  0.39
X5          0.01      0.05    0.21  0.4
X6          0.05      0.0455  0.25  0.41
X7          0.038063  0.0562  0.29  0.42
X8          0.04605   0.0669  0.33  0.43
X9          0.054038  0.0776  0.37  0.44
X10         0.062025  0.0883  0.41  0.45

I have very large data sets (# of response variables = ~75,000), so I need some kind of automated procedure, but I have no ideas. If I get help from somebody, it will be great for me. Thanks in advance.

Sincerely,
Ram Kumar Basnet, Ph.D. student
Wageningen University, The Netherlands.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Merge: sort=F not preserving order?
You could add an extra sequence column to the data frame you wish to sort on: merge the tables together, sort by the sequence, then delete the sequence column. It's a bit more work, but it will give you what you want.

Bart
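Bart's sequence-column trick can be sketched in a few lines of base R (the data frames `df1`, `df2` and the column `key` are made-up names; `merge()` does not reliably preserve input order even with `sort = FALSE`):

```r
# Preserve the row order of df1 across a merge by carrying a sequence column.
df1 <- data.frame(key = c("b", "a", "c"), x = 1:3)
df2 <- data.frame(key = c("a", "b", "c"), y = 4:6)

df1$.seq <- seq_len(nrow(df1))    # remember df1's original row order
m <- merge(df1, df2, by = "key")  # merge may reorder rows (here: a, b, c)
m <- m[order(m$.seq), ]           # restore df1's order (b, a, c)
m$.seq <- NULL                    # drop the helper column
```

After this, `m` has the rows of `df1` in their original order, with `df2`'s columns attached.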
[R] number of decimal
Hi everybody,

I'm trying to set the number of decimals (i.e. the number of digits after the decimal point). I looked into options, but I can only set the total number of digits, with options(digits=6). Since I have different variables with different orders of magnitude, I would like them all displayed with the same number of decimals. I searched for it and found the format() function, with nsmall=6, but that works on a given vector. I would like to set it for the whole session, as with options. Can anyone help me?

Thanks in advance
Ivan
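The distinction Ivan runs into can be seen directly: options(digits=) controls *significant* digits session-wide, while a fixed number of *decimal places* is a per-call formatting choice. A small illustration (the values are arbitrary):

```r
x <- c(3.14159, 1234.5, 0.000271)

options(digits = 6)    # total significant digits, session-wide
print(x)               # 6 significant digits, not 6 decimals

format(x, nsmall = 6)  # at least 6 digits after the decimal point
sprintf("%.6f", x)     # exactly 6 decimals, value by value
```

One session-wide workaround (an assumption, not a base-R option) is a small wrapper such as `pr <- function(x) print(format(x, nsmall = 6))`, used instead of changing options().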
[R] Problems with fitdistr
Hi,

I want to estimate the parameters of a Weibull distribution. For this, I am using the fitdistr() function in the MASS package. But when I call

fitdistr(c, "weibull")

I get an error as follows:

Error in optim(x = c(4L, 41L, 20L, 6L, 12L, 6L, 7L, 13L, 2L, 8L, 22L, :
  non-finite value supplied by optim

Any help or suggestions are most welcome
Re: [R] If then test
Close. So I have a vector, let's say

[1] 1.5 1.2

and a matrix

     [,1] [,2]
[1,]  1.9  1.3
[2,] -0.2  2.0

I want to somehow use the first number in my vector (1.5) and compare this number to my whole first column. So I want to see how many times the numbers in column 1 are < 1.5, which should be 1 in this case. Now for the other number, we compare against 1.2. We get 0. So I need a vector with these results, like

[1] 1 0
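A vectorised way to get those counts, assuming (as the example output suggests) that the comparison is "column values less than the corresponding vector entry":

```r
v <- c(1.5, 1.2)
m <- matrix(c(1.9, -0.2, 1.3, 2), nrow = 2)

# Compare m[i, j] < v[j] for every cell, then count the TRUEs per column.
hits   <- sweep(m, 2, v, `<`)
counts <- colSums(hits)
counts  # 1 0
```

sweep() recycles `v` along the columns, so each column is compared against its own threshold in one step, with no explicit loop.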
Re: [R] Data.frame manipulation
Thank you Dennis--this is perfect!!

AC

On Thu, Jan 28, 2010 at 12:24 AM, Dennis Murphy djmu...@gmail.com wrote:

Hi:

There are several ways to do this, but these are the most commonly used: aggregate() and the ddply() function in package plyr.

(1) plyr solution (using x as the name of your input data frame):

library(plyr)
ddply(x, .(id, mod1), summarize, es = mean(es))
  id mod1   es
1  1    2 0.30
2  2    4 0.15
3  3    1 0.20

ddply(x, .(id, mod1, mod2), summarize, es = mean(es))
  id mod1   mod2   es
1  1    2    wai 0.30
2  2    4 calpas 0.20
3  2    4  other 0.10
4  3    1   itas 0.10
5  3    1    wai 0.25

(2) aggregate() function in base R:

with(x, aggregate(list(es = es), by = list(id = id, mod1 = mod1), mean))
  id mod1   es
1  3    1 0.20
2  1    2 0.30
3  2    4 0.15

with(x, aggregate(list(es = es), by = list(id = id, mod1 = mod1, mod2 = mod2), mean))
  id mod1   mod2   es
1  2    4 calpas 0.20
2  3    1   itas 0.10
3  2    4  other 0.10
4  3    1    wai 0.25
5  1    2    wai 0.30

Note that enclosing the variable names in lists and 'equating' them maintains the variable name in the output. Here's what happens if you don't:

with(x, aggregate(es, list(id, mod1), mean))
  Group.1 Group.2    x
1       3       1 0.20
2       1       2 0.30
3       2       4 0.15

ddply() is a little less painful and sorts the output for you automatically.

HTH,
Dennis

On Wed, Jan 27, 2010 at 7:34 PM, AC Del Re acde...@gmail.com wrote:

Hi All,

I'm conducting a meta-analysis and have taken a data.frame with multiple rows per study (for each effect size) and performed a weighted average of effect size for each study. This results in a reduced # of rows. I am particularly interested in simply reducing the additional variables in the data.frame to the first row of the corresponding id variable.
For example:

id <- c(1, 2, 2, 3, 3, 3)
es <- c(.3, .1, .3, .1, .2, .3)
mod1 <- c(2, 4, 4, 1, 1, 1)
mod2 <- c("wai", "other", "calpas", "wai", "itas", "other")
data <- as.data.frame(cbind(id, es, mod1, mod2))
data
  id  es mod1   mod2
1  1 0.3    2    wai
2  2 0.1    4  other
3  2 0.2    4 calpas
4  3 0.1    1   itas
5  3 0.2    1    wai
6  3 0.3    1    wai

# I would like to reduce the entire data.frame like this:

  id   es mod1  mod2
1  1 0.30    2   wai
2  2 0.15    4 other
3  3 0.20    1  itas

# If possible, I would also like the option of this (collapsing on id and mod2):

  id   es mod1   mod2
1  1 0.30    2    wai
2  2 0.10    4  other
3  2 0.20    4 calpas
4  3 0.10    1   itas
5  3 0.25    1    wai

Any help is much appreciated!

AC Del Re
Re: [R] selecting significant predictors from ANOVA result
Hi

I agree with Bert that what you want to do is, how to say it politely, not really reasonable. Whether a p-value is significant depends on the number of observations; let us assume they are the same for each p-value. Then you need your p-values in a suitable object, which you did not reveal to us. Again I will assume that it is a 75000 x 243 matrix; let's call it mat. Then you can select elements smaller than some threshold. Here is a smaller example:

mat <- matrix(runif(12), 4, 3)
mat <- mat/5
daf <- as.data.frame(mat)
daf
            V1          V2         V3
1 0.1833271959 0.182649428 0.16363889
2 0.1160545138 0.095533401 0.09378235
3 0.1622977912 0.005841073 0.08108027
4 0.0006527514 0.064333027 0.17431492

sapply(daf, function(x) x[x < .1])
$V1
[1] 0.0006527514

$V2
[1] 0.095533401 0.005841073 0.064333027

$V3
[1] 0.09378235 0.08108027

But how you control which of the significant values have real meaning, and what you want to do with them, is a mystery.

Regards
Petr

r-help-boun...@r-project.org napsal dne 28.01.2010 09:39:29:

Dear Sir, Thanks for your message. My problem is in writing the code. I ran ANOVA for 75,000 response variables (let's say Y) with 243 predictors (let's say the X-matrix), one by one with a for loop in R. I stored the p-values of all predictors; however, I have a very large file, because I have p-values of 243 predictors for all 75,000 Y-variables. Now I want code that automatically selects only the significant X-predictors from the whole list. If you have ideas on that, it will be a great help. Thanks in advance. Sincerely, Ram
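Petr's sapply() returns the significant p-values themselves; the original question asked for the predictor *names* per response variable. A hedged sketch of that variant, where the toy `pmat` stands in for the real 75000 x 243 p-value matrix (rows = response variables, columns = predictors):

```r
# Toy stand-in for the 75000 x 243 p-value matrix.
pmat <- matrix(c(0.01, 0.50,
                 0.03, 0.04,
                 0.90, 0.50),
               nrow = 2,
               dimnames = list(c("Var1", "Var2"), c("X1", "X2", "X3")))

# For each response variable, the names of predictors with p <= 0.05.
# lapply (rather than apply) guarantees a list even if all rows happen
# to have the same number of significant predictors.
sig <- lapply(seq_len(nrow(pmat)),
              function(i) colnames(pmat)[pmat[i, ] <= 0.05])
names(sig) <- rownames(pmat)

sig$Var1  # "X1" "X2"
sig$Var2  # "X2"
```

The resulting list has one character vector per response variable, which can be of any length (including zero, matching the "none significant" case of Var4 in the question).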
[R] NA Replacement by lowest value?
Hi all,

I need to replace missing values in a matrix by 10 % of the lowest available value in the matrix. I've got a function I've used earlier to replace negative values by the lowest value in a data frame, but I'm not sure how to modify it...

nonNeg = as.data.frame(apply(orig.df, 2, function(col)
# Change negative values to a small value, close to zero
{
  min.val = min(col[col > 0])
  col[col < 0] = (min.val / 10)
  col # Column index
}))

I think this is how to start, but the NA replacement part doesn't work...

newMatrix = as.matrix(apply(oldMatrix, 2, function(col)
{
  min.val = min(mData, na.rm = T) # Find the smallest value in the dataset
  col[col == NA] = (min.val / 10) # Doesn't work...
  col # Column index
}))

Does any of you have any suggestions?

Best regards,
Joel
Re: [R] RMySQL - Bulk loading data and creating FK links
How it represents data internally is very important, depending on the real goal: http://en.wikipedia.org/wiki/Column-oriented_DBMS

Gabor Grothendieck ggrothendi...@gmail.com wrote:

How it represents data internally should not be important as long as you can do what you want. SQL is declarative, so you just specify what you want rather than how to get it, and, invisibly to the user, it automatically draws up a query plan and then uses that plan to get the result.

On Wed, Jan 27, 2010 at 12:48 PM, Matthew Dowle mdo...@mdowle.plus.com wrote:

sqldf("select * from BOD order by Time desc limit 3")

Exactly. SQL requires use of "order by". It knows the order, but it isn't ordered. That's not good, but might be fine, depending on what the real goal is.

Gabor Grothendieck ggrothendi...@gmail.com wrote:

On Wed, Jan 27, 2010 at 8:56 AM, Matthew Dowle mdo...@mdowle.plus.com wrote:

How many columns, and of what type are the columns? As Olga asked too, it would be useful to know more about what you're really trying to do. 3.5m rows is not actually that many rows, even for 32-bit R. It depends on the columns and what you want to do with those columns. At the risk of suggesting something before we know the full facts, one possibility is to load the data from flat file into data.table. Use setkey() to set your keys. Use tables() to summarise your various tables. Then do your joins etc. all in R. data.table has fast ways to do those sorts of joins (but we need more info about your task). Alternatively, you could check out the sqldf website. There is a read.csv.sql (or similar name) which can read your files directly into SQL instead of going via R. Gabor has some nice examples there about that, and it's faster. You use some buzzwords which make me think that SQL may not be appropriate for your task, though.
[Matthew:] Can't say for sure (because we don't have enough information), but it's possible you are struggling because SQL has no row ordering concept built in. That might be why you've created an incrementing

[Gabor:] In the SQLite database it automatically assigns a self-incrementing hidden column called rowid to each row. E.g. using SQLite via the sqldf package on CRAN and the BOD data frame which is built into R, we can display the rowid column explicitly by referring to it in our select statement:

library(sqldf)
BOD
  Time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
5    5   15.6
6    7   19.8

sqldf("select rowid, * from BOD")
  rowid Time demand
1     1    1    8.3
2     2    2   10.3
3     3    3   19.0
4     4    4   16.0
5     5    5   15.6
6     6    7   19.8

[Matthew:] field? Do your queries include "order by incrementing field"? SQL is not good at "first" and "last" type logic. An all-in-R solution may well be

[Gabor:] In SQLite you can get the top 3 values, say, like this (continuing the prior example):

sqldf("select * from BOD order by Time desc limit 3")
  Time demand
1    7   19.8
2    5   15.6
3    4   16.0

[Matthew:] better, since R is very good with ordered vectors. A 1GB data.table (or data.frame), for example, at 3.5m rows, could have 76 integer columns, or 38 double columns. 1GB is well within 32-bit and allows some space for working copies, depending on what you want to do with the data. If you have 38 or fewer columns, or you have 64-bit, then an all-in-R solution *might* get your task done quicker, depending on what your real goal is. If this sounds plausible, you could post more details and, if it's appropriate, and luck is on your side, someone might even sketch out how to do an all-in-R solution.

Nathan S. Watson-Haigh nathan.watson-ha...@csiro.au wrote:

I have a table (contact) with several fields, and its PK is an auto-increment field. I'm bulk loading data to this table from files which, if successful, will be about 3.5 million rows (approx 16000 rows per file).
However, I have a linking table (an_contact) to resolve a m:m relationship between the an and contact tables. How can I retrieve the PKs for the data bulk loaded into contact, so I can insert the relevant data into an_contact? I currently load the data into contact using:

dbWriteTable(con, "contact", dat, append = TRUE, row.names = FALSE)

But I then need to get all the PKs which this dbWriteTable() appended to the contact table, so I can load the data into my an_contact link table. I don't want to issue a separate INSERT query for each row in dat and then use MySQL's LAST_INSERT_ID() function... not when I have 3.5 million rows to insert!

Any pointers welcome,
Nathan

--
Dr. Nathan S. Watson-Haigh
OCE Post Doctoral Fellow
CSIRO Livestock Industries
University Drive
Townsville, QLD 4810
Australia

Tel: +61 (0)7 4753 8548
Fax: +61 (0)7 4753 8600
Web: http://www.csiro.au/people/Nathan.Watson-Haigh.html
Re: [R] NA Replacement by lowest value?
Joel Fürstenberg-Hägg wrote:

Hi all,

I need to replace missing values in a matrix by 10 % of the lowest available value in the matrix. I've got a function I've used earlier to replace negative values by the lowest value in a data frame, but I'm not sure how to modify it...

nonNeg = as.data.frame(apply(orig.df, 2, function(col)
# Change negative values to a small value, close to zero
{
  min.val = min(col[col > 0])
  col[col < 0] = (min.val / 10)
  col # Column index
}))

I think this is how to start, but the NA replacement part doesn't work...

newMatrix = as.matrix(apply(oldMatrix, 2, function(col)
{
  min.val = min(mData, na.rm = T) # Find the smallest value in the dataset
  col[col == NA] = (min.val / 10) # Doesn't work...

use is.na(col) to find the NA's.

cheers,
Paul

  col # Column index
}))

Does any of you have any suggestions?

Best regards,
Joel

--
Drs. Paul Hiemstra
Department of Physical Geography
Faculty of Geosciences
University of Utrecht
Heidelberglaan 2
P.O. Box 80.115
3508 TC Utrecht
Phone: +3130 274 3113 Mon-Tue
Phone: +3130 253 5773 Wed-Fri
http://intamap.geo.uu.nl/~paul
[R] lpSolve API - add Vs set
Hi,

Using the lpSolveAPI package, I need to build a 2000 x 10 constraint matrix. I wonder which method is faster:

(a) model = make.lp(0, 0)
    add.constraint(model, ...)

or

(b) model = make.lp(2000, 10)
    set.constraint(model, ...)

Thanks
KC
Re: [R] NA Replacement by lowest value?
Thanks a lot Paul!!

Best,
Joel

Date: Thu, 28 Jan 2010 10:48:37 +0100
From: p.hiems...@geo.uu.nl
To: joel_furstenberg_h...@hotmail.com
CC: r-help@r-project.org
Subject: Re: [R] NA Replacement by lowest value?
Re: [R] NA Replacement by lowest value?
On 01/28/2010 08:35 PM, Joel Fürstenberg-Hägg wrote:

Hi all,

I need to replace missing values in a matrix by 10 % of the lowest available value in the matrix. I've got a function I've used earlier to replace negative values by the lowest value in a data frame, but I'm not sure how to modify it...

nonNeg = as.data.frame(apply(orig.df, 2, function(col)
# Change negative values to a small value, close to zero
{
  min.val = min(col[col > 0])
  col[col < 0] = (min.val / 10)
  col # Column index
}))

I think this is how to start, but the NA replacement part doesn't work...

newMatrix = as.matrix(apply(oldMatrix, 2, function(col)
{
  min.val = min(mData, na.rm = T) # Find the smallest value in the dataset
  col[col == NA] = (min.val / 10) # Doesn't work...
  col # Column index
}))

Does any of you have any suggestions?

Hi Joel,

You probably want to use:

col[is.na(col)] <- min.val/10

Jim
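Putting Jim's is.na() suggestion together: no apply() is even needed, since logical indexing with is.na() works on the whole matrix at once. A minimal runnable sketch with made-up data:

```r
oldMatrix <- matrix(c(1, NA, 4, 0.2), nrow = 2)

min.val <- min(oldMatrix, na.rm = TRUE)      # smallest available value (0.2)
oldMatrix[is.na(oldMatrix)] <- min.val / 10  # replace every NA by 10% of it
oldMatrix[2, 1]  # 0.02
```

The same two lines work unchanged on a data frame column inside an apply() loop, if column-wise minima are wanted instead of the global minimum.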
Re: [R] NA Replacement by lowest value?
Hi Jim,

That's what Paul suggested too. Works great!

Best,
Joel

Date: Thu, 28 Jan 2010 20:57:57 +1100
From: j...@bitwrit.com.au
To: joel_furstenberg_h...@hotmail.com
CC: r-help@r-project.org
Subject: Re: [R] NA Replacement by lowest value?
Re: [R] large integers in R
Hi Duncan,

On Tue, Jan 26, 2010 at 9:09 PM, Duncan Murdoch murd...@stats.uwo.ca wrote:

On 26/01/2010 3:25 PM, Blanford, Glenn wrote:

Has there been any update on R's handling of large integers greater than 10^9 (between 10^9 and 4x10^9)? as.integer() in R 2.9.2 lists this as a restriction but doesn't list the actual limit or cause, nor whether anyone was looking at fixing it.

Integers in R are 4 byte signed integers, so the upper limit is 2^31-1. That's not likely to change soon.

But in the hypothetical scenario that this was to change soon and we were to have a 64-bit integer type (say, when under a 64-bit OS), wouldn't this allow us to have objects whose length exceeded the 2^31-1 limit?

Benilton Carvalho

The double type in R can hold exact integer values up to around 2^52. So for example calculations like this work fine:

x <- 2^50
y <- x + 1
y - x
[1] 1

Just don't ask R to put those values into a 4 byte integer, they won't fit:

as.integer(c(x, y))
[1] NA NA
Warning message:
NAs introduced by coercion

Duncan Murdoch

Glenn D Blanford, PhD
glenn.blanf...@us.army.mil
Scientific Research Corporation
gblanf...@scires.com
Re: [R] Problems with fitdistr
Try passing a start value to help optim (see ?fitdistr).

Ciao!
mario

vikrant wrote:

Hi, I want to estimate the parameters of a Weibull distribution. For this, I am using the fitdistr() function in the MASS package. But when I call fitdistr(c, "weibull") I get an error as follows:

Error in optim(x = c(4L, 41L, 20L, 6L, 12L, 6L, 7L, 13L, 2L, 8L, 22L, :
  non-finite value supplied by optim

Any help or suggestions are most welcome

--
Ing. Mario Valle
Data Analysis and Visualization Group | http://www.cscs.ch/~mvalle
Swiss National Supercomputing Centre (CSCS) | Tel: +41 (91) 610.82.60
v. Cantonale Galleria 2, 6928 Manno, Switzerland | Fax: +41 (91) 610.82.82
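The pattern Mario suggests, sketched with simulated data (the parameter values and the start list are illustrative, not taken from the thread; when an explicit density function is passed, start is required). The non-finite error typically comes from zero or non-positive data values, or from optim wandering into invalid parameter values:

```r
library(MASS)  # provides fitdistr()

set.seed(42)
x <- rweibull(100, shape = 1.5, scale = 10)  # simulated strictly positive data

# With an explicit density function, starting values must be supplied:
fit <- fitdistr(x, dweibull, start = list(shape = 1, scale = mean(x)))
fit$estimate  # shape and scale estimates, near 1.5 and 10 here
```

For the named form fitdistr(x, "weibull"), MASS computes its own starting values; either way, the data must be strictly positive for the Weibull log-likelihood to be finite.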
Re: [R] large integers in R
On 28/01/2010 5:30 AM, Benilton Carvalho wrote:

But in the hypothetical scenario that this was to change soon and we were to have a 64-bit integer type (say, when under a 64-bit OS), wouldn't this allow us to have objects whose length exceeded the 2^31-1 limit?

Those are certainly related problems, but you don't need 64-bit integers to have longer vectors. We could switch to indexing by doubles in R (though internally the indexing would probably be done in 64-bit ints). A problem with exposing 64-bit ints in R is that they break the rule that doubles can represent any integer exactly. If x is an integer, x+1 is a double, and it would be unfortunate if (x+1) != (x+1L), as will happen with values bigger than 2^52.

Duncan Murdoch
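Duncan's points about integer and double limits are easy to verify at the prompt; a short demonstration using the values from the discussion:

```r
.Machine$integer.max        # 2147483647, i.e. 2^31 - 1

x <- 2^50                   # stored as a double
(x + 1) - x                 # exactly 1: doubles are exact well past 2^50

2^53 + 1 == 2^53            # TRUE: beyond 2^53 exactness is lost

as.integer(2^31 - 1)        # still representable as an R integer
as.integer(2^31)            # NA, with a coercion warning
```

The last line reproduces the as.integer() restriction Glenn asked about: the limit is the 4-byte signed integer range, not a documentation oversight.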
Re: [R] Maptools runs out of memory installing the help files for spCbind-methods
This has been seen on two Ubuntu systems, but cannot be reproduced elsewhere - this is a first report for Gentoo. The fix (found by Barry Rowlingson) is to install with

R CMD INSTALL --no-latex maptools-blah.tar.gz

rather than install.packages(), with the comment that perl was taking all available memory when --no-latex was omitted. As package maintainer, I can't reproduce this, as I have RHEL/F12 systems rather than Debian-based ones or indeed Gentoo, which here is showing the same behaviour. If someone could offer me ssh access to a system with problems, I can try to see whether any of the text in that file or its successor is unpalatable.

Roger

-
Roger Bivand
Economic Geography Section
Department of Economics
Norwegian School of Economics and Business Administration
Helleveien 30
N-5045 Bergen, Norway
Re: [R] Problem associated with importing xlsx data file (Excel 2007)
On Jan 27, 2010, at 9:41 PM, Steven Kang wrote: Hi all, I have imported an xlsx file (Excel 2007) into R using the following script:

library(RODBC)
setwd(...)
query <- odbcConnectExcel2007(xls.file = "GI 2010.xlsx", readOnly = TRUE)
dat <- sqlQuery(query, "select * from [sheet1$]", as.is = TRUE, na.strings = "exp")

dat contains one column consisting of integers and characters (the unique character value being "exp"). However, R recognises the class of this column as 'numeric' instead of 'character' (i.e. via sapply(dat, class)). In addition, all the values of this column that are supposed to be of class 'character' are presented as 'NA'. If the vector is of type numeric then NO values in the vector (column) are supposed to be (or even can be) of type character. R does not have a mixed-type vector. You have told sqlQuery that "exp" should be converted to NA at the time of input. Interestingly, when the file is saved in csv format and imported into R, this problem does not occur. Then that vector must be of character type. (So it's less interesting than you might have thought.) Any advice on this problem? Thank you as always. -- Steven

David Winsemius, MD, Heritage Laboratories, West Hartford, CT
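A minimal base-R sketch of David's point (the vector `raw` below is a hypothetical stand-in for the imported column): once the column arrives as character, the sentinel value can be turned into NA during an explicit numeric conversion, rather than silently at import time.

```r
raw <- c("1", "2", "exp", "4")            # hypothetical column as read with as.is = TRUE
class(raw)                                 # "character": R has no mixed-type vectors
num <- suppressWarnings(as.numeric(raw))   # "exp" coerces to NA, by choice this time
num
# [1]  1  2 NA  4
```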
[R] Conditional editing of rows in a data frame
Dear R users, I have a dataframe (main.table) with ~30,000 rows and 6 columns, of which here are a few rows:

      id chr window    gene    xp.norm xp.top
129 1_32   1     32  TAS1R1 1.28882115  FALSE
130 1_32   1     32  ZBTB48 1.28882115  FALSE
131 1_32   1     32  KLHL21 1.28882115  FALSE
132 1_32   1     32   PHF13 1.28882115  FALSE
133 1_33   1     33   PHF13 1.02727430  FALSE
134 1_33   1     33   THAP3 1.02727430  FALSE
135 1_33   1     33 DNAJC11 1.02727430  FALSE
136 1_33   1     33  CAMTA1 1.02727430  FALSE
137 1_34   1     34  CAMTA1 1.40312732   TRUE
138 1_35   1     35  CAMTA1 1.52104538  FALSE
139 1_36   1     36  CAMTA1 1.04853732  FALSE
140 1_37   1     37  CAMTA1 0.64794094  FALSE
141 1_38   1     38  CAMTA1 1.23026086   TRUE
142 1_38   1     38   VAMP3 1.23026086   TRUE
143 1_38   1     38    PER3 1.23026086   TRUE
144 1_39   1     39    PER3 1.18154967   TRUE
145 1_39   1     39    UTS2 1.18154967   TRUE
146 1_39   1     39 TNFRSF9 1.18154967   TRUE
147 1_39   1     39   PARK7 1.18154967   TRUE
148 1_39   1     39  ERRFI1 1.18154967   TRUE
149 1_40   1     40 no_gene 1.79796879  FALSE
150 1_41   1     41 SLC45A1 0.20193560  FALSE

I want to create two new columns, xp.bg and xp.n.top, using the following criteria: If gene is the same in consecutive rows, xp.bg is the minimum value of xp.norm in those rows; if gene is not the same, xp.bg is simply the value of xp.norm for that row. Likewise, if there's a run of contiguous xp.top = TRUE values, xp.n.top is the minimum value in that range, and if xp.top is false or NA, xp.n.top is NA, or 0 (I don't care). So, in the above example, xp.bg for rows 136:141 should be 0.64794094, and is equal to xp.norm for all other rows; xp.n.top for row 137 is 1.40312732, 1.18154967 for rows 141:148, and 0/NA for all other rows. Is there a way to combine indexing and if statements or some such to accomplish this? I want to do this without using split(main.table, main.table$gene), because there are about 20,000 unique entries for gene, and one of the entries, no_gene, is repeated throughout.
I thought briefly of subsetting the rows where xp.top is TRUE, but I then don't know how to set the range for min, so that it only looks at what would originally have been consecutive rows, and searching the help has not proved particularly useful. Thanks in advance, Irene Gallego Romero -- Irene Gallego Romero, Leverhulme Centre for Human Evolutionary Studies, University of Cambridge, Fitzwilliam St, Cambridge CB1 3QH, UK. email: ig...@cam.ac.uk
[R] Using tcltk or other graphical widgets to view zoo time series objects
Dear all, I am looking at the R-help entry below: http://finzi.psych.upenn.edu/R/Rhelp02/archive/26640.html I have a more complicated problem. I have a zoo time series frame with 100+ sequences. I want to cycle through them back and forth and compare them to the 1st column at any time. I also need a button to click when I need the viewed-selected sequence (the one being compared to the 1st column) to be manipulated (by some algorithm, or be saved individually, etc.). I am trying to modify the code at the above link but somehow I cannot make it work with zoo time series objects. Any help would be greatly appreciated. Thanks in advance, Costas
Re: [R] Using tcltk or other graphical widgets to view zoo time series objects
There is an example of using zoo together with the playwith package at the end of the examples section of help(xyplot.zoo) which may address this. On Thu, Jan 28, 2010 at 7:10 AM, Research risk2...@ath.forthnet.gr wrote: Dear all, I am looking at the R-help entry below: http://finzi.psych.upenn.edu/R/Rhelp02/archive/26640.html I have a more complicated problem. I have a zoo time series frame with 100+ sequences. I want to cycle through them back and forth and compare them to the 1st column at any time. I also need a button to click when I need the viewed-selected sequence (the one being compared to the 1st column) to be manipulated (by some algorithm, or be saved individually, etc.). I am trying to modify the code at the above link but somehow I cannot make it work with zoo time series objects. Any help would be greatly appreciated. Thanks in advance, Costas
Re: [R] RMySQL - Bulk loading data and creating FK links
It's only important internally. Externally it's undesirable that the user has to get involved in it. The idea of making software easy to write and use is to hide the implementation and focus on the problem. That is why we use high-level languages, object orientation, etc. On Thu, Jan 28, 2010 at 4:37 AM, Matthew Dowle mdo...@mdowle.plus.com wrote: How it represents data internally is very important, depending on the real goal: http://en.wikipedia.org/wiki/Column-oriented_DBMS Gabor Grothendieck ggrothendi...@gmail.com wrote in message news:971536df1001271710o4ea62333l7f1230b860114...@mail.gmail.com... How it represents data internally should not be important as long as you can do what you want. SQL is declarative, so you just specify what you want rather than how to get it, and invisibly to the user it automatically draws up a query plan and then uses that plan to get the result. On Wed, Jan 27, 2010 at 12:48 PM, Matthew Dowle mdo...@mdowle.plus.com wrote: sqldf("select * from BOD order by Time desc limit 3") Exactly. SQL requires use of order by. It knows the order, but it isn't ordered. That's not good, but might be fine, depending on what the real goal is. Gabor Grothendieck ggrothendi...@gmail.com wrote in message news:971536df1001270629w4795da89vb7d77af6e4e8b...@mail.gmail.com... On Wed, Jan 27, 2010 at 8:56 AM, Matthew Dowle mdo...@mdowle.plus.com wrote: How many columns, and of what type are the columns? As Olga asked too, it would be useful to know more about what you're really trying to do. 3.5m rows is not actually that many rows, even for 32-bit R. It depends on the columns and what you want to do with those columns. At the risk of suggesting something before we know the full facts, one possibility is to load the data from flat file into data.table. Use setkey() to set your keys. Use tables() to summarise your various tables. Then do your joins etc. all-in-R. data.table has fast ways to do those sorts of joins (but we need more info about your task).
Alternatively, you could check out the sqldf website. There is an sqlread.csv (or similar name; it is read.csv.sql) which can read your files directly into SQL instead of going via R. Gabor has some nice examples there about that and it's faster. You use some buzzwords which make me think that SQL may not be appropriate for your task, though. Can't say for sure (because we don't have enough information) but it's possible you are struggling because SQL has no row ordering concept built in. That might be why you've created an increment field? Do your queries include order by incrementing field?

In the SQLite database it automatically assigns a self-incrementing hidden column called rowid to each row, e.g. using SQLite via the sqldf package on CRAN and the BOD data frame which is built into R, we can display the rowid column explicitly by referring to it in our select statement:

library(sqldf)
BOD
  Time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
5    5   15.6
6    7   19.8
sqldf("select rowid, * from BOD")
  rowid Time demand
1     1    1    8.3
2     2    2   10.3
3     3    3   19.0
4     4    4   16.0
5     5    5   15.6
6     6    7   19.8

In SQLite you can get the top 3 values, say, like this (continuing the prior example):

sqldf("select * from BOD order by Time desc limit 3")
  Time demand
1    7   19.8
2    5   15.6
3    4   16.0

SQL is not good at first and last type logic. An all-in-R solution may well be better, since R is very good with ordered vectors. A 1GB data.table (or data.frame) for example, at 3.5m rows, could have 76 integer columns, or 38 double columns. 1GB is well within 32-bit and allows some space for working copies, depending on what you want to do with the data. If you have 38 or fewer columns, or you have 64-bit, then an all-in-R solution *might* get your task done quicker, depending on what your real goal is. If this sounds plausible, you could post more details and, if it's appropriate, and luck is on your side, someone might even sketch out how to do an all-in-R solution. Nathan S.
Watson-Haigh nathan.watson-ha...@csiro.au wrote in message news:4b5fde1b.10...@csiro.au... I have a table (contact) with several fields and its PK is an auto-increment field. I'm bulk loading data to this table from files which, if successful, will be about 3.5 million rows (approx 16000 rows per file). However, I have a linking table (an_contact) to resolve a m:m relationship between the an and contact tables. How can I retrieve the PKs for the data bulk loaded into contact so I can insert the relevant data into an_contact? I currently load the data into contact using:

dbWriteTable(con, "contact", dat, append = TRUE, row.names = FALSE)

But I then need to get all the PKs which this dbWriteTable() appended to the contact table so I can load the data into my an_contact link table. I don't want to issue a separate INSERT query for each row in dat and then use MySQL's LAST_INSERT_ID() function, not when I have 3.5 million rows to
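One sketch of how the primary keys could be recovered without per-row INSERTs. Assumptions: a live MySQL connection `con`, that the bulk load is issued as inserts on that same connection, and MySQL's documented behaviour that LAST_INSERT_ID() returns the id of the first row of the last multi-row insert. The database calls are shown commented out since they need a server; the id arithmetic below them is the point.

```r
# With a live connection, the load and key query would be:
#   library(RMySQL)
#   dbWriteTable(con, "contact", dat, append = TRUE, row.names = FALSE)
#   first.id <- dbGetQuery(con, "SELECT LAST_INSERT_ID()")[1, 1]
# LAST_INSERT_ID() gives the id of the FIRST row of the last multi-row
# insert, so the ids for nrow(dat) freshly loaded rows form a sequence:
first.id <- 101          # stand-in value for illustration
n.loaded <- 3            # stand-in for nrow(dat)
new.ids <- seq(first.id, length.out = n.loaded)
new.ids
# [1] 101 102 103
```

Selecting the new ids back by joining on a natural key is the safer cross-check, since this arithmetic assumes the AUTO_INCREMENT ids of the load are contiguous.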
Re: [R] Using tcltk or other graphical widgets to view zoo time series objects
The playwith package might help, though if I understand the problem correctly, the help(xyplot.zoo) example is not so relevant. If you want to switch between many series you could use a spin-button or somesuch. To execute a function you can create a button. If you have a hundred-column dataset like

dat <- zoo(matrix(rnorm(100*100), ncol = 100), Sys.Date() + 1:100)
colnames(dat) <- paste("Series", 1:100)

then this will give you a spin button to choose the column to plot, and a button to print out the current series number:

playwith(xyplot(dat[, c(1, i)]),
         parameters = list(i = 1:100,
                           do_something = function(playState) print(playState$env$i)))

Note that the playwith package uses RGtk2, and therefore requires the GTK+ libraries to be installed on your system. On 28 January 2010 23:16, Gabor Grothendieck ggrothendi...@gmail.com wrote: There is an example of using zoo together with the playwith package at the end of the examples section of help(xyplot.zoo) which may address this. On Thu, Jan 28, 2010 at 7:10 AM, Research risk2...@ath.forthnet.gr wrote: Dear all, I am looking at the R-help entry below: http://finzi.psych.upenn.edu/R/Rhelp02/archive/26640.html I have a more complicated problem. I have a zoo time series frame with 100+ sequences. I want to cycle through them back and forth and compare them to the 1st column at any time. I also need a button to click when I need the viewed-selected sequence (the one being compared to the 1st column) to be manipulated (by some algorithm, or be saved individually, etc.). I am trying to modify the code at the above link but somehow I cannot make it work with zoo time series objects. Any help would be greatly appreciated. Thanks in advance, Costas
-- Felix Andrews / 安福立 Postdoctoral Fellow, Integrated Catchment Assessment and Management (iCAM) Centre, Fenner School of Environment and Society [Bldg 48a], The Australian National University, Canberra ACT 0200 Australia M: +61 410 400 963 T: +61 2 6125 4670 E: felix.andr...@anu.edu.au CRICOS Provider No. 00120C -- http://www.neurofractal.org/felix/
[R] using function boot
Dear R Users, I am trying to use the function boot of the boot package to sample from a dataframe of two character variables (N=1127). Each character variable can take five different values. Here is an example of the data:

1  b95-99.9  d25%
2  b95-99.9   a1%
3  b95-99.9   a1%
4  b95-99.9   a1%
5  b95-99.9   a1%
6     a99.9   a1%
7  b95-99.9   a1%
8  b95-99.9   a1%
9  b95-99.9   a1%
10 b95-99.9   a1%

The statistic I want to use is the median polish (I created my own function that calls the function medpolish from the stats package). In my function, I included a second argument for the weight as asked by the boot function. Here is my function, which basically creates the table from the two variables, divides each cell by the sum of the column to obtain percentages, does the median polish, and computes the median of some of the cells:

juste.polish <- function(data, w = rep(1, nrow(data))/nrow(data)) {
  tableR <- table(data[, 1], data[, 2])
  tableP <- tableR
  marg2 <- apply(tableR, 2, sum)
  for (i in 1:nrow(tableP)) {
    tableP[i, ] <- 100 * (tableR[i, ]/marg2)
  }
  juste.medp <- medpolish(tableP)
  median(c(juste.medp$residuals[dimnames(juste.medp$residuals)[[1]] == "e60", 1],
           juste.medp$residuals[dimnames(juste.medp$residuals)[[1]] == "d60-79", 2],
           juste.medp$residuals[dimnames(juste.medp$residuals)[[1]] == "c80-94", 3],
           juste.medp$residuals[dimnames(juste.medp$residuals)[[1]] == "b95-99.9", 4],
           juste.medp$residuals[dimnames(juste.medp$residuals)[[1]] == "a99.9", 5]))
}

When I call the boot function (juste.boot <- boot(data = mydata, statistic = juste.polish, R = 999)), it works but computes the same parameter at every boot sample, as if the resampling did not work and always provided a sample identical to the original sample. If you have any ideas, I would be very grateful. Thanks, delphine
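A likely cause of the identical replicates (a guess from the code shown): with the default stype = "i", boot() passes a vector of resampled row indices as the statistic's second argument, but juste.polish's second argument `w` is never used inside the function, so every replicate sees the full original data. A minimal sketch with a toy statistic; the median-polish code would go where `toy.stat` computes its median:

```r
library(boot)  # ships with R

# The statistic must use its second argument: boot() supplies the
# resampled row indices there (default stype = "i").
toy.stat <- function(data, i) {
  d <- data[i, , drop = FALSE]  # the resampling step missing above
  median(d[, 1])
}

set.seed(1)
mydata <- data.frame(x = rnorm(50), y = rnorm(50))
b <- boot(data = mydata, statistic = toy.stat, R = 99)
length(unique(b$t))  # > 1: replicates now actually vary
```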
Re: [R] add points to 3D plot using p3d {onion}
On 27.01.2010 17:50, Viechtbauer Wolfgang (STAT) wrote: Just as an aside, the scatterplot3d package does things like this very cleverly. Essentially, when you create a plot with scatterplot3d, the function actually returns functions with values set so that points3d(), for example, knows the axis scaling. Right, it makes use of lexical scoping properties, where the environment is attached to the returned graphics functions. Uwe Ligges Best,
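Uwe's point about lexical scoping can be shown without graphics. A minimal sketch; `make_scaler` is a hypothetical stand-in for the way scatterplot3d builds the points3d() it returns:

```r
# The constructor returns a function; lo and hi stay attached to it
# through its enclosing environment (lexical scoping), just as the
# axis scaling stays attached to scatterplot3d's returned functions.
make_scaler <- function(lo, hi) {
  function(x) (x - lo) / (hi - lo)
}

to_unit <- make_scaler(10, 20)
to_unit(15)
# [1] 0.5
```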
Re: [R] Conditional editing of rows in a data frame
On Jan 28, 2010, at 7:05 AM, Irene Gallego Romero wrote: Dear R users, I have a dataframe (main.table) with ~30,000 rows and 6 columns, of which here are a few rows:

      id chr window    gene    xp.norm xp.top
129 1_32   1     32  TAS1R1 1.28882115  FALSE
130 1_32   1     32  ZBTB48 1.28882115  FALSE
131 1_32   1     32  KLHL21 1.28882115  FALSE
132 1_32   1     32   PHF13 1.28882115  FALSE
133 1_33   1     33   PHF13 1.02727430  FALSE
134 1_33   1     33   THAP3 1.02727430  FALSE
135 1_33   1     33 DNAJC11 1.02727430  FALSE
136 1_33   1     33  CAMTA1 1.02727430  FALSE
137 1_34   1     34  CAMTA1 1.40312732   TRUE
138 1_35   1     35  CAMTA1 1.52104538  FALSE
139 1_36   1     36  CAMTA1 1.04853732  FALSE
140 1_37   1     37  CAMTA1 0.64794094  FALSE
141 1_38   1     38  CAMTA1 1.23026086   TRUE
142 1_38   1     38   VAMP3 1.23026086   TRUE
143 1_38   1     38    PER3 1.23026086   TRUE
144 1_39   1     39    PER3 1.18154967   TRUE
145 1_39   1     39    UTS2 1.18154967   TRUE
146 1_39   1     39 TNFRSF9 1.18154967   TRUE
147 1_39   1     39   PARK7 1.18154967   TRUE
148 1_39   1     39  ERRFI1 1.18154967   TRUE
149 1_40   1     40 no_gene 1.79796879  FALSE
150 1_41   1     41 SLC45A1 0.20193560  FALSE

I want to create two new columns, xp.bg and xp.n.top, using the following criteria: If gene is the same in consecutive rows, xp.bg is the minimum value of xp.norm in those rows; if gene is not the same, xp.bg is simply the value of xp.norm for that row. Assuming that gene values are adjacent in a dataframe named df1, then this would work:

df1$xp.bg <- with(df1, ave(xp.norm, gene, FUN = min))

Likewise, if there's a run of contiguous xp.top = TRUE values, xp.n.top is the minimum value in that range, and if xp.top is false or NA, xp.n.top is NA, or 0 (I don't care).
df1$seqgrp <- c(0, diff(df1$xp.top))
df1$seqgrp2 <- cumsum(df1$seqgrp != 0)
df1$xp.n.top <- with(df1, ave(xp.norm, seqgrp2, FUN = min))
is.na(df1$xp.n.top) <- !df1$xp.top
df1$xp.bg <- with(df1, ave(xp.norm, gene, FUN = min))
df1

      id chr window    gene   xp.norm xp.top seqgrp seqgrp2 xp.n.top     xp.bg
129 1_32   1     32  TAS1R1 1.2888211  FALSE      0       0       NA 1.2888211
130 1_32   1     32  ZBTB48 1.2888211  FALSE      0       0       NA 1.2888211
131 1_32   1     32  KLHL21 1.2888211  FALSE      0       0       NA 1.2888211
132 1_32   1     32   PHF13 1.2888211  FALSE      0       0       NA 1.0272743
133 1_33   1     33   PHF13 1.0272743  FALSE      0       0       NA 1.0272743
134 1_33   1     33   THAP3 1.0272743  FALSE      0       0       NA 1.0272743
135 1_33   1     33 DNAJC11 1.0272743  FALSE      0       0       NA 1.0272743
136 1_33   1     33  CAMTA1 1.0272743  FALSE      0       0       NA 0.6479409
137 1_34   1     34  CAMTA1 1.4031273   TRUE      1       1 1.403127 0.6479409
138 1_35   1     35  CAMTA1 1.5210454  FALSE     -1       2       NA 0.6479409
139 1_36   1     36  CAMTA1 1.0485373  FALSE      0       2       NA 0.6479409
140 1_37   1     37  CAMTA1 0.6479409  FALSE      0       2       NA 0.6479409
141 1_38   1     38  CAMTA1 1.2302609   TRUE      1       3 1.181550 0.6479409
142 1_38   1     38   VAMP3 1.2302609   TRUE      0       3 1.181550 1.2302609
143 1_38   1     38    PER3 1.2302609   TRUE      0       3 1.181550 1.1815497
144 1_39   1     39    PER3 1.1815497   TRUE      0       3 1.181550 1.1815497
145 1_39   1     39    UTS2 1.1815497   TRUE      0       3 1.181550 1.1815497
146 1_39   1     39 TNFRSF9 1.1815497   TRUE      0       3 1.181550 1.1815497
147 1_39   1     39   PARK7 1.1815497   TRUE      0       3 1.181550 1.1815497
148 1_39   1     39  ERRFI1 1.1815497   TRUE      0       3 1.181550 1.1815497
149 1_40   1     40 no_gene 1.7979688  FALSE     -1       4       NA 1.7979688
150 1_41   1     41 SLC45A1 0.2019356  FALSE      0       4       NA 0.2019356

And if the adjacent-gene assumption of the first request above were not met, then the first portion of this method could be used instead to create group indices. -- David. So, in the above example, xp.bg for rows 136:141 should be 0.64794094, and is equal to xp.norm for all other rows; xp.n.top for row 137 is 1.40312732, 1.18154967 for rows 141:148, and 0/NA for all other rows. Is there a way to combine indexing and if statements or some such to accomplish this?
I want to do this without using split(main.table, main.table$gene), because there are about 20,000 unique entries for gene, and one of the entries, no_gene, is repeated throughout. I thought briefly of subsetting the rows where xp.top is TRUE, but I then don't know how to set the range for min, so that it only looks at what would originally have been consecutive rows, and searching the help has not proved particularly
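David's two-step recipe (per-gene minimum via ave, then run-grouping via cumsum of a lagged difference) can be exercised on a tiny self-contained frame. A sketch; column names follow the thread, the data are made up:

```r
df1 <- data.frame(
  gene    = c("A", "A", "B", "B", "B", "C"),
  xp.norm = c(3, 1, 5, 2, 4, 7),
  xp.top  = c(FALSE, TRUE, TRUE, FALSE, FALSE, FALSE)
)

# per-gene minimum
df1$xp.bg <- ave(df1$xp.norm, df1$gene, FUN = min)

# label runs of equal xp.top, then take the run minimum where xp.top is TRUE
run.id <- cumsum(c(0, diff(df1$xp.top)) != 0)
df1$xp.n.top <- ave(df1$xp.norm, run.id, FUN = min)
is.na(df1$xp.n.top) <- !df1$xp.top

df1$xp.bg     # 1 1 2 2 2 7
df1$xp.n.top  # NA 1 1 NA NA NA
```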
[R] AFT-model with time-varying covariates and left-truncation
Dear Prof. Broström, Dear R-mailinglist, First of all, thanks a lot for your great effort to incorporate time-varying covariates into aftreg. It works like a charm so far and I'll update you with detailed benchmarks as soon as I have them. I have one more question regarding Accelerated Failure Time models (with aftreg): You mention that left truncation in combination with time-varying covariates only works if "...it can be assumed that the covariate values during the first non-observable interval are the same as at the beginning of the first interval under observation." My question is: Is there a way to use an AFT model where one has no explicit assumption about what values the covariates have before the subject enters the study (see the example below if unclear)? For me personally it would already be a great help to know if this is statistically feasible in general; however, I'm also interested in whether it can be modelled with aftreg. EXAMPLE (to make sure we're talking about the same thing): Suppose I want to model the lifetime of two wear parts A and B with temperature as a covariate. For some reason, I can only observe the temperature at three distinct times t1, t2, t3, where they each have a certain age (5 hours, 6 hours, 7 hours respectively). Of course, I have a different temperature for each part at each observation t1, t2, t3. Unfortunately, at t1 neither part is being used for the first time; both already have a certain age (5 hours), and I cannot observe what the temperature was before (at ages 1 hr, 2 hr, ...). Thanks a lot for your help! All the best, Philipp
Re: [R] Conditional editing of rows in a data frame
If DF is your data frame then:

DF$xp.bg <- ave(DF$xp.norm, DF$gene, FUN = min)

will create a new column such that the entry in each row has the minimum xp.norm of all rows with the same gene. ave does use split internally, but I think it would be worth trying anyway since it's only one short line of code. See help(ave). On Thu, Jan 28, 2010 at 7:05 AM, Irene Gallego Romero ig...@cam.ac.uk wrote: Dear R users, I have a dataframe (main.table) with ~30,000 rows and 6 columns, of which here are a few rows:

      id chr window    gene    xp.norm xp.top
129 1_32   1     32  TAS1R1 1.28882115  FALSE
130 1_32   1     32  ZBTB48 1.28882115  FALSE
131 1_32   1     32  KLHL21 1.28882115  FALSE
132 1_32   1     32   PHF13 1.28882115  FALSE
133 1_33   1     33   PHF13 1.02727430  FALSE
134 1_33   1     33   THAP3 1.02727430  FALSE
135 1_33   1     33 DNAJC11 1.02727430  FALSE
136 1_33   1     33  CAMTA1 1.02727430  FALSE
137 1_34   1     34  CAMTA1 1.40312732   TRUE
138 1_35   1     35  CAMTA1 1.52104538  FALSE
139 1_36   1     36  CAMTA1 1.04853732  FALSE
140 1_37   1     37  CAMTA1 0.64794094  FALSE
141 1_38   1     38  CAMTA1 1.23026086   TRUE
142 1_38   1     38   VAMP3 1.23026086   TRUE
143 1_38   1     38    PER3 1.23026086   TRUE
144 1_39   1     39    PER3 1.18154967   TRUE
145 1_39   1     39    UTS2 1.18154967   TRUE
146 1_39   1     39 TNFRSF9 1.18154967   TRUE
147 1_39   1     39   PARK7 1.18154967   TRUE
148 1_39   1     39  ERRFI1 1.18154967   TRUE
149 1_40   1     40 no_gene 1.79796879  FALSE
150 1_41   1     41 SLC45A1 0.20193560  FALSE

I want to create two new columns, xp.bg and xp.n.top, using the following criteria: If gene is the same in consecutive rows, xp.bg is the minimum value of xp.norm in those rows; if gene is not the same, xp.bg is simply the value of xp.norm for that row. Likewise, if there's a run of contiguous xp.top = TRUE values, xp.n.top is the minimum value in that range, and if xp.top is false or NA, xp.n.top is NA, or 0 (I don't care). So, in the above example, xp.bg for rows 136:141 should be 0.64794094, and is equal to xp.norm for all other rows; xp.n.top for row 137 is 1.40312732, 1.18154967 for rows 141:148, and 0/NA for all other rows.
Is there a way to combine indexing and if statements or some such to accomplish this? I want to do this without using split(main.table, main.table$gene), because there are about 20,000 unique entries for gene, and one of the entries, no_gene, is repeated throughout. I thought briefly of subsetting the rows where xp.top is TRUE, but I then don't know how to set the range for min, so that it only looks at what would originally have been consecutive rows, and searching the help has not proved particularly useful. Thanks in advance, Irene Gallego Romero -- Irene Gallego Romero, Leverhulme Centre for Human Evolutionary Studies, University of Cambridge, Fitzwilliam St, Cambridge CB1 3QH, UK. email: ig...@cam.ac.uk
Re: [R] Problems with fitdistr
Do you have any zeros in your data? fitdistr() will need start values (see the code), but even with start values, optim() will have problems.

x <- rweibull(100, 2, 10)
fitdistr(x, "weibull")  ## no problem
fitdistr(c(0, x), "weibull")  ## your error message
fitdistr(c(0, x), "weibull", start = list(shape = 2, scale = 10))  ## still an error message from optim()

-Peter Ehlers

vikrant wrote: Hi, I want to estimate the parameters of a Weibull distribution. For this, I am using the fitdistr() function in the MASS package. But when I give fitdistr(c, "weibull") I get an error as follows:

Error in optim(x = c(4L, 41L, 20L, 6L, 12L, 6L, 7L, 13L, 2L, 8L, 22L, : non-finite value supplied by optim

Any help or suggestions are most welcome. -- Peter Ehlers, University of Calgary
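One way to see why a zero breaks the fit (a sketch using MASS, which ships with R): the Weibull density is 0 at x = 0 for shape > 1, so the log-likelihood is -Inf there and optim() receives a non-finite objective. Dropping or flagging the zeros restores the fit.

```r
library(MASS)

set.seed(42)
x <- rweibull(100, shape = 2, scale = 10)

dweibull(0, shape = 2, scale = 10)       # 0: density vanishes at x = 0 ...
log(dweibull(0, shape = 2, scale = 10))  # -Inf: ... so the log-likelihood blows up

fit <- fitdistr(x[x > 0], "weibull")     # fit on strictly positive data
fit$estimate                              # shape and scale near the true 2 and 10
```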
Re: [R] Constrained vector permutation
Andrew Rominger ajrominger at gmail.com writes: I'm trying to permute a vector of positive integers > 0 with the constraint... Hi Andy, I'm not sure if you are explicitly wanting to use a sampling approach, but the gtools library has a permutations function (found by ??permutation then ?gtools::combinations). Hope this helps, Jason Smith. Here is the script I used:

# Constraint:
#   f(n_i) <= 2 * f(n_(i-1))
#
# Given a start value and the number of elements,
# recursively generate a vector representing the
# maximum values each index is allowed
f <- function(value, num_elements) {
  # cat(paste("f(", value, ",", num_elements, ")\n"))
  if (num_elements <= 1) {
    value
  } else {
    z <- c(value, f(2 * value, num_elements - 1))
  }
}

# Generate base vector
v <- 2:6

# Calculate constraint vector
v.constraints <- f(v[1], length(v) - 1)

# Generate permutations using gtools functions
library(gtools)
v.permutations <- permutations(length(v), length(v), v)

# Check each permutation
results <- apply(v.permutations, 1, function(x) all(x <= v.constraints))

#
# Display Results
#
print("Original Vector")
print(v)
print("Constraint Vector")
print(v.constraints)
print("Does Vector meet Constraints")
print(cbind(v.permutations, results))
Re: [R] RMySQL - Bulk loading data and creating FK links
Are you claiming that SQL is that utopia? SQL is a row store. It cannot give the user the benefits of a column store. For example, why does SQL take 113 seconds in the example in this thread: http://tolstoy.newcastle.edu.au/R/e9/help/10/01/1872.html but data.table takes 5 seconds to get the same result? How come the high-level language SQL doesn't appear to hide the user from this detail? If you are just describing utopia, then of course I agree. It would be great to have a language which hid us from this. In the meantime the user has choices, and the best choice depends on the task and the real goal. Gabor Grothendieck ggrothendi...@gmail.com wrote in message news:971536df1001280428p345f8ff4v5f3a80c13f96d...@mail.gmail.com...
Gabor Grothendieck ggrothendi...@gmail.com wrote in message news:971536df1001270629w4795da89vb7d77af6e4e8b...@mail.gmail.com... On Wed, Jan 27, 2010 at 8:56 AM, Matthew Dowle mdo...@mdowle.plus.com wrote: How many columns, and of what type are the columns ? As Olga asked too, it would be useful to know more about what you're really trying to do. 3.5m rows is not actually that many rows, even for 32bit R. Its depends on the columns and what you want to do with those columns. At the risk of suggesting something before we know the full facts, one possibility is to load the data from flat file into data.table. Use setkey() to set your keys. Use tables() to summarise your various tables. Then do your joins etc all-in-R. data.table has fast ways to do those sorts of joins (but we need more info about your task). Alternatively, you could check out the sqldf website. There is an sqlread.csv (or similar name) which can read your files directly into SQL read.csv.sql instead of going via R. Gabor has some nice examples there about that and its faster. You use some buzzwords which makes me think that SQL may not be appropriate for your task though. Can't say for sure (because we don't have enough information) but its possible you are struggling because SQL has no row ordering concept built in. That might be why you've created an increment In the SQLite database it automatically assigns a self incrementing hidden column called rowid to each row. e.g. using SQLite via the sqldf package on CRAN and the BOD data frame which is built into R we can display the rowid column explicitly by referring to it in our select statement: library(sqldf) BOD Time demand 1 1 8.3 2 2 10.3 3 3 19.0 4 4 16.0 5 5 15.6 6 7 19.8 sqldf(select rowid, * from BOD) rowid Time demand 1 1 1 8.3 2 2 2 10.3 3 3 3 19.0 4 4 4 16.0 5 5 5 15.6 6 6 7 19.8 field? Do your queries include order by incrementing field? SQL is not good at first and last type logic. 
In SQLite you can get the top 3 values, say, like this (continuing the prior example):

sqldf("select * from BOD order by Time desc limit 3")
  Time demand
1    7   19.8
2    5   15.6
3    4   16.0

An all-in-R solution may well be better, since R is very good with ordered vectors. A 1GB data.table (or data.frame), for example, at 3.5m rows, could have 76 integer columns, or 38 double columns. 1GB is well within 32bit and allows some space for working copies, depending on what you want to do with the data. If you have 38 or fewer columns, or you have 64bit, then an all-in-R solution *might* get your task done quicker, depending on what your real goal is. If this sounds plausible, you could post more details and, if it's appropriate, and luck is on your side, someone might even sketch out how to do an all-in-R solution. Nathan S. Watson-Haigh nathan.watson-ha...@csiro.au wrote in message news:4b5fde1b.10...@csiro.au... I have a table (contact) with several fields and its PK is an auto-increment field. I'm bulk loading data
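To make the data.table suggestion concrete, here is a minimal sketch of a keyed join; the table and column names are invented for illustration, not taken from the poster's actual schema:

```r
library(data.table)  # assumes the data.table package is installed

# invented stand-ins for the poster's tables
contact <- data.table(id = c("a", "b", "c", "d"), value = c(10, 20, 30, 40))
wanted  <- data.table(id = c("b", "d"))

setkey(contact, id)   # physically sort the table once by the key column
contact[wanted]       # keyed join: rows of 'contact' matching 'wanted'
```

Because setkey() leaves the table physically ordered, "first"/"last" style logic and binary-search joins are cheap, which is the contrast with SQL's order by being drawn above.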
Re: [R] number of decimal
?formatC ?sprintf Ivan Calandra wrote: Hi everybody, I'm trying to set the number of decimals (i.e. the number of digits after the "."). I looked into options() but I can only set the total number of digits, with options(digits=6). But since I have different variables with different orders of magnitude, I would like them all to be displayed with the same number of decimals. I searched for it and found the format() function, with nsmall=6, but it is for a given vector. I would like to set it for the whole session, as with options(). Can anyone help me? Thanks in advance Ivan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Peter Ehlers University of Calgary __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Constrained vector permutation
I just realized I read through your email too quickly and my script does not actually address the constraint on each permutation; sorry about that. You should be able to use the permutations function to generate the vector permutations, however. Jason __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Print lattice output to table?
I have beautiful box and whisker charts formatted with lattice, which is obviously calculating summary statistics internally in order to draw the charts. Is there a way to dump the associated summary tables that are being used to generate the charts? Realize I could use tapply or such to get something similar, but I have all the groupings and such already configured to generate the charts. Simply want to dump those values to a table so that I don't have to interpolate where the 75th percentile is on a visual chart. Appreciate any thoughts.. -- View this message in context: http://n4.nabble.com/Print-lattice-output-to-table-tp1375040p1375040.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Setting breaks for histogram of dates
Hi, I have a list of dates like this:

date
2009-12-03
2009-12-11
2009-10-07
2010-01-25
2010-01-05
2009-09-09
2010-01-19
2010-01-25
2009-02-05
2010-01-25
2010-01-27
2010-01-27
...

and am creating a histogram like this:

t <- read.table("test.dat", header = TRUE)
hist(as.Date(t$date), "years", format = "%d/%m/%y", freq = TRUE)

However, I would rather not label the breaks themselves, but instead print the date with the format "%Y" between the breaks. Is there a simple way of doing this? Regards Loris -- Dr. Loris Bennett ZEDAT Computer Centre Freie Universität Berlin Berlin, Germany __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
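One way to get year labels between the breaks rather than on them is a sketch like the following; the break range is made up to cover the dates above, and it assumes hist's Date method passes xaxt through so the default axis can be suppressed and redrawn at the interval midpoints:

```r
dates <- as.Date(c("2009-12-03", "2009-12-11", "2009-10-07", "2010-01-25",
                   "2010-01-05", "2009-09-09", "2010-01-19", "2010-01-25",
                   "2009-02-05", "2010-01-25", "2010-01-27", "2010-01-27"))
brk <- seq(as.Date("2009-01-01"), as.Date("2011-01-01"), by = "year")
hist(dates, breaks = brk, freq = TRUE, xaxt = "n", main = "", xlab = "Year")
mid <- head(brk, -1) + diff(as.numeric(brk)) / 2   # midpoint of each interval
axis(1, at = as.numeric(mid), labels = format(mid, "%Y"), tick = FALSE)
```

tick = FALSE draws only the "2009"/"2010" labels, centred under each year's bar group, while the break positions themselves stay unlabelled.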
[R] select one row from data-frame by name, indirectly (as string)
Hello, say I have a dataframe x and it contains rows like ch_01, ch_02 and so on. How can I select those channels indirectly, by name? I tried to select the data with get() but get() seems only to work on simple variables? Or how to do it? I need something like that:

name1 <- "ch_01"
name2 <- "ch_02"
selected <- function(x, name1)   # pseudocode

Any ideas? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
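For reference, character indexing does this directly; a minimal sketch with invented data (columns here, but rows work the same way through rownames):

```r
x <- data.frame(ch_01 = 1:3, ch_02 = 4:6)
name1 <- "ch_01"

x[[name1]]   # the column as a vector
x[, name1]   # the same
x[name1]     # a one-column data frame
x["2", ]     # a row selected by its (row)name
```

get() only looks up whole objects by name in an environment, which is why it does not help for picking pieces out of a data frame.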
Re: [R] select one row from data-frame by name, indirectly (as string)
OK, now it works... just using [ and ] or [[ and ]] works. I thought I had tried it before... why does it work now and not before? hmhh. Sorry for the traffic. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] select one row from data-frame by name, indirectly (as string)
On Jan 28, 2010, at 10:04 AM, Oliver wrote: OK, now it works... just using [ and ] or [[ and ]] works. I thought I had tried it before... why does it work now and not before? Provide your console session and someone can tell you. Failing that, you are asking us to read your mind. David Winsemius, MD Heritage Laboratories West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] number of decimal
It looks to me that it does more or less the same as format(). Maybe I didn't explain myself correctly then. I would like to set the number of decimal by default, for the whole R session, like I do with options(digits=6). Except that digits sets up the number of digits (including what is before the .). I'm looking for some option that will let me set the number of digits AFTER the . Example: I have 102.33556677 and 2.999555666 If I set the number of decimal to 6, I should get: 102.335567 and 2.999556. And that for all numbers that will be in/output from R (read.table, write.table, statistic tests, etc) Or is it that I didn't understand everything about formatC() and sprintf()? Thanks again Ivan Le 1/28/2010 15:12, Peter Ehlers a écrit : ?formatC ?sprintf Ivan Calandra wrote: Hi everybody, I'm trying to set the number of decimals (i.e. the number of digits after the .). I looked into options but I can only set the total number of digits, with options(digits=6). But since I have different variables with different order of magnitude, I would like that they're all displayed with the same number of decimals. I searched for it and found the format() function, with nsmall=6, but it is for a given vector. I would like to set it for the whole session, as with options. Can anyone help me? Thanks in advance Ivan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Print lattice output to table?
On Thu, Jan 28, 2010 at 6:25 AM, GL pfl...@shands.ufl.edu wrote: I have beautiful box and whisker charts formatted with lattice, which is obviously calculating summary statistics internally in order to draw the charts. Is there a way to dump the associated summary tables that are being used to generate the charts? Realize I could use tapply or such to get something similar, but I have all the groupings and such already configured to generate the charts. Simply want to dump those values to a table so that I don't have to interpolate where the 75th percentile is on a visual chart. Appreciate any thoughts.. You can customize the function that computes the summary statistics, and that seems like the only reasonable entry point for you. A simple example:

bwplot(voice.part ~ height, data = singer,
       stats = function(...) {
           ans <- boxplot.stats(...)
           str(ans)
           ans
       })

You will need to figure out how you will dump the parts you want. -Deepayan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
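Building on that entry point, one hedged sketch for actually collecting the numbers is to let the stats function append each group's five-number summary to a list in the enclosing environment via <<- (collected is an invented name; note the lattice object must be print()ed before the list fills, since lattice plots lazily):

```r
library(lattice)

collected <- list()
p <- bwplot(voice.part ~ height, data = singer,
            stats = function(x, ...) {
                ans <- boxplot.stats(x, ...)
                # stash this group's five-number summary as a side effect
                collected[[length(collected) + 1]] <<- ans$stats
                ans
            })
print(p)                  # drawing the plot runs the stats function per group
do.call(rbind, collected) # one row of box-and-whisker stats per voice part
```

The row order of the assembled matrix follows the order in which the panels process the groups, so it may need relabelling against levels(singer$voice.part).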
[R] make a grid with longitude, latitude and bathymetry data
Hi, I have a longitude vector (x), a latitude vector (y) and a matrix of bathymetry (z) with the dimensions (x, y). I have already succeeded in plotting it with the image.plot (package 'fields') and the contour functions. But now, I want to make a grid in order to easily extract the bathymetry corresponding to a couple of longitude, latitude coordinates. Do you know a function or a package which can help me? Or do you know how to do it? (I have already looked for it on the internet and didn't find anything.) Thanks a lot. Karine __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
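A minimal sketch of such a lookup using base R's findInterval(); the coordinate ranges and depth formula below are invented, and this returns the grid cell at or below the requested point rather than interpolating (fields::interp.surface() could be used instead for bilinear interpolation):

```r
# invented example grid: z[i, j] is the depth at longitude x[i], latitude y[j]
x <- seq(-10, 10, by = 0.5)   # longitudes, sorted increasing
y <- seq(40, 50, by = 0.5)    # latitudes, sorted increasing
z <- outer(x, y, function(lon, lat) -100 - lon^2 - lat)  # dummy bathymetry

depth_at <- function(lon, lat) {
    i <- findInterval(lon, x)   # index of the grid point at or below lon
    j <- findInterval(lat, y)
    z[cbind(i, j)]              # vectorised matrix lookup
}
depth_at(0.3, 42.7)   # depth of the cell containing (0.3, 42.7)
```

Because z[cbind(i, j)] is vectorised, depth_at() also works on whole vectors of coordinates at once.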
Re: [R] Print lattice output to table?
That works great. Thanks! -- View this message in context: http://n4.nabble.com/Print-lattice-output-to-table-tp1375040p1380862.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Conditional density plot in lattice
On Fri, Jan 22, 2010 at 2:08 AM, Dieter Menne dieter.me...@menne-biomed.de wrote: Deepayan Sarkar wrote: With a restructuring of the data:

df1 <- data.frame(x=0:n, y1=((0:n)/n)^2, y2=1-((0:n)/n)^2, age="young")
df2 <- data.frame(x=0:n, y1=((0:n)/n)^3, y2=1-((0:n)/n)^3, age="old")
df <- rbind(df1, df2)
xyplot((y1+y2) + y1 ~ x | age, data=df, type = "l")
xyplot((y1+y2) + y1 ~ x | age, data=df, type = "l",
       scales = list(axs = "i"), panel = panel.superpose,
       panel.groups = function(x, y, fill, ...) {
           panel.polygon(c(min(x), x, max(x)), c(0, y, 0), fill = fill)
       })

Thanks, Deepayan. I noted that the color of the bands is determined by superpose.symbol. Is that by design or a typo? By design, in the sense that the default 'fill' for panel.superpose is taken from superpose.symbol$fill, which makes sense because the default is to plot symbols. You could supply a top-level vector 'fill' instead (which could be trellis.par.get("superpose.polygon")$fill). -Deepayan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] color palette for points, lines, text / interactive Rcolorpicker?
I'm looking for a scheme to generate a default color palette for plotting points, lines and text (on a white or transparent background) with from 2 to, say, 9 colors, with the following constraints: - red is reserved for another purpose - colors should be highly distinct - avoid light colors (like yellows) In RColorBrewer, most of the schemes are designed for area fill rather than points and lines. The closest I can find for these needs is the Dark2 palette, e.g., library(RColorBrewer); display.brewer.pal(7, "Dark2"). I'm wondering if there is something else I can use. On a related note, I wonder if there is something like an interactive color picker for R. For example, http://research.stowers-institute.org/efg/R/Color/Chart/ displays several charts of all R colors. I'd like to find something that displays such a chart and uses identify() to select a set of tiles, whose colors() indices are returned by the function. -Michael -- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. York University Voice: 416 736-5115 x66249 Fax: 416 736-5814 4700 Keele Street http://www.math.yorku.ca/SCS/friendly.html Toronto, ONT M3J 1P3 CANADA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] grid.image(), pckg grid
While I am very happy with and awed by the grid package and its basic plotting primitives such as grid.points, grid.lines, etc., I was wondering whether the equivalent of a grid.image() function exists? Any pointer would be helpful. Thanks! Markus __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] exporting multidimensional matrix from R
Hi, I have a matrix of size 19x512x20 in R. I want to export this into another format which can be imported into MATLAB. write.xls or write.table export only two-dimensional data. Please send code if possible. I am very new to R and have been struggling with this. Thanks! Gopi __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
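One option, sketched below, assumes the R.matlab package from CRAN (not part of base R): its writeMat() stores the array with its dimensions intact, so MATLAB sees a 19x512x20 array directly:

```r
library(R.matlab)   # install.packages("R.matlab") first

a <- array(rnorm(19 * 512 * 20), dim = c(19, 512, 20))
writeMat("mydata.mat", a = a)   # dimensions are preserved in the .mat file
# in MATLAB:  load('mydata.mat'); size(a)
```

readMat() from the same package goes the other way. A text-based alternative would be to flatten with dim(a) <- c(19, 512 * 20), write.table() the result, and reshape() it back on the MATLAB side.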
Re: [R] number of decimal
Ivan Calandra wrote: It looks to me that it does more or less the same as format(). Maybe I didn't explain myself correctly then. I would like to set the number of decimals by default, for the whole R session, like I do with options(digits=6). Except that digits sets up the number of digits (including what is before the "."). I'm looking for some option that will let me set the number of digits AFTER the "." Example: I have 102.33556677 and 2.999555666. If I set the number of decimals to 6, I should get: 102.335567 and 2.999556. And that for all numbers that will be in/output from R (read.table, write.table, statistic tests, etc). Or is it that I didn't understand everything about formatC() and sprintf()? You didn't:

formatC(x, digits=6, format="f")
[1] "102.335567" "2.999556"
sprintf("%12.6f", x)
[1] "  102.335567" "    2.999556"

-Peter Ehlers -- Peter Ehlers University of Calgary __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plotting additive ns components
On Wed, 27 Jan 2010, David Winsemius wrote: On Jan 27, 2010, at 9:09 PM, GlenB wrote: I have an additive model of the following form: zmdlfit <- lm(z ~ ns(x, df=6) + ns(y, df=6)). I can get the fitted values and plot them against z easily enough, but I also want to both obtain and plot the two additive components (the estimates of the two additive terms on the RHS). ?termplot -thomas Thomas Lumley Assoc. Professor, Biostatistics tlum...@u.washington.edu University of Washington, Seattle __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
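As a minimal sketch of that suggestion, using the built-in cars data rather than the poster's variables: termplot() plots each additive component, and predict(..., type = "terms") returns the component values themselves for further use:

```r
library(splines)

fit  <- lm(dist ~ ns(speed, df = 4), data = cars)
termplot(fit)                          # one panel per additive term
comp <- predict(fit, type = "terms")   # matrix: one column per term
head(comp)
```

With the poster's two-term model, comp would have two columns, one per ns() term, each centred about the overall mean.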
Re: [R] number of decimal
Ivan, The default behavior for print()ing objects to the console in an R session is via the use of the print.* methods. For real numerics, print.default() is used and the format is based upon the number of significant digits, not the number of decimal places. There is also an interaction with options(scipen), which influences when scientific notation is used. See ?print.default for more information on defaults and behavior, taking note of the 'digits' argument, which is influenced by options(digits). Importantly, you need to differentiate between how R stores numeric real values and how it displays or prints them. Internally, R stores real numbers using a double precision data type by default. The internal storage is not truncated by default and is stored to full precision for doubles, within binary representation limits. You can of course modify the values using functions such as round() or trunc(), etc. See ?round for more information. For display, Peter has already pointed you to sprintf() and related functions, which allow you to format output for pretty printing to things like column-aligned tables and such. Those do not, however, affect the default output to the R console. HTH, Marc Schwartz On Jan 28, 2010, at 9:21 AM, Ivan Calandra wrote: It looks to me that it does more or less the same as format(). Maybe I didn't explain myself correctly then. I would like to set the number of decimals by default, for the whole R session, like I do with options(digits=6). Except that digits sets up the number of digits (including what is before the "."). I'm looking for some option that will let me set the number of digits AFTER the "." Example: I have 102.33556677 and 2.999555666. If I set the number of decimals to 6, I should get: 102.335567 and 2.999556. And that for all numbers that will be in/output from R (read.table, write.table, statistic tests, etc). Or is it that I didn't understand everything about formatC() and sprintf()?
Thanks again Ivan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] large integers in R
On Thu, 28 Jan 2010, Benilton Carvalho wrote: Hi Duncan, On Tue, Jan 26, 2010 at 9:09 PM, Duncan Murdoch murd...@stats.uwo.ca wrote: On 26/01/2010 3:25 PM, Blanford, Glenn wrote: Has there been any update on R's handling large integers greater than 10^9 (between 10^9 and 4x10^9) ? as.integer() in R 2.9.2 lists this as a restriction but doesnt list the actual limit or cause, nor if anyone was looking at fixing it. Integers in R are 4 byte signed integers, so the upper limit is 2^31-1. That's not likely to change soon. But in the hypothetical scenario that this was to change soon and we were to have 64bit integer type (say, when under a 64 bit OS), wouldn't this allow us to have objects whose length exceeded the 2^31-1 limit? The other possibility is that an additional longer type capable of holding vector lengths would be included. In addition to the issues that Duncan mentioned, having the integer type be 64-bit means that it wouldn't match the Fortran default INTEGER type or the C int on most platforms, which are 32-bit. Calling C code would become more difficult. -thomas Thomas Lumley Assoc. Professor, Biostatistics tlum...@u.washington.eduUniversity of Washington, Seattle __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
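The 32-bit limit discussed above is easy to see at the console; a small illustration (the values are exact, since doubles represent integers of this size exactly):

```r
.Machine$integer.max   # 2147483647, i.e. 2^31 - 1
as.integer(2^31 - 1)   # still representable as an R integer
as.integer(2^31)       # NA, with a coercion warning: out of integer range
2^31                   # fine as a double: 2147483648
```

This is why counts above ~2.1 billion have to be carried as doubles (or split across structures) in R of this era.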
Re: [R] number of decimal
First things first: thanks for your help! I see where the confusion is. With formatC and sprintf, I have to store the numbers I want to change into x. I would like a way without applying a function on specific numbers because I can shorten the numbers that way, but it won't give me more decimals for a test for example. What I mean here is that if I have a F-value = 1.225, formatC won't give me the next 3 decimals, it will just add zeros. I need that because for some of my variables, the sample differ only at the 6th decimal (0.05 vs 0.06), and for other ones the order of magnitude is much higher (120.120225 vs 210.665331). So options(digits=6) cannot do the job as I would like. To make myself even clearer, notice that in my example, all numbers have 6 decimals, but a different number of digits. I hope I'm not bothering you with this question, but I believe that the functions you advised me will not do what I need. I really need something that will set up the number of decimals by default, before the numbers are created by any function. Does such an option even exist in R? Or is it that it doesn't make sense to have different numbers of digits? Would it be better to compare 0.05 and 210.665? Therefore options(digits=6) would be enough. Regards, Ivan Le 1/28/2010 16:43, Peter Ehlers a écrit : Ivan Calandra wrote: It looks to me that it does more or less the same as format(). Maybe I didn't explain myself correctly then. I would like to set the number of decimal by default, for the whole R session, like I do with options(digits=6). Except that digits sets up the number of digits (including what is before the .). I'm looking for some option that will let me set the number of digits AFTER the . Example: I have 102.33556677 and 2.999555666 If I set the number of decimal to 6, I should get: 102.335567 and 2.999556. 
And that for all numbers that will be in/output from R (read.table, write.table, statistic tests, etc). Or is it that I didn't understand everything about formatC() and sprintf()? You didn't: formatC(x, digits=6, format="f") [1] "102.335567" "2.999556" sprintf("%12.6f", x) [1] "  102.335567" "    2.999556" -Peter Ehlers __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] number of decimal
Looks like I didn't read your post carefully enough. If you want some sort of global option to set the display of numbers from any operation performed by R then that's not likely to be possible without capturing all output and formatting it yourself. As the saying goes 'good luck with that'. Note that options(digits=..) won't give you the requested number of digits in all parts of, say, print(t.test(x,y)). -Peter Ehlers -- Peter Ehlers University of Calgary __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] number of decimal
On Jan 28, 2010, at 10:55 AM, Marc Schwartz wrote: For display, Peter has already pointed you to sprintf() and related functions, which allow you to format output for pretty printing to things like column-aligned tables and such. Those do not, however, affect the default output to the R console. If one alters print.default, one can get different behavior, for instance:

print.default <- function (x, digits = NULL, quote = TRUE, na.print = NULL,
                           print.gap = NULL, right = FALSE, max = NULL,
                           useSource = TRUE, ...)
{
    if (is.numeric(x)) { x <- as.numeric(sprintf("%7.3f", x)) }
    noOpt <- missing(digits) && missing(quote) && missing(na.print) &&
        missing(print.gap) && missing(right) && missing(max) &&
        missing(useSource) && length(list(...)) == 0L
    .Internal(print.default(x, digits, quote, na.print, print.gap,
                            right, max, useSource, noOpt))
}

This will have the requested effect for numeric vectors, but does not seem to be altering the behavior of print.data.frame().
print(ac2)
       score pt times trt
1  28.825139  1     0   1
2  97.458521  1     3   1
3  26.217289  1     6   1
4  80.636507  2     0   1
5  99.729364  2     3   1
6  85.812312  2     6   1
7   2.515870  3     0   1
8   3.893545  3     3   1
9  55.666848  3     6   1
10 21.966027  4     0   1
print(ac2$score)
 [1] 28.825 97.459 26.217 80.637 99.729 85.812  2.516  3.894 55.667 21.966
HTH, Marc Schwartz
David Winsemius, MD Heritage Laboratories West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] number of decimal
I guess the easiest solution for me would therefore be to set options(digits) to a high number, and then round down if I need to! Thanks you both for your input! Ivan Le 1/28/2010 17:02, Peter Ehlers a écrit : Looks like I didn't read your post carefully enough. If you want some sort of global option to set the display of numbers from any operation performed by R then that's not likely to be possible without capturing all output and formatting it yourself. As the saying goes 'good luck with that'. Note that options(digits=..) won't give you the requested number of digits in all parts of, say, print(t.test(x,y)). -Peter Ehlers Peter Ehlers wrote: Ivan Calandra wrote: It looks to me that it does more or less the same as format(). Maybe I didn't explain myself correctly then. I would like to set the number of decimal by default, for the whole R session, like I do with options(digits=6). Except that digits sets up the number of digits (including what is before the .). I'm looking for some option that will let me set the number of digits AFTER the . Example: I have 102.33556677 and 2.999555666 If I set the number of decimal to 6, I should get: 102.335567 and 2.999556. And that for all numbers that will be in/output from R (read.table, write.table, statistic tests, etc) Or is it that I didn't understand everything about formatC() and sprintf()? You didn't: formatC(x, digits=6, format=f) [1] 102.335567 2.999556 sprintf(%12.6f, x) [1] 102.335567 2.999556 -Peter Ehlers Thanks again Ivan Le 1/28/2010 15:12, Peter Ehlers a écrit : ?formatC ?sprintf Ivan Calandra wrote: Hi everybody, I'm trying to set the number of decimals (i.e. the number of digits after the .). I looked into options but I can only set the total number of digits, with options(digits=6). But since I have different variables with different order of magnitude, I would like that they're all displayed with the same number of decimals. 
I searched and found the format() function, with nsmall=6, but it applies to a given vector. I would like to set this for the whole session, as with options(). Can anyone help me? Thanks in advance, Ivan
[R] RMySQL install
Hi everyone, I am trying to install the RMySQL package under Windows XP. I have MySQL installed on the computer (MySQL Server 5.1). I went through the steps presented on the webpage http://biostat.mc.vanderbilt.edu/wiki/Main/RMySQL and googled around and still can't find the answer. With the command

readRegistry("SOFTWARE\\MySQL AB", hive="HLM", maxdepth=2)

I get the following info:

$`MySQL Connector/ODBC 5.1`
$`MySQL Connector/ODBC 5.1`$Version
[1] "5.1.6"

$`MySQL Server 5.1`
$`MySQL Server 5.1`$DataLocation
[1] "C:\\Documents and Settings\\All Users\\Application Data\\MySQL\\MySQL Server 5.1\\"

$`MySQL Server 5.1`$FoundExistingDataDir
[1] 0

$`MySQL Server 5.1`$Location
[1] "C:\\Program Files\\MySQL\\MySQL Server 5.1\\"

$`MySQL Server 5.1`$Version
[1] "5.1.42"

$`MySQL Workbench 5.2 OSS`
$`MySQL Workbench 5.2 OSS`$Location
[1] "C:\\Program Files\\MySQL\\MySQL Workbench 5.2 OSS\\"

$`MySQL Workbench 5.2 OSS`$Version
[1] "5.2.14"

Everything seems to be OK. However, when loading the package, I get (originally in French; translated here):

Loading required package: DBI
Error in if (utils::file_test("-d", MySQLhome)) break :
  argument is of length zero
In addition: Warning messages:
1: package 'RMySQL' was built under R version 2.10.1
2: package 'DBI' was built under R version 2.10.1
Error: .onLoad failed in loadNamespace() for 'RMySQL'
Error: package/namespace load failed for 'RMySQL'

Do you need MySQL Server 5.0, or will it work with 5.1 also? Thanks for any help. Rob
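A hedged suggestion, not a confirmed fix: the error indicates that RMySQL's load-time code could not locate a MySQL home directory, so MySQLhome is zero-length when file_test() is called. RMySQL on Windows consults the MYSQL_HOME environment variable, so pointing it at the server location shown in the registry output above may help; the path below is taken from that output:

```r
# Set MYSQL_HOME before loading the package, so that RMySQL's .onLoad
# can find the MySQL installation directory (path from the registry
# output above; adjust if your installation differs):
Sys.setenv(MYSQL_HOME = "C:/Program Files/MySQL/MySQL Server 5.1")
library(RMySQL)
```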
Re: [R] number of decimal
Ivan, now I'm no longer sure of just what you want. Are you concerned about the *internal* handling of numbers by R or just about the *printing* of numbers? As Marc has pointed out, internally R will use the full precision that your input allows. Perhaps you're using the F-value from the output of a procedure like aov() as input to further analysis. If so, don't use the printed value; pull the value out of the object with something like

fm <- aov(y ~ x)
Fval <- summary(fm)[[1]][1, 4]

But maybe this is not at all what you're after. -Peter Ehlers

Ivan Calandra wrote:
First things first: thanks for your help! I see where the confusion is. With formatC and sprintf, I have to store the numbers I want to change into x. I would like a way that does not require applying a function to specific numbers, because I can shorten the numbers that way, but it won't give me more decimals for a test, for example. What I mean here is that if I have an F-value = 1.225, formatC won't give me the next 3 decimals; it will just add zeros. I need that because for some of my variables the samples differ only at the 6th decimal (0.05 vs 0.06), and for other ones the order of magnitude is much higher (120.120225 vs 210.665331). So options(digits=6) cannot do the job as I would like. To make myself even clearer, notice that in my example all numbers have 6 decimals, but a different number of digits. I hope I'm not bothering you with this question, but I believe that the functions you advised will not do what I need. I really need something that will set the number of decimals by default, before the numbers are created by any function. Does such an option even exist in R? Or is it that it doesn't make sense to have different numbers of digits? Would it be better to compare 0.05 and 210.665? In that case options(digits=6) would be enough. Regards, Ivan

On 1/28/2010 16:43, Peter Ehlers wrote:
Ivan Calandra wrote:
It looks to me that it does more or less the same as format().
Maybe I didn't explain myself correctly then. I would like to set the number of decimals by default, for the whole R session, like I do with options(digits=6). Except that digits sets the number of digits (including what is before the decimal point). I'm looking for some option that will let me set the number of digits AFTER the decimal point. Example: I have 102.33556677 and 2.999555666. If I set the number of decimals to 6, I should get 102.335567 and 2.999556, and that for all numbers that will be in/output from R (read.table, write.table, statistical tests, etc.). Or is it that I didn't understand everything about formatC() and sprintf()?

You didn't:

> formatC(x, digits=6, format="f")
[1] "102.335567" "2.999556"
> sprintf("%12.6f", x)
[1] "  102.335567" "    2.999556"

-Peter Ehlers

Thanks again, Ivan

On 1/28/2010 15:12, Peter Ehlers wrote:
?formatC
?sprintf

Ivan Calandra wrote:
Hi everybody, I'm trying to set the number of decimals (i.e. the number of digits after the decimal point). I looked into options() but I can only set the total number of digits, with options(digits=6). But since I have different variables with different orders of magnitude, I would like them all to be displayed with the same number of decimals. I searched and found the format() function, with nsmall=6, but it applies to a given vector. I would like to set this for the whole session, as with options(). Can anyone help me? Thanks in advance, Ivan

--
Peter Ehlers
University of Calgary
[R] Recoding Variables in R
VAR 980490: Some people have suggested placing new limits on foreign imports in order to protect American jobs. Others say that such limits would raise consumer prices and hurt American exports. Do you FAVOR or OPPOSE placing new limits on imports, or haven't you thought much about this? 1. Favor 5. Oppose 8. DK 9. NA; RF 0. Haven't thought much about this

I am trying to recode the data for the above public opinion question from the ANES. I would like to throw out 8 and 9. Furthermore, I would like to reorder the responses so that: 1. Oppose (originally 5), 2. Haven't thought much about this (originally 0), 3. Favor (originally 1).

I tried the following, which did not work:

library(car)
data96$V961327 <- recode(data96$V961327, "c(1)=2; c(2)=3; c(3)=1")

I also tried the following, which also did not work:

new <- as.numeric(data96$V961327)
new
data96$V961327 <- recode(new, "c(5)=1; c(0)=2; c(1)=3")

Help, Abraham M
Re: [R] Data.frame manipulation
Thank you, Dennis and Petr. One more question: when aggregating to one es per id, how would I go about keeping the other variables in the data.frame (e.g., keeping the value from the first row of the other variables, such as mod2)? E.g.:

# Dennis provided this example (notice how mod2 is removed from the output):
> with(x, aggregate(list(es = es), by = list(id = id, mod1 = mod1), mean))
  id mod1   es
1  3    1 0.20
2  1    2 0.30
3  2    4 0.15

# How can I get this output (taking the first row of the other variable in the data.frame):
id   es mod1  mod2
 1 0.30    2   wai
 2 0.15    4 other
 3 0.20    1  itas

Thank you, AC

On Thu, Jan 28, 2010 at 1:29 AM, Petr PIKAL petr.pi...@precheza.cz wrote:
Hi

r-help-boun...@r-project.org wrote on 28.01.2010 04:35:29:

Hi All, I'm conducting a meta-analysis and have taken a data.frame with multiple rows per study (one for each effect size) and performed a weighted average of effect size for each study. This results in a reduced number of rows. I am particularly interested in simply reducing the additional variables in the data.frame to the first row of the corresponding id variable. For example:

id <- c(1, 2, 2, 3, 3, 3)
es <- c(.3, .1, .3, .1, .2, .3)
mod1 <- c(2, 4, 4, 1, 1, 1)
mod2 <- c("wai", "other", "calpas", "wai", "itas", "other")
data <- as.data.frame(cbind(id, es, mod1, mod2))

Do not use cbind. Its output is a matrix, and in this case a character matrix.
The resulting data frame will consist of factors, as you can check with str(data).

> data <- data.frame(id = id, es = es, mod1 = mod1, mod2 = mod2)
> data
  id  es mod1   mod2
1  1 0.3    2    wai
2  2 0.1    4  other
3  2 0.3    4 calpas
4  3 0.1    1    wai
5  3 0.2    1   itas
6  3 0.3    1  other

# I would like to reduce the entire data.frame like this:

E.g. aggregate:

> aggregate(data[, -(3:4)], data[, 3:4], mean)
  mod1   mod2 id  es
1    4 calpas  2 0.3
2    1   itas  3 0.2
3    1  other  3 0.3
4    4  other  2 0.1
5    1    wai  3 0.1
6    2    wai  1 0.3

or doBy, or tapply, or ddply from the plyr library.

Regards, Petr

id   es mod1  mod2
 1 0.30    2   wai
 2 0.15    4 other
 3 0.20    1  itas

# If possible, I would also like the option of this (collapsing on id and mod2):

id   es mod1   mod2
 1 0.30    2    wai
 2 0.10    4  other
 2 0.20    4 calpas
 3 0.10    1   itas
 3 0.25    1    wai

Any help is much appreciated! AC Del Re
[R] Setting base level for contrasts with lme
Hi all, note that

lm(Yield ~ Block + C(Variety, base = 2), Alfalfa)

equals

i <- 2; lm(Yield ~ Block + C(Variety, base = i), Alfalfa)

However,

lme(Yield ~ C(Variety, base = 2), Alfalfa, random = ~ 1 | Block)

which is fine, does not equal

i <- 2; lme(Yield ~ C(Variety, base = i), Alfalfa, random = ~ 1 | Block)

after which I get the message

Error in model.frame.default(formula = ~Yield + Variety + i + Block, data = list( :
  variable lengths differ (found for 'i')

Is everything fine with that? Regards, Marcin
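One possible workaround, sketched here on the assumption that the failure comes from lme()'s call to model.frame() treating the loop index i as a model variable: substitute the value of i into the formula text before fitting, so the fitted formula never mentions i (Alfalfa is the nlme example dataset used above).

```r
library(nlme)

# Build the formula with the base level already filled in, instead of
# referencing the variable 'i' inside the formula:
i <- 2
f <- as.formula(sprintf("Yield ~ C(Variety, base = %d)", i))
fm <- lme(f, data = Alfalfa, random = ~ 1 | Block)
```

This keeps the loop-over-base-levels idea intact while sidestepping the model.frame() issue; whether it matches the original poster's intent exactly is an assumption.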
Re: [R] tapply on multiple groups
On Jan 28, 2010, at 10:26 AM, GL wrote:

Can you make tapply break down groups similar to bwplot or such? Example: the data frame has one measure (Days) and two dimensions (MM and Place). All have the same length.

> length(dbs.final$Days)
[1] 3306
> length(dbs.final$MM)
[1] 3306
> length(dbs.final$Place)
[1] 3306

Doing the following makes a nice table for one dimension and one measure:

do.call(rbind, tapply(dbs.final$Days, dbs.final$Place, summary))

But what I really need is to break it down on two dimensions and one measure, effectively equivalent to the following bwplot call:

bwplot(Days ~ MM | Place, data = dbs.final)

Is there an equivalent to the | operation in tapply?

Please reread the help page for tapply. Perhaps:

tapply(dbs.final$Days, list(dbs.final$MM, dbs.final$Place), summary)

-- David

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
Re: [R] Recoding Variables in R
Dear Abraham, if I follow correctly what you want to do, the following should do it:

> f <- factor(c(1, 1, 5, 5, 8, 8, 9, 9, 0, 0))
> f
 [1] 1 1 5 5 8 8 9 9 0 0
Levels: 0 1 5 8 9
> recode(f, "'1'=3; '5'=1; '0'=2; else=NA")
 [1] 3    3    1    1    <NA> <NA> <NA> <NA> 2    2
Levels: 1 2 3

I think that your problem was that you didn't distinguish correctly between factor levels and their numeric encoding; factor levels should be quoted in recode(). I hope this helps, John

John Fox
Senator William McMaster Professor of Social Statistics
Department of Sociology, McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Mathew, Abraham T
Sent: January-28-10 10:15 AM
To: r-help@r-project.org
Subject: [R] Recoding Variables in R

VAR 980490: Some people have suggested placing new limits on foreign imports in order to protect American jobs. Others say that such limits would raise consumer prices and hurt American exports. Do you FAVOR or OPPOSE placing new limits on imports, or haven't you thought much about this? 1. Favor 5. Oppose 8. DK 9. NA; RF 0. Haven't thought much about this

I am trying to recode the data for the above public opinion question from the ANES. I would like to throw out 8 and 9. Furthermore, I would like to reorder the responses so that: 1. Oppose (originally 5), 2. Haven't thought much about this (originally 0), 3. Favor (originally 1).

I tried the following, which did not work:

library(car)
data96$V961327 <- recode(data96$V961327, "c(1)=2; c(2)=3; c(3)=1")

I also tried the following, which also did not work:

new <- as.numeric(data96$V961327)
new
data96$V961327 <- recode(new, "c(5)=1; c(0)=2; c(1)=3")

Help, Abraham M
Re: [R] RMySQL - Bulk loading data and creating FK links
I think one would only be concerned about such internals if one were primarily interested in performance; otherwise, one would be more interested in ease of specification, and part of that ease is having it independent of implementation and separating implementation from specification activities. An example of the separation of specification and implementation is that, by simply specifying a disk-based database rather than an in-memory one, SQL can perform queries that take more space than memory; the query itself need not be modified. I think the viewpoint you are discussing is primarily one of performance, whereas the viewpoint I was discussing is primarily ease of use, and that accounts for the difference. I believe your performance comparison compares a sequence of operations (building a database, transferring data to it, performing the operation, reading it back in, and destroying the database) to an internal manipulation. I would expect the internal manipulation, particularly one done primarily in C code as is the case with data.table, to be faster, although some benchmarks of the database approach found that it compared surprisingly well to straight R code: some users of sqldf found that for an 8000-row data frame, sqldf actually ran faster than aggregate and also faster than tapply. The News section on the sqldf home page provides links to their benchmarks. Thus if plain R is fast enough, then the database approach is likely fast enough too, since in those benchmarks it was even faster.

On Thu, Jan 28, 2010 at 8:52 AM, Matthew Dowle mdo...@mdowle.plus.com wrote:
Are you claiming that SQL is that utopia? SQL is a row store. It cannot give the user the benefits of a column store. For example, why does SQL take 113 seconds in the example in this thread: http://tolstoy.newcastle.edu.au/R/e9/help/10/01/1872.html but data.table takes 5 seconds to get the same result? How come the high-level language SQL doesn't appear to hide the user from this detail?
If you are just describing utopia, then of course I agree. It would be great to have a language which hid us from this. In the meantime the user has choices, and the best choice depends on the task and the real goal.

Gabor Grothendieck ggrothendi...@gmail.com wrote in message news:971536df1001280428p345f8ff4v5f3a80c13f96d...@mail.gmail.com...
It's only important internally. Externally it's undesirable that the user have to get involved in it. The idea of making software easy to write and use is to hide the implementation and focus on the problem. That is why we use high-level languages, object orientation, etc.

On Thu, Jan 28, 2010 at 4:37 AM, Matthew Dowle mdo...@mdowle.plus.com wrote:
How it represents data internally is very important, depending on the real goal: http://en.wikipedia.org/wiki/Column-oriented_DBMS

Gabor Grothendieck ggrothendi...@gmail.com wrote in message news:971536df1001271710o4ea62333l7f1230b860114...@mail.gmail.com...
How it represents data internally should not be important as long as you can do what you want. SQL is declarative, so you just specify what you want rather than how to get it, and invisibly to the user it automatically draws up a query plan and then uses that plan to get the result.

On Wed, Jan 27, 2010 at 12:48 PM, Matthew Dowle mdo...@mdowle.plus.com wrote:

sqldf("select * from BOD order by Time desc limit 3")

Exactly. SQL requires use of ORDER BY. It knows the order, but it isn't ordered. That's not good, but might be fine, depending on what the real goal is.

Gabor Grothendieck ggrothendi...@gmail.com wrote in message news:971536df1001270629w4795da89vb7d77af6e4e8b...@mail.gmail.com...
On Wed, Jan 27, 2010 at 8:56 AM, Matthew Dowle mdo...@mdowle.plus.com wrote:
How many columns, and of what type are the columns? As Olga asked too, it would be useful to know more about what you're really trying to do. 3.5m rows is not actually that many rows, even for 32-bit R. It depends on the columns and what you want to do with those columns.
At the risk of suggesting something before we know the full facts, one possibility is to load the data from the flat file into a data.table. Use setkey() to set your keys. Use tables() to summarise your various tables. Then do your joins etc. all in R. data.table has fast ways to do those sorts of joins (but we need more info about your task). Alternatively, you could check out the sqldf website. There is a read.csv.sql (or similarly named) function which can read your files directly into SQLite instead of going via R; Gabor has some nice examples there about that, and it's faster. You use some buzzwords which make me think that SQL may not be appropriate for your task, though. Can't say for sure (because we don't have enough information), but it's possible you are struggling because SQL has no row-ordering concept built in. That might be why you've created an increment. In the SQLite database it automatically assigns a self
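The keyed-join route suggested above can be sketched as follows; the tables and column names are invented for illustration, since the original poster's data are not shown:

```r
library(data.table)

# Two toy tables standing in for the poster's flat files:
orders    <- data.table(cust = c(1L, 2L, 2L, 3L), amount = c(10, 20, 30, 40))
customers <- data.table(cust = 1:3, name = c("a", "b", "c"))

setkey(orders, cust)      # set the join key (this also sorts by it)
setkey(customers, cust)

customers[orders]         # keyed join: look up each order's customer
tables()                  # summarise the data.tables in memory
```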
Re: [R] tapply on multiple groups
Thanks. My mistake was that I used c(dbs.final$Days, dbs.final$Place) instead of list(...) when I tried to follow that part of the documentation.

David Winsemius dwinsem...@comcast.net 1/28/2010 11:49 AM

On Jan 28, 2010, at 10:26 AM, GL wrote:

Can you make tapply break down groups similar to bwplot or such? Example: the data frame has one measure (Days) and two dimensions (MM and Place). All have the same length.

> length(dbs.final$Days)
[1] 3306
> length(dbs.final$MM)
[1] 3306
> length(dbs.final$Place)
[1] 3306

Doing the following makes a nice table for one dimension and one measure:

do.call(rbind, tapply(dbs.final$Days, dbs.final$Place, summary))

But what I really need is to break it down on two dimensions and one measure, effectively equivalent to the following bwplot call:

bwplot(Days ~ MM | Place, data = dbs.final)

Is there an equivalent to the | operation in tapply?

Please reread the help page for tapply. Perhaps:

tapply(dbs.final$Days, list(dbs.final$MM, dbs.final$Place), summary)

-- David

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
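The accepted tapply() form can be illustrated with a self-contained toy data frame standing in for dbs.final (which was not posted); mean() is used in place of summary() so the result prints as a simple two-way table:

```r
# Toy data: one measure (Days) and two grouping dimensions (MM, Place)
dbs <- data.frame(Days  = c(5, 7, 3, 9, 2, 8),
                  MM    = c("Jan", "Jan", "Feb", "Feb", "Jan", "Feb"),
                  Place = c("A", "B", "A", "B", "A", "B"))

# tapply with a list of two factors returns a matrix indexed by both:
tapply(dbs$Days, list(dbs$MM, dbs$Place), mean)
#       A   B
# Feb 3.0 8.5
# Jan 3.5 7.0
```

With FUN = summary, as in the thread, each cell holds a full five-number summary instead of a single mean.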
[R] weighted least squares vs linear regression
I need to find out the difference between the way R calculates weighted regression and standard regression. I want to plot a 95% confidence interval around an estimate I got from least squares regression. I can't find the documentation for this; I've looked in ?stats, ?lm, ?predict.lm, ?weights and ?residuals.lm. Can anyone shed light? Thanks, Chris.
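The relevant machinery is documented in ?predict.lm. Below is a minimal sketch, on simulated data (the poster's data are not shown), of drawing a 95% confidence band around a least squares fit; supplying a weights argument to lm() is the only change needed for weighted least squares:

```r
set.seed(1)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20)

fit  <- lm(y ~ x)             # add weights = w here for weighted LS
newx <- data.frame(x = seq(1, 20, length.out = 100))

# Pointwise 95% confidence interval for the fitted mean:
ci <- predict(fit, newdata = newx, interval = "confidence", level = 0.95)

plot(x, y)
lines(newx$x, ci[, "fit"])
lines(newx$x, ci[, "lwr"], lty = 2)
lines(newx$x, ci[, "upr"], lty = 2)
```

Use interval = "prediction" instead if a band for new observations, rather than for the fitted mean, is wanted.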
Re: [R] number of decimal
On Jan 28, 2010, at 10:04 AM, David Winsemius wrote:
On Jan 28, 2010, at 10:55 AM, Marc Schwartz wrote:

Ivan, the default behavior for print()ing objects to the console in an R session is via the use of the print.* methods. For real numerics, print.default() is used and the format is based upon the number of significant digits, not the number of decimal places. There is also an interaction with options(scipen), which influences when scientific notation is used. See ?print.default for more information on defaults and behavior, taking note of the 'digits' argument, which is influenced by options(digits).

Importantly, you need to differentiate between how R stores numeric real values and how it displays or prints them. Internally, R stores real numbers using a double precision data type by default. The internal storage is not truncated by default and is stored to full precision for doubles, within binary representation limits. You can of course modify the values using functions such as round() or trunc(), etc. See ?round for more information.

For display, Peter has already pointed you to sprintf() and related functions, which allow you to format output for pretty printing to things like column-aligned tables and such. Those do not, however, affect the default output to the R console.

If one alters print.default, one can get different behavior, for instance:

print.default <- function (x, digits = NULL, quote = TRUE, na.print = NULL,
    print.gap = NULL, right = FALSE, max = NULL, useSource = TRUE, ...)
{
    if (is.numeric(x)) { x <- as.numeric(sprintf("%7.3f", x)) }
    noOpt <- missing(digits) && missing(quote) && missing(na.print) &&
        missing(print.gap) && missing(right) && missing(max) &&
        missing(useSource) && length(list(...)) == 0L
    .Internal(print.default(x, digits, quote, na.print, print.gap,
        right, max, useSource, noOpt))
}

This will have the requested effect for numeric vectors, but does not seem to alter the behavior of print.data.frame().
> print(ac2)
       score pt times trt
1  28.825139  1     0   1
2  97.458521  1     3   1
3  26.217289  1     6   1
4  80.636507  2     0   1
5  99.729364  2     3   1
6  85.812312  2     6   1
7   2.515870  3     0   1
8   3.893545  3     3   1
9  55.666848  3     6   1
10 21.966027  4     0   1
> print(ac2$score)
 [1] 28.825 97.459 26.217 80.637 99.729 85.812  2.516  3.894 55.667 21.966

David, the issue there is that when printing the vector, you are using print.default() directly, so you get the desired result with a numeric vector. When you print the data frame, internally print.data.frame() calls format.data.frame(), which then internally uses format() on a column-by-column basis, and there is the rub. format() brings you back to using significant digits on numeric vectors and of course returns a character vector. By the time the output is actually print()ed to the console, the original data frame has been converted to a formatted character matrix and that is what gets printed.

> str(format.data.frame(ac2))
'data.frame':   10 obs. of  4 variables:
 $ score: Class 'AsIs'  chr [1:10] "28.825139" "97.458521" "26.217289" "80.636507" ...
 $ pt   : Class 'AsIs'  chr [1:10] "1" "1" "1" "2" ...
 $ times: Class 'AsIs'  chr [1:10] "0" "3" "6" "0" ...
 $ trt  : Class 'AsIs'  chr [1:10] "1" "1" "1" "1" ...
> str(format.data.frame(ac2, digits = 2))
'data.frame':   10 obs. of  4 variables:
 $ score: Class 'AsIs'  chr [1:10] "28.8" "97.5" "26.2" "80.6" ...
 $ pt   : Class 'AsIs'  chr [1:10] "1" "1" "1" "2" ...
 $ times: Class 'AsIs'  chr [1:10] "0" "3" "6" "0" ...
 $ trt  : Class 'AsIs'  chr [1:10] "1" "1" "1" "1" ...

This is why changing print.default() by itself is not sufficient. Other object classes are formatted and printed in varying ways, and print methods have been defined for them which may not use it directly.

HTH, Marc Schwartz
Re: [R] exporting multidimensional matrix from R
On Jan 28, 2010, at 10:42 AM, Gopikrishna Deshpande wrote:

Hi, I have a matrix of size 19x512x20 in R.

No, you don't. Matrices are only 2-dimensional in R. You may have an array, however.

I want to export this file into another format which can be imported into MATLAB. write.xls or write.table exports only one dimension. Please send code if possible. I am very new to R and have been struggling with this.

install.packages(pkgs="R.matlab", dependencies=TRUE)
library(R.matlab)
?writeMat
arr <- array(1:27, dim = c(3, 3, 3))
filename <- "~/test.mat"
writeMat(filename, arr=arr)
readMat(filename)
$arr
, , 1
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

, , 2
     [,1] [,2] [,3]
[1,]   10   13   16
[2,]   11   14   17
[3,]   12   15   18

, , 3
     [,1] [,2] [,3]
[1,]   19   22   25
[2,]   20   23   26
[3,]   21   24   27

attr(,"header")
attr(,"header")$description
[1] "MATLAB 5.0 MAT-file, Platform: unix, Software: R v2.10.1, Created on: Thu Jan 28 12:08:25 2010"
attr(,"header")$version
[1] 5
attr(,"header")$endian
[1] "little"

Thanks! Gopi

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
Re: [R] number of decimal
On Jan 28, 2010, at 12:08 PM, Marc Schwartz wrote:
On Jan 28, 2010, at 10:04 AM, David Winsemius wrote:
On Jan 28, 2010, at 10:55 AM, Marc Schwartz wrote:

Ivan, the default behavior for print()ing objects to the console in an R session is via the use of the print.* methods. For real numerics, print.default() is used and the format is based upon the number of significant digits, not the number of decimal places. There is also an interaction with options(scipen), which influences when scientific notation is used. See ?print.default for more information on defaults and behavior, taking note of the 'digits' argument, which is influenced by options(digits).

Importantly, you need to differentiate between how R stores numeric real values and how it displays or prints them. Internally, R stores real numbers using a double precision data type by default. The internal storage is not truncated by default and is stored to full precision for doubles, within binary representation limits. You can of course modify the values using functions such as round() or trunc(), etc. See ?round for more information.

For display, Peter has already pointed you to sprintf() and related functions, which allow you to format output for pretty printing to things like column-aligned tables and such. Those do not, however, affect the default output to the R console.

If one alters print.default, one can get different behavior, for instance:

print.default <- function (x, digits = NULL, quote = TRUE, na.print = NULL,
    print.gap = NULL, right = FALSE, max = NULL, useSource = TRUE, ...)
{
    if (is.numeric(x)) { x <- as.numeric(sprintf("%7.3f", x)) }
    noOpt <- missing(digits) && missing(quote) && missing(na.print) &&
        missing(print.gap) && missing(right) && missing(max) &&
        missing(useSource) && length(list(...)) == 0L
    .Internal(print.default(x, digits, quote, na.print, print.gap,
        right, max, useSource, noOpt))
}

This will have the requested effect for numeric vectors, but does not seem to alter the behavior of print.data.frame().
> print(ac2)
       score pt times trt
1  28.825139  1     0   1
2  97.458521  1     3   1
3  26.217289  1     6   1
4  80.636507  2     0   1
5  99.729364  2     3   1
6  85.812312  2     6   1
7   2.515870  3     0   1
8   3.893545  3     3   1
9  55.666848  3     6   1
10 21.966027  4     0   1
> print(ac2$score)
 [1] 28.825 97.459 26.217 80.637 99.729 85.812  2.516  3.894 55.667 21.966

David, the issue there is that when printing the vector, you are using print.default() directly, so you get the desired result with a numeric vector.

Thanks, Marc; I do understand. I had been hoping that there might be a 'final common pathway', to use a biochemistry analogy, at least for numeric objects, but it appears not. -- David.

When you print the data frame, internally print.data.frame() calls format.data.frame(), which then internally uses format() on a column-by-column basis, and there is the rub. format() brings you back to using significant digits on numeric vectors and of course returns a character vector. By the time the output is actually print()ed to the console, the original data frame has been converted to a formatted character matrix and that is what gets printed.

> str(format.data.frame(ac2))
'data.frame':   10 obs. of  4 variables:
 $ score: Class 'AsIs'  chr [1:10] "28.825139" "97.458521" "26.217289" "80.636507" ...
 $ pt   : Class 'AsIs'  chr [1:10] "1" "1" "1" "2" ...
 $ times: Class 'AsIs'  chr [1:10] "0" "3" "6" "0" ...
 $ trt  : Class 'AsIs'  chr [1:10] "1" "1" "1" "1" ...
> str(format.data.frame(ac2, digits = 2))
'data.frame':   10 obs. of  4 variables:
 $ score: Class 'AsIs'  chr [1:10] "28.8" "97.5" "26.2" "80.6" ...
 $ pt   : Class 'AsIs'  chr [1:10] "1" "1" "1" "2" ...
 $ times: Class 'AsIs'  chr [1:10] "0" "3" "6" "0" ...
 $ trt  : Class 'AsIs'  chr [1:10] "1" "1" "1" "1" ...

This is why changing print.default() by itself is not sufficient. Other object classes are formatted and printed in varying ways, and print methods have been defined for them which may not use it directly.
HTH, Marc Schwartz

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
Re: [R] weighted least squares vs linear regression
You'll probably need to consult a suitable text on linear models/applied regression, as this is a statistics question, not an R question; or look for a suitable tutorial on the web. You might also try one of the statistics mailing lists or Google some suitable phrase.

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of DispersionMap
Sent: Thursday, January 28, 2010 9:06 AM
To: r-help@r-project.org
Subject: [R] weighted least squares vs linear regression

I need to find out the difference between the way R calculates weighted regression and standard regression. I want to plot a 95% confidence interval around an estimate I got from least squares regression. I can't find the documentation for this; I've looked in ?stats, ?lm, ?predict.lm, ?weights and ?residuals.lm. Can anyone shed light? Thanks, Chris.
Re: [R] Data.frame manipulation
Hi:

On Thu, Jan 28, 2010 at 8:40 AM, AC Del Re de...@wisc.edu wrote:

Thank you, Dennis and Petr. One more question: when aggregating to one es per id, how would I go about keeping the other variables in the data.frame (e.g., keeping the value for the first row of the other variables, such as mod2)? e.g.:

# Dennis provided this example (notice how mod2 is removed from the output):
with(x, aggregate(list(es = es), by = list(id = id, mod1 = mod1), mean))
  id mod1   es
1  3    1 0.20
2  1    2 0.30
3  2    4 0.15

# How can I get this output (taking the first row of the other variable in the data.frame):
id   es mod1  mod2
 1 0.30    2   wai
 2 0.15    4 other
 3 0.20    1  itas

Using ddply from the plyr package:

ddply(x, .(id, mod1), summarize, es = mean(es), mod2 = head(mod2, 1))
  id mod1   es  mod2
1  1    2 0.30   wai
2  2    4 0.15 other
3  3    1 0.20  itas

mod2 = head(...) selects the first instance of mod2 in each id/mod1 combination. It appears from the help page that aggregate only allows one summary function per call; if so, it wouldn't be able to do this. You could, however, do this in the doBy package with a custom summary function.

HTH,
Dennis

Thank you, AC

On Thu, Jan 28, 2010 at 1:29 AM, Petr PIKAL petr.pi...@precheza.cz wrote:

Hi

r-help-boun...@r-project.org napsal dne 28.01.2010 04:35:29:

Hi All, I'm conducting a meta-analysis and have taken a data.frame with multiple rows per study (for each effect size) and performed a weighted average of effect size for each study. This results in a reduced # of rows. I am particularly interested in simply reducing the additional variables in the data.frame to the first row of the corresponding id variable. For example:

id <- c(1,2,2,3,3,3)
es <- c(.3,.1,.3,.1,.2,.3)
mod1 <- c(2,4,4,1,1,1)
mod2 <- c("wai","other","calpas","wai","itas","other")
data <- as.data.frame(cbind(id,es,mod1,mod2))

Do not use cbind. Its output is a matrix, and in this case a character matrix. The resulting data frame will consist of factors, as you can check by str(data).

data <- data.frame(id = id, es = es, mod1 = mod1, mod2 = mod2)
data
  id  es mod1   mod2
1  1 0.3    2    wai
2  2 0.1    4  other
3  2 0.2    4 calpas
4  3 0.1    1   itas
5  3 0.2    1    wai
6  3 0.3    1    wai

# I would like to reduce the entire data.frame like this:

E.g. aggregate:

aggregate(data[, -(3:4)], data[, 3:4], mean)
  mod1   mod2 id  es
1    4 calpas  2 0.3
2    1   itas  3 0.2
3    1  other  3 0.3
4    4  other  2 0.1
5    1    wai  3 0.1
6    2    wai  1 0.3

or doBy, or tapply, or ddply from the plyr library.

Regards
Petr

id   es mod1  mod2
 1 0.30    2   wai
 2 0.15    4 other
 3 0.20    1  itas

# If possible, I would also like the option of this (collapsing on id and mod2):
id   es mod1   mod2
 1 0.30    2    wai
 2 0.10    4  other
 2 0.20    4 calpas
 3 0.10    1   itas
 3 0.25    1    wai

Any help is much appreciated!

AC Del Re
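For readers without plyr, the same result can be had in base R with aggregate() plus merge(); a sketch using the thread's example data:

```r
# Average 'es' within each id/mod1 group, then merge back the first
# observed 'mod2' for each group (base-R equivalent of the ddply call).
id   <- c(1, 2, 2, 3, 3, 3)
es   <- c(0.3, 0.1, 0.3, 0.1, 0.2, 0.3)
mod1 <- c(2, 4, 4, 1, 1, 1)
mod2 <- c("wai", "other", "calpas", "wai", "itas", "other")
x <- data.frame(id, es, mod1, mod2)

means  <- aggregate(list(es = x$es), by = list(id = x$id, mod1 = x$mod1), mean)
firsts <- x[!duplicated(x[c("id", "mod1")]), c("id", "mod1", "mod2")]
result <- merge(means, firsts, by = c("id", "mod1"))
result   # one row per id/mod1, mean es, first mod2
```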
Re: [R] Problems with fitdistr
vikrant wrote:

Hi, I want to estimate the parameters of a Weibull distribution. For this, I am using the fitdistr() function in the MASS package. But when I give fitdistr(c, "weibull") I get an error as follows:

Error in optim(x = c(4L, 41L, 20L, 6L, 12L, 6L, 7L, 13L, 2L, 8L, 22L, : non-finite value supplied by optim

Any help or suggestions are most welcome.

Use function pelwei() in package lmom.

J. R. M. Hosking
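If fitdistr() itself is wanted rather than pelwei(), supplying explicit starting values and positive lower bounds often gets optim past the "non-finite value" failure. A hedged sketch, using only the first few data values visible in the error message (the full vector was truncated):

```r
library(MASS)  # fitdistr(); MASS ships with R

x <- c(4, 41, 20, 6, 12, 6, 7, 13, 2, 8, 22)  # values shown in the error

# Supplying 'lower' switches optim to method "L-BFGS-B" (per ?fitdistr),
# keeping shape and scale away from the non-finite region near zero.
fit <- fitdistr(x, "weibull",
                start = list(shape = 1, scale = mean(x)),
                lower = c(0.01, 0.01))
fit$estimate
```

This is a workaround sketch, not a guarantee; whether it helps depends on the poster's full data (e.g., zeros in the sample would still break the Weibull likelihood).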
[R] qplot themes
Hi, I'm having trouble editing the qplot layout. I'm using the geom=tile option and I want to do a few things:

1. move the vertical and horizontal gridlines so that they appear on the edge of each tile (right now they're in the middle)
2. bring the gridlines to the foreground and change their color

I've been playing around with the opts(...) options but so far can't get any of them to work correctly. Has anyone done this, or does anyone have an example?

thanks

--
View this message in context: http://n4.nabble.com/qplot-themes-tp1388708p1388708.html
Sent from the R help mailing list archive at Nabble.com.
[R] question about reshape
Hello everyone, I have a bit of a problem with the reshape function in R. I have simulated some normal data, which I have saved in 4 vectors, y.1, y.2, y.3, y.4, which I combined into a dataset: dataset <- cbind(y.1, y.2, y.3, y.4). I have also generated some subject id numbers, denoted by subject. So, my dataset looks like this:

     subject       y.1        y.2       y.3       y.4
[1,]       1 20.302707 16.9643106 30.291031  7.118748
[2,]       2  9.942679  9.3674844  7.578465 16.494813

...etc; I have 20 subjects. I want to transform this data into a long-form dataset, but it does not work. I am using the reshape command, and it should be very straightforward. Here is what I use:

long <- reshape(dataset, idvar = "subject", v.names = "response", varying = list(2:5), direction = "long")

Here is what I get:

Error in d[, timevar] <- times[1L] : subscript out of bounds

Now, do I get that error because the first column shows me the row number? I have been using R for a while, but not a lot for data manipulation. Any help would be great! Thank you in advance.

Dana
Re: [R] Interpolation
Why not look into the zoo package's na.approx()? And related functions.

On Thu, Jan 28, 2010 at 11:29 AM, ogbos okike ogbos.ok...@gmail.com wrote:

Happy New Year. I have data of four columns - year, month, day and count. The last column, count, contains some missing data which I have to replace with NA. I tried to use interpolation to assign some values to these NAs so that the resulting plot would look better. I used x to represent date and y to represent count. With the method below, I tried to interpolate over the NAs, but that resulted in the warning message below. I went ahead and plotted the graph of date against count (plot attached). The diagonal line between May and Jul is not looking good, and I suspect that it is the result of the warning message. It would be appreciated if anybody could give me some help.

Warmest regards
Ogbos

y1 <- approx(x, y, xout = x)$y
Warning message:
In approx(x, y, xout = x) : collapsing to unique 'x' values

-- Stephen Sefick Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis
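Before reaching for a package, the mechanics of the suggested fill are worth seeing. A base-R sketch of what zoo's na.approx() does: linearly interpolate the NA entries of a count series against a numeric date axis (the dates and counts here are made up, not Ogbos's data):

```r
# Linearly interpolate missing counts; the non-NA points anchor the fill.
d     <- as.Date("2009-05-01") + 0:5   # six consecutive days
count <- c(10, NA, NA, 13, NA, 15)     # gaps to fill

filled <- count
isna   <- is.na(count)
filled[isna] <- approx(as.numeric(d)[!isna], count[!isna],
                       xout = as.numeric(d)[isna])$y
filled   # NAs replaced by values on the line between known points
```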
Re: [R] question about reshape
Try this:

long <- reshape(as.data.frame(dataset), idvar = "subject", v.names = "response", varying = list(2:5), direction = "long")

or

dataset <- cbind.data.frame(y.1, y.2, y.3, y.4)

On Thu, Jan 28, 2010 at 3:07 PM, Dana TUDORASCU dana...@gmail.com wrote:

Hello everyone, I have a bit of a problem with the reshape function in R. I have simulated some normal data, which I have saved in 4 vectors, y.1, y.2, y.3, y.4, which I combined into a dataset: dataset <- cbind(y.1, y.2, y.3, y.4). I have also generated some subject id numbers, denoted by subject. So, my dataset looks like this:

     subject       y.1        y.2       y.3       y.4
[1,]       1 20.302707 16.9643106 30.291031  7.118748
[2,]       2  9.942679  9.3674844  7.578465 16.494813

...etc; I have 20 subjects. I want to transform this data into a long-form dataset, but it does not work. I am using the reshape command, and it should be very straightforward. Here is what I use:

long <- reshape(dataset, idvar = "subject", v.names = "response", varying = list(2:5), direction = "long")

Here is what I get:

Error in d[, timevar] <- times[1L] : subscript out of bounds

Now, do I get that error because the first column shows me the row number? I have been using R for a while, but not a lot for data manipulation. Any help would be great! Thank you in advance.

Dana

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
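A self-contained version of the fix may make the point concrete: reshape() wants a data frame with an id column, not the matrix that cbind() returns. (The data here are simulated stand-ins for Dana's.)

```r
set.seed(1)
subject <- 1:20
y.1 <- rnorm(20); y.2 <- rnorm(20); y.3 <- rnorm(20); y.4 <- rnorm(20)

# A data frame (not cbind()'s matrix) is what reshape() expects.
dataset <- data.frame(subject, y.1, y.2, y.3, y.4)
long <- reshape(dataset, idvar = "subject", v.names = "response",
                varying = list(2:5), direction = "long")
dim(long)   # 80 rows: 20 subjects x 4 measurement times
```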
Re: [R] Interpolation
The warning message simply indicates that you have more than one data point with the same x value. So, `approx' collapses over the duplicate x values by averaging the corresponding y values. I am not sure if this is your problem - it doesn't seem like it. It is doing what seems reasonable for a linear interpolation. If you have some idea of how the interpolation should look, you may fit a model to the data and impute based on the model.

Ravi.

---
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619 Fax: (410) 614-9625
Email: rvarad...@jhmi.edu
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.html

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of ogbos okike
Sent: Thursday, January 28, 2010 12:30 PM
To: r-help@r-project.org
Subject: [R] Interpolation

Happy New Year. I have data of four columns - year, month, day and count. The last column, count, contains some missing data which I have to replace with NA. I tried to use interpolation to assign some values to these NAs so that the resulting plot would look better. I used x to represent date and y to represent count. With the method below, I tried to interpolate over the NAs, but that resulted in the warning message below. I went ahead and plotted the graph of date against count (plot attached). The diagonal line between May and Jul is not looking good, and I suspect that it is the result of the warning message. It would be appreciated if anybody could give me some help.
Warmest regards
Ogbos

y1 <- approx(x, y, xout = x)$y
Warning message:
In approx(x, y, xout = x) : collapsing to unique 'x' values
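Ravi's point about duplicated x values can be reproduced in a few lines; the ties argument of approx() is what controls the collapsing:

```r
# Two y values at x = 2: approx() warns "collapsing to unique 'x' values"
# and averages them (ties = mean is the default) before interpolating.
x <- c(1, 2, 2, 3)
y <- c(10, 20, 40, 50)
res <- approx(x, y, xout = c(1, 2, 3))
res$y   # the two y's at x = 2 collapse to their mean
```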
Re: [R] RMySQL - Bulk loading data and creating FK links
I'm talking about ease of use too. The first line of the Details section in ?[.data.table says:

Builds on base R functionality to reduce 2 types of time: 1. programming time (easier to write, read, debug and maintain) 2. compute time

Once again, I am merely saying that the user has choices, and the best choice (and there are many choices, including plyr and lots of other great packages and base methods) depends on the task and the real goal. This choice is not restricted to compute time only, as you seem to suggest. In fact I listed programming time first (i.e. ease of use).

To answer your points: This is the SQL code you posted and I used in the comparison. Notice it's quite long, repeats the text var1,var2,var3 4 times, and contains two 'select's and a 'using'.

system.time(sqldf("select var1, var2, var3, dt from a, (select var1, var2, var3, min(dt) mindt from a group by var1, var2, var3) using(var1, var2, var3) where dt - mindt < 7"))
   user  system elapsed
 103.13    2.17  106.23

Isolating the series of operations you described:

system.time(sqldf("select * from a"))
   user  system elapsed
  39.00    0.63   39.62

So that's roughly 40% of the time. What's happening in the remaining 66 secs?

Here's a repeat of the equivalent in data.table:

system.time({adt <- data.table(a)})
   user  system elapsed
   0.90    0.13    1.03

system.time(adt[, list(dt = dt[dt - min(dt) < 7]), by = "var1,var2,var3"])  # is that so hard to use compared to the SQL above?
   user  system elapsed
   3.92    0.78    4.71

I looked at the news section, but I didn't find the benchmarks quickly or easily. The links I saw took me to the FAQs.

Gabor Grothendieck ggrothendi...@gmail.com wrote in message news:971536df1001280855i1d5f7c03v46f7a3e58ff93...@mail.gmail.com...

I think one would only be concerned about such internals if one were primarily interested in performance; otherwise, one would be more interested in ease of specification, and part of that ease is having it independent of implementation and separating implementation from specification activities. An example of separation of specification and implementation is that by simply specifying a disk-based database rather than an in-memory database, SQL can perform queries that take more space than memory. The query itself need not be modified. I think the viewpoint you are discussing is primarily one of performance, whereas the viewpoint I was discussing is primarily ease of use, and that accounts for the difference. I believe your performance comparison is comparing a sequence of operations that includes building a database, transferring data to it, performing the operation, reading it back in and destroying the database, to an internal manipulation. I would expect the internal manipulation, particularly one done primarily in C code as is the case with data.table, to be faster, although some benchmarks of the database approach found that it compared surprisingly well to straight R code -- some users of sqldf found that for an 8000 row data frame sqldf actually ran faster than aggregate and also faster than tapply. The News section on the sqldf home page provides links to their benchmarks. Thus if R is fast enough, then it's likely that the database approach is fast enough too, since it's even faster.

On Thu, Jan 28, 2010 at 8:52 AM, Matthew Dowle mdo...@mdowle.plus.com wrote:

Are you claiming that SQL is that utopia? SQL is a row store. It cannot give the user the benefits of column store. For example, why does SQL take 113 seconds in the example in this thread: http://tolstoy.newcastle.edu.au/R/e9/help/10/01/1872.html but data.table takes 5 seconds to get the same result? How come the high-level language SQL doesn't appear to hide the user from this detail? If you are just describing utopia, then of course I agree. It would be great to have a language which hid us from this. In the meantime the user has choices, and the best choice depends on the task and the real goal.

Gabor Grothendieck ggrothendi...@gmail.com wrote in message news:971536df1001280428p345f8ff4v5f3a80c13f96d...@mail.gmail.com...

It's only important internally. Externally it's undesirable that the user have to get involved in it. The idea of making software easy to write and use is to hide the implementation and focus on the problem. That is why we use high-level languages, object orientation, etc.

On Thu, Jan 28, 2010 at 4:37 AM, Matthew Dowle mdo...@mdowle.plus.com wrote:

How it represents data internally is very important, depending on the real goal: http://en.wikipedia.org/wiki/Column-oriented_DBMS

Gabor Grothendieck ggrothendi...@gmail.com wrote in message news:971536df1001271710o4ea62333l7f1230b860114...@mail.gmail.com...

How it represents data internally should not be important as long as you can do what you want. SQL is declarative, so you just specify what you want rather than how to get it, and invisibly to the user it automatically draws
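For readers following along without either package, the operation being timed ("keep the rows within 7 of each group's minimum dt") also has a plain base-R expression via ave(); a small sketch with made-up data:

```r
# Group-wise minimum of dt via ave(), then filter -- the same query the
# SQL and data.table versions above express, on a toy 20-row table.
set.seed(1)
a <- data.frame(var1 = sample(2, 20, replace = TRUE),
                var2 = sample(2, 20, replace = TRUE),
                var3 = sample(2, 20, replace = TRUE),
                dt   = sample(30, 20, replace = TRUE))
grp.min <- ave(a$dt, a$var1, a$var2, a$var3, FUN = min)
res <- a[a$dt - grp.min < 7, ]
nrow(res)   # only the near-minimum rows of each group survive
```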
Re: [R] weighted least squares vs linear regression
Sorry, I omitted some important information. This is a documentation question! I meant to ask how to find out how R calculates the standard error and how it differs between the two models.

--
View this message in context: http://n4.nabble.com/weighted-least-squares-vs-linear-regression-tp1387957p1393060.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] grid.image(), pckg grid
Hi

Markus Loecher wrote:

While I am very happy with and awed by the grid package and its basic plotting primitives such as grid.points, grid.lines, etc., I was wondering whether the equivalent of a grid.image() function exists?

No. But a simple implementation based on grid.rect() is not too hard (e.g., see http://www.stat.auckland.ac.nz/~paul/RGraphics/interactgrid-imagefun.R). Also, the next version of R will include a grid.raster() function, which will provide another way to draw a matrix of colour values.

Paul

Any pointer would be helpful. Thanks!
Markus

--
Dr Paul Murrell
Department of Statistics
The University of Auckland
Private Bag 92019
Auckland
New Zealand
64 9 3737599 x85392
p...@stat.auckland.ac.nz
http://www.stat.auckland.ac.nz/~paul/
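Along the lines of Paul's grid.rect() suggestion, a matrix of colours can be drawn in about a dozen lines. A minimal sketch (the name grid.image here is our own stand-in, not a grid API function):

```r
library(grid)  # grid ships with R

# Draw a matrix of colour strings as one rectangle per cell, row 1 at the top.
grid.image <- function(cols) {
  nr <- nrow(cols); nc <- ncol(cols)
  grid.rect(x     = (col(cols) - 0.5) / nc,        # cell centres, npc units
            y     = (nr - row(cols) + 0.5) / nr,
            width = 1 / nc, height = 1 / nr,
            gp    = gpar(fill = cols, col = NA))   # vectorised fill, no borders
}

m <- matrix(grey(seq(0, 1, length.out = 12)), nrow = 3)
grid.newpage()
grid.image(m)   # a 3 x 4 greyscale ramp
```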
Re: [R] RMySQL - Bulk loading data and creating FK links
Regarding the explanation of where the time goes, it might be parsing the statement or the development of the query plan. The SQL statement for the more complex query is obviously much longer, and its generated query plan involves 95 lines of byte code vs 19 lines of generated code for the simpler query.
Re: [R] Constrained vector permutation
Hi Jason,

Thanks for your suggestions, I think that's pretty close to what I'd need. The only glitch is that I'd be working with a vector of ~30 elements, so permutations(...) would take quite a long time. I only need one permutation per vector (the whole routine will be within a loop that generates pseudo-random vectors that could potentially conform to the constraints). In light of that, do you think I'd be better off doing something like:

v.permutations <- replicate(1, sample(v, length(v), rep = FALSE))  # instead of permutations()
results <- apply(v.permutations, 2, function(x) { all(x <= f(x[1], length(x) - 1)) })  # function f(...) would be like your f

It wouldn't be guaranteed to produce any usable permutation, but it seems like it would be much faster and so could be repeated until an acceptable vector is found. What do you think?

Thanks-- Andy

On Thu, Jan 28, 2010 at 6:15 AM, Jason Smith devja...@gmail.com wrote:

I just realized I read through your email too quickly and my script does not actually address the constraint on each permutation, sorry about that. You should be able to use the permutations function to generate the vector permutations, however.

Jason
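Andy's rejection-sampling idea can be sketched end to end. Here f() is a made-up stand-in for the real constraint bound (the original f isn't shown in the thread), and the comparison mirrors the check in his apply() call:

```r
set.seed(1)
v <- runif(30)                             # ~30-element vector, as described
f <- function(first, k) first + k * 0.1   # hypothetical constraint bound

# Shuffle until a permutation satisfies the constraint, or give up.
find.permutation <- function(v, max.tries = 1000) {
  for (i in seq_len(max.tries)) {
    p <- sample(v)
    if (all(p <= f(p[1], length(p) - 1))) return(p)
  }
  NULL   # no conforming permutation found within max.tries
}
p <- find.permutation(v)
```

With a loose bound like this one, the first shuffle already conforms; the tighter the real constraint, the more tries (or a smarter construction) would be needed.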
Re: [R] using functions with multiple arguments in the apply family
chipmaney wrote:

Typically, the apply family wants you to use vectors to run functions on. However, I have a function, kruskal.test, that requires 2 arguments:

kruskal.test(Herb.df$Score, Herb.df$Year)

This easily computes the KW ANOVA statistic for any difference across years. However, my data has multiple sites on which KW needs to be run. Here's the data:

Herb.df <- data.frame(Score = rep(c(2,4,6,6,6,5,7,8,6,9), 2), Year = rep(c(rep(1,5), rep(2,5)), 2), Site = c(rep(3,10), rep(4,10)))

However, if I try this:

tapply(Herb.df, Herb.df$Site, function(.data) kruskal.test(.data$Indicator_Rating, .data$Year))

Error in tapply(Herb.df, Herb.df$ID, function(.data) kruskal.test(.data$Indicator_Rating, : arguments must have same length

How can I vectorize the kruskal.test() for all sites using tapply() in lieu of a loop?

Your example data make little sense; you have precisely the same data for both sites, and you have only two sites (why do kruskal.test on two sites?). Finally, you need to decide what your response variable is: 'Score' or 'Indicator_Rating'. So here's some made-up data and the use of by() to apply the test to each site:

dat <- data.frame(y = rnorm(60), yr = gl(4, 5, 60), st = gl(3, 20))
with(dat, by(dat, st, function(x) kruskal.test(y ~ yr, data = x)))

See the last example in ?by.

-Peter Ehlers

--
Peter Ehlers
University of Calgary
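Peter's by() call returns the full test objects; when only the p-values per site are wanted, split() plus sapply() gives a compact summary. A sketch on his made-up data:

```r
# One Kruskal-Wallis test per site; collect just the p-values.
set.seed(42)
dat <- data.frame(y = rnorm(60), yr = gl(4, 5, 60), st = gl(3, 20))
pvals <- sapply(split(dat, dat$st),
                function(d) kruskal.test(y ~ yr, data = d)$p.value)
pvals   # named vector, one p-value per site
```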
Re: [R] selecting significant predictors from ANOVA result
Hi Ram,

As others have pointed out, writing the code is the least of your problems. In case this isn't sinking in, try the following exercise:

set.seed(10)
P <- vector()
DF <- as.data.frame(matrix(rep(NA, 10^5), nrow = 100))
names(DF) <- c(paste("x", 1:999, sep = ""), "y")
for (i in 1:1000) {
  DF[, i] <- rnorm(100)
}
for (i in 1:999) {
  P[i] <- summary(lm(DF$y ~ DF[, i]))$coefficients[2, 4]
}
which(P < .05)

Notice that the variables in the data set DF are random numbers. The fact that 53 of them are 'significantly' correlated with y at p < .05 doesn't change that. So in this example, those 53 significant predictors are meaningless. And your actual problem is even worse than this example, because you're running way more than 999 models. As has already been suggested, it's time to consult a statistician.

-Ista

On Thu, Jan 28, 2010 at 3:39 AM, ram basnet basnet...@yahoo.com wrote:

Dear Sir, Thanks for your message. My problem is in writing code. I did ANOVA for 75000 response variables (let's say Y) with 243 predictors (let's say X-matrix) one by one with a for loop in R. I stored the p-values of all predictors; however, I have a very huge file because I have p-values of 243 predictors for all 75000 Y-variables. Now, I want to find some code that automatically selects only significant X-predictors from the whole list. If you have ideas on that, it will be a great help. Thanks in advance. Sincerely, Ram

--- On Wed, 1/27/10, Bert Gunter gunter.ber...@gene.com wrote:
From: Bert Gunter gunter.ber...@gene.com
Subject: RE: [R] selecting significant predictors from ANOVA result
To: 'ram basnet' basnet...@yahoo.com, 'R help' r-help@r-project.org
Date: Wednesday, January 27, 2010, 7:56 AM

Ram: You do not say how many cases (rows in your dataset) you have, but I suspect it may be small (a few hundred, say).
In any case, what you describe is probably just a complicated way to generate random numbers -- it is **highly** unlikely that any meaningful, replicable scientific results would result from your proposed approach. Not surprising -- this appears to be a very difficult data analysis issue. It is obvious that you have only a minimal statistical background, so I would strongly recommend that you find a competent local statistician to help you with your work. Remote help from this list is wholly inadequate.

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of ram basnet
Sent: Wednesday, January 27, 2010 2:52 AM
To: R help
Subject: [R] selecting significant predictors from ANOVA result

Dear all, I did ANOVA for many response variables (Var1, Var2, ... Var75000), and I got p-value results like below. Now, I want to select those predictors which have a p-value less than or equal to 0.05 for each response variable. For example, X1, X2, X3, X4, X5 and X6 in the case of Var1; similarly, X1, X2, ..., X5 in the case of Var2; only X1 in the case of Var3; and none of the predictors in the case of Var4.

predictors      Var1    Var2   Var3  Var4
X1          0.5       0.001   0.05  0.36
X2          0.0001    0.001   0.09  0.37
X3          0.0002    0.005   0.13  0.38
X4          0.0003    0.01    0.17  0.39
X5          0.01      0.05    0.21  0.4
X6          0.05      0.0455  0.25  0.41
X7          0.038063  0.0562  0.29  0.42
X8          0.04605   0.0669  0.33  0.43
X9          0.054038  0.0776  0.37  0.44
X10         0.062025  0.0883  0.41  0.45

I have very large data sets (# of response variables = ~75,000). So, I need some kind of automated procedure. But I have no ideas. If I get help from somebody, it will be great for me. Thanks in advance.

Sincerely,
Ram Kumar Basnet, Ph.D. student
Wageningen University, The Netherlands.
--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
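One concrete way to act on Ista's warning: if predictors must be screened at all, adjust the p-values for multiple testing first. A sketch with 999 pure-noise p-values standing in for one response's ANOVA output ("BH" controls the false discovery rate):

```r
set.seed(10)
P <- runif(999)                                  # stand-in raw p-values
raw.hits <- which(P < 0.05)                      # naive selection
adj.hits <- which(p.adjust(P, method = "BH") < 0.05)
length(raw.hits)   # many spurious "significant" predictors
length(adj.hits)   # few, often none, survive the adjustment
```

Adjustment does not rescue the approach Bert criticizes, but it at least quantifies how little survives once the 999 (or 243 x 75000) comparisons are accounted for.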
[R] navigation panel with help
All,

I installed the latest version of R, 2.10.1. On the help page for a specific function, it turns out that the vertical navigation panel on the left does not appear anymore. For example: ?lm. The help page from this command is a page without a navigation panel (which I prefer to use). I notice there is an index link at the bottom of the page. By the way, I did not make any change to my browser. Is this a change in this version? Thank you for your help.

Edwin Sun

--
View this message in context: http://n4.nabble.com/navigation-panel-with-help-tp1395663p1395663.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] question about reshape
Thank you very much everybody. That worked.

Dana

On Thu, Jan 28, 2010 at 12:23 PM, Henrique Dallazuanna www...@gmail.com wrote:

Try this:

long <- reshape(as.data.frame(dataset), idvar = "subject", v.names = "response", varying = list(2:5), direction = "long")

or

dataset <- cbind.data.frame(y.1, y.2, y.3, y.4)

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
Re: [R] navigation panel with help
On 28/01/2010 3:15 PM, Edwin Sun wrote: All, I installed the latest version of R, 2.10.1. On the help page for a specific function, the vertical navigation panel on the left no longer appears. For example: ?lm. The help page from this command has no navigation panel (which I prefer to use). I notice there is an index link at the bottom of the page. By the way, I did not make any change to my browser. Is this a change in this version? Thank you for your help.

Yes, we have dropped support for CHM help, which had the navigation pane. The default display is now HTML help. Duncan Murdoch
Re: [R] navigation panel with help
Duncan, Thank you for your quick reply. Do we users have any option to change that? I have personally become addicted to the navigation panel and feel it serves as a kind of table of contents. Regards, Edwin Sun

-Original Message- From: Duncan Murdoch [mailto:murd...@stats.uwo.ca] Sent: Thursday, January 28, 2010 2:19 PM To: Changyou Sun Cc: r-help@r-project.org Subject: Re: [R] navigation panel with help

On 28/01/2010 3:15 PM, Edwin Sun wrote: All, I installed the latest version of R, 2.10.1. On the help page for a specific function, the vertical navigation panel on the left no longer appears. For example: ?lm. The help page from this command has no navigation panel (which I prefer to use). I notice there is an index link at the bottom of the page. By the way, I did not make any change to my browser. Is this a change in this version? Thank you for your help.

Yes, we have dropped support for CHM help, which had the navigation pane. The default display is now HTML help. Duncan Murdoch
Re: [R] navigation panel with help
On 28/01/2010 3:22 PM, Changyou Sun wrote: Duncan, Thank you for your quick reply. Do we users have any option to change that? I have personally become addicted to the navigation panel and feel it serves as a kind of table of contents.

You could downgrade to 2.9.2, but you'd lose all the other new stuff. You could contribute code to display the table of contents in the HTML version. I suppose you could resurrect the old code and build your own CHM help in 2.10.1, but I would guess adding the table of contents to the HTML help would be easier. Duncan Murdoch
Re: [R] Constrained vector permutation
It wouldn't be guaranteed to produce any usable permutation, but it seems like it would be much faster, and so could be repeated until an acceptable vector is found. What do you think? Thanks -- Andy

I think I am not understanding what your ultimate goal is, so I'm not sure I can give you appropriate advice. Are you looking for a single valid permutation, or all of them? Since that constraint sets a ceiling on each subsequent value, it seems like you could solve this problem more easily and quickly by using a search strategy instead of random sampling, or generating all permutations and then testing. The constraint will help prune the search space so that you only generate valid permutations. Once you are examining a particular element, you can determine which of the remaining elements would be valid, so only consider those. --jason
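Jason's pruned-search idea can be sketched generically. The valid() predicate below is a hypothetical stand-in for the poster's actual ceiling constraint (here, illustratively, each value may be at most double the previous one); swap in the real rule:

```r
# Placeholder constraint: each element at most double its predecessor.
valid <- function(prefix, candidate) {
  length(prefix) == 0 || candidate <= 2 * prefix[length(prefix)]
}

# Depth-first search over permutations of x, pruning any branch whose
# next element violates the constraint.  With all = FALSE it stops at
# the first valid permutation; with all = TRUE it enumerates them all.
perm_search <- function(x, prefix = numeric(0), all = FALSE) {
  if (length(x) == 0) return(list(prefix))
  out <- list()
  for (i in seq_along(x)) {
    if (valid(prefix, x[i])) {
      out <- c(out, perm_search(x[-i], c(prefix, x[i]), all))
      if (!all && length(out) > 0) return(out)  # stop at first hit
    }
  }
  out
}

perm_search(c(5, 3, 8, 2))                       # first valid permutation
length(perm_search(c(5, 3, 8, 2), all = TRUE))   # how many exist in total
```

Because invalid prefixes are abandoned immediately, the search never materializes the full n! permutations the way a generate-then-test approach would.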
Re: [R] color palette for points, lines, text / interactive Rcolorpicker?
I don't know of any existing palettes that meet your conditions, but here are a couple of options for interactive exploration of colorsets (this is quick and dirty; there are probably some better orderings, base colors, etc.):

colpicker <- function(cols = colors()) {
  n <- length(cols)
  nr <- ceiling(sqrt(n))
  nc <- ceiling(n / nr)
  imat <- matrix(c(seq_along(cols), rep(NA, nr * nc - n)), ncol = nc, nrow = nr)
  image(seq.int(nr), seq.int(nc), imat, col = cols, xlab = '', ylab = '')
  xy <- locator()
  cols[imat[cbind(round(xy$x), round(xy$y))]]
}

colpicker()

## another approach
library(TeachingDemos)
cols <- colors()
n <- length(cols)
par(xpd = TRUE)
# next line only works on Windows
HWidentify((1:n) %% 26, (1:n) %/% 26, label = cols, col = cols, pch = 15, cex = 2)
# next line works on all platforms with tcltk
HTKidentify((1:n) %% 26, (1:n) %/% 26, label = cols, col = cols, pch = 15, cex = 2)

# reorder
cols.rgb <- col2rgb(cols)
d <- dist(t(cols.rgb))
clst <- hclust(d)
colpicker(cols[clst$order])
HWidentify((1:n) %% 26, (1:n) %/% 26, label = cols[clst$order], col = cols[clst$order], pch = 15, cex = 2)  ## or HTKidentify

cols.hsv <- rgb2hsv(cols.rgb)
d2 <- dist(t(cols.hsv))
clst2 <- hclust(d2)
HWidentify((1:n) %% 26, (1:n) %/% 26, label = cols[clst2$order], col = cols[clst2$order], pch = 15, cex = 2)  ## or HTKidentify

Hope this helps, -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Michael Friendly Sent: Thursday, January 28, 2010 8:38 AM To: R-Help Subject: [R] color palette for points, lines, text / interactive Rcolorpicker?
I'm looking for a scheme to generate a default color palette for plotting points, lines and text (on a white or transparent background) with from 2 to, say, 9 colors, with the following constraints: red is reserved for another purpose; colors should be highly distinct; avoid light colors (like yellows). In RColorBrewer, most of the schemes are designed for area fill rather than points and lines. The closest I can find for these needs is the Dark2 palette, e.g.,

library(RColorBrewer)
display.brewer.pal(7, "Dark2")

I'm wondering if there is something else I can use. On a related note, I wonder if there is something like an interactive color picker for R. For example, http://research.stowers-institute.org/efg/R/Color/Chart/ displays several charts of all R colors. I'd like to find something that displays such a chart and uses identify() to select a set of tiles, whose colors() indices are returned by the function. -Michael

-- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. York University Voice: 416 736-5115 x66249 Fax: 416 736-5814 4700 Keele Street http://www.math.yorku.ca/SCS/friendly.html Toronto, ONT M3J 1P3 CANADA
[R] Data frame of different sized lists in a function call
I'm hoping to get some best-practice feedback for constructing a function call which takes an undefined set of DIFFERENT-length vectors -- e.g. say we have two vectors:

list1 = c(1:10)
list2 = c(2:4)
lists = data.frame(list1, list2)

data.frame() coerces those two to be the same length (recycling list2 to fill in the missing rows) -- what is a quick way of having each of those vectors retain its original length? My function ultimately should look like: myfunction = function(lists) { ... } I'm hoping this can be done with a single line, so the user doesn't have to pre-construct the data.frame before running the function, if at all possible. Thanks! --j

-- Jonathan A. Greenberg, PhD Postdoctoral Scholar Center for Spatial Technologies and Remote Sensing (CSTARS) University of California, Davis One Shields Avenue The Barn, Room 250N Davis, CA 95616 Phone: 415-763-5476 AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307
Re: [R] weighted least squares vs linear regression
On Jan 28, 2010, at 2:14 PM, DispersionMap wrote: Sorry, I omitted some important information. This is a documentation question! I meant to ask how to find out how R calculates the standard error and how it differs between the two models.

Luke, Use the Code! -- David Winsemius, MD Heritage Laboratories West Hartford, CT
Re: [R] Data frame of different sized lists in a function call
On Jan 28, 2010, at 4:03 PM, Jonathan Greenberg wrote: list1 = c(1:10) # neither of which really are lists list2 = c(2:4)

lists = list(list1, list2) # a list of two vectors

David Winsemius, MD Heritage Laboratories West Hartford, CT
Re: [R] Data frame of different sized lists in a function call
If you understand the differences between R lists and R vectors then this should be easy:

vec1 <- 1:10
vec2 <- 2:4
myListOfVectors <- list(vec1, vec2)

Now you can pass the single list of 2 different-sized vectors to your function. For more details on working with lists (and vectors and functions and ...) read "An Introduction to R", which is worth a lot more than you pay for it. Hope this helps, -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jonathan Greenberg Sent: Thursday, January 28, 2010 2:03 PM To: r-help Subject: [R] Data frame of different sized lists in a function call

I'm hoping to get some best-practice feedback for constructing a function call which takes an undefined set of DIFFERENT-length vectors -- e.g. say we have two lists: list1 = c(1:10) list2 = c(2:4) lists = data.frame(list1, list2) coerces those two to be the same length (recycling list2 to fill in the missing rows) -- what is a quick way of having each of those lists retain their original lengths? My function ultimately should look like: myfunction = function(lists) { ... } I'm hoping this can be done with a single line, so the user doesn't have to pre-construct the data.frame before running the function, if at all possible. Thanks! --j
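A minimal sketch of the pattern Greg describes: the function receives one list and iterates over it, so each vector keeps its own length. The function body here is illustrative, not from the thread:

```r
vec1 <- 1:10
vec2 <- 2:4

# A list, unlike data.frame(), does not recycle its elements
# to a common length.
myfunction <- function(lists) {
  sapply(lists, length)   # e.g. operate on each vector in turn
}

# The list can be built inline in the call, so the user never
# has to pre-construct anything.
myfunction(list(vec1, vec2))
```

The same lapply()/sapply() idiom works for any per-vector operation (means, sums, model fits), since each list element is handed to the worker function intact.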
[R] Error on using lag function
Hello everyone, I have a vector P and I want to replace each of its missing values by its next element, for example: P[i] = NA --> P[i] = P[i+1]. To do this I am using the replace() and lag() functions like this:

P <- replace(as.ts(P), is.na(as.ts(P)), as.ts(lag(P, 1)))

but here is the error that I get: Warning message: In NextMethod("[<-") : number of items to replace is not a multiple of replacement length. I have tried to reduce the length of the first two arguments of replace() by one, but that wouldn't work either. Any idea? - Anna Lippel
[R] hist - unevenly spaced bars
I am sure this is trivial, but I cannot solve it. I make a histogram. There are 5 categories 1,...,5 and 80 values, and the histogram does not evenly space the bars: bars 1 and 2 have no space between them and the rest are evenly spaced. How can I get all bars evenly spaced? The code:

Q5
[1] 4 4 4 5 2 4 5 3 4 5 3 4 3 5 2 4 5 5 4 [20] 3 1 4 5 5 4 3 1 5 4 3 5 3 3 5 5 5 5 4 [39] 4 5 1 1 5 4 4 4 1 4 4 5 5 2 4 5 4 3 4 [58] 5 1 2 1 5 4 5 5 1 4 1 4 5 1 4 5 5 4 5 [77] 5 4 4 3

hist(as.numeric(Q5), density=30, main=strwrap(S5, width=60), axes=FALSE)
axis(side=1, labels=c("Disagree", "2", "Not Sure", "4", "Strongly Agree"), at=c(1, 2, 3, 4, 5))
axis(side=2)

cheers Worik
Re: [R] optimization challenge
Well, Albyn Jones gave a great solution to my challenge that found the best reading schedule. My original thought was that doing an exhaustive search would take too much time, but Albyn showed that there are ways to do it efficiently. My approach (as mentioned before) was to use optim with method "SANN". I treated it like a balls-and-urns problem: since I wanted to read 239 chapters in 128 days, it would be like putting 239 balls into 128 urns. I started with 1 ball in each urn, then put the remaining balls into urns to get a starting state (I tried a couple of different starting situations: 1 additional ball in each of the first urns, or all the remaining balls in the first or last urn). Then my update step was just to take a single ball from one of the urns with 2 or more balls and move it to another urn at random. One tricky thing with this method is that moving one ball could change things quite a bit, because one setup could have the longest chapters being read by themselves, but moving one ball would result in a long chapter now being grouped with others. My first update function just moved the ball to a random urn; then I tried moving the ball only one urn forward or backward (this seemed to work better, but probably needed a longer run time). Finally, the best method that I found chose the ball to move with probability proportional to the length of each day's reading, and chose the urn to put it in with highest probability for days with the shortest readings. I thought my answers were pretty good (they looked reasonable), but Albyn's solution had half the variance of my best result. Below is the code that I used for my best results, in case anyone is interested. I would also be interested if anyone could find a way to improve on what I did to get better results (help me learn SANN better; the arguments I used came mostly from trial and error).
days <- seq(as.Date('1/24/10', '%m/%d/%y'), as.Date('5/31/10', '%m/%d/%y'), by=1)

sq2.2 <- rep(1, length(days))
sq2.2[length(days)] <- nrow(bom3) - sum(sq2.2) + 1

genseq4 <- function(sq) {
    w <- rep(1:length(days), sq)
    tmp <- tapply(bom3$Verses, w, sum)
    ww <- which(sq > 1)
    dwn <- if (length(ww) > 1) {
        sample(ww, 1, prob = tmp[ww])
    } else {
        ww
    }
    up <- sample(seq_along(sq)[-dwn], 1, prob = max(tmp) - tmp[-dwn])
    sq[dwn] <- sq[dwn] - 1
    sq[up] <- sq[up] + 1
    sq
}

distance <- function(sq) {
    w <- rep(1:length(days), sq)
    tmp <- tapply(bom3$Verses, w, sum)
    var(tmp)
}

res <- optim(sq2.2, distance, genseq4, method="SANN",
             control=list(maxit=3, temp=50, trace=TRUE))

-- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111
Re: [R] Error on using lag function
Does this help:

library(zoo)
na.locf(P, fromLast=TRUE)

You'll have to decide what to do if the last value is NA. -Peter Ehlers

anna wrote: Hello everyone, I have a vector P and I want to replace each of its missing values by its next element, for example: P[i] = NA --> P[i] = P[i+1]. To do this I am using the replace() and lag() functions like this: P <- replace(as.ts(P), is.na(as.ts(P)), as.ts(lag(P, 1))) but here is the error that I get: Warning message: In NextMethod("[<-") : number of items to replace is not a multiple of replacement length. I have tried to reduce the length of the first two arguments of replace() by one, but that wouldn't work either. Any idea? - Anna Lippel

-- Peter Ehlers University of Calgary
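For illustration, a tiny example of what na.locf(fromLast = TRUE) does; note that it already fills each NA with the *next* non-missing value, which is what the original question asked for. The sample vector is made up; the zoo package is assumed to be installed:

```r
library(zoo)  # assumes zoo is installed

P <- c(1, NA, 3, NA, NA, 6, NA)

# Each NA is replaced by the next non-NA value; the trailing NA has
# no successor, so na.rm = FALSE keeps it in place rather than
# dropping it.
na.locf(P, fromLast = TRUE, na.rm = FALSE)
```

Without fromLast, na.locf() carries the *last* observation forward instead, which is the more common imputation direction.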
Re: [R] hist - unevenly spaced bars
Well, your bars are not unevenly spaced; you just have some zero-count intervals. Time to learn about the str() function, which will tell you what's going on:

zh <- hist(your_code)
str(zh)
zh$breaks
zh$counts

You could set the breaks with hist(..., breaks=0:5 + .5). But a histogram doesn't seem like the right thing to do; try barplot:

barplot(table(Q5))

-Peter Ehlers

Worik R wrote: I am sure this is trivial, but I cannot solve it. I make a histogram. There are 5 categories 1,...,5 and 80 values, and the histogram does not evenly space the bars: bars 1 and 2 have no space between them and the rest are evenly spaced. How can I get all bars evenly spaced?

-- Peter Ehlers University of Calgary
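Peter's two suggestions side by side, on a small made-up vector of Likert responses (the data and labels are illustrative, not Worik's):

```r
Q5 <- c(4, 4, 5, 2, 1, 3, 5, 4, 2, 5, 1, 3, 4, 5, 3)

# Forcing one bin per category gives evenly spaced (touching) bars:
hist(Q5, breaks = 0:5 + 0.5)

# But for categorical counts a barplot is the natural display,
# with a gap between bars by default:
barplot(table(Q5),
        names.arg = c("Disagree", "2", "Not Sure", "4", "Strongly Agree"))
```

The original uneven spacing comes from hist()'s default breaks merging sparse categories into one interval; fixing the breaks or switching to barplot() both avoid that.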
Re: [R] Error on using lag function
Hi Peter, thank you for helping. The thing is, I don't want to replace each missing value with the last value but with the next value. - Anna Lippel