[R] Arrange Data
Hi, I have the following data set and want to arrange it as shown below.

structure(list(C1 = structure(c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4),
    .Label = c("B", "C", "D", "E"), class = "factor"),
    C2 = c(34, 4, 54, 3, 23, 33, 2, 12, 33, 12, 10, 4)),
    .Names = c("C1", "C2"), class = "data.frame",
    row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"))

OUTPUT:

   B  C  D  E
  34  3  2 12
   4 23 12 10
  54 33 33  4

Please let me know how I can accomplish this in R.

TIA
Sachin

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
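For reference, base R's unstack() reshapes this long format into one column per factor level; a minimal sketch using the data above:

```R
# Reshape: one column of C2 values per level of C1, rows in order of appearance.
df <- data.frame(C1 = factor(rep(c("B", "C", "D", "E"), each = 3)),
                 C2 = c(34, 4, 54, 3, 23, 33, 2, 12, 33, 12, 10, 4))
unstack(df, C2 ~ C1)
#    B  C  D  E
# 1 34  3  2 12
# 2  4 23 12 10
# 3 54 33 33  4
```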
Re: [R] prediction interval for new value
Berton,

Thanks for your input. The 'nist' link you mentioned was one of the reasons for my confusion about how this is implemented in R. For now I am assuming the predict function with the 'prediction' option will provide me a tolerance/prediction interval. Is this a proper assumption?

TIA for your help.
Sachin

Berton Gunter <[EMAIL PROTECTED]> wrote:

Peter et al.:

> With those definitions (which are hardly universal), tolerance
> intervals are the same as prediction intervals with k == m == 1, which
> is what R provides.

I don't believe this is the case. See also:
http://www.itl.nist.gov/div898/handbook/prc/section2/prc263.htm

This **is** fairly standard, I believe. For example, see the venerable classic text (INTRO TO MATH STAT) by Hogg and Craig. To be clear, since I may also be misinterpreting, what I understand/mean is: Peter's definition of a "tolerance/prediction interval" is a random interval that with a prespecified confidence contains a future predicted value. The definition I understand is a random interval that with a prespecified confidence will contain a prespecified proportion of the distribution of future values; e.g., a "95%/90%" tolerance interval will with 95% confidence contain 90% of future values (and one may well ask, "which 90%?"). Whether this is a useful idea is another issue: the parametric version is extremely sensitive (as one might imagine) to the assumption of exact normality; the nonparametric version relies on order statistics and is more robust. I believe it is nontrivial and perhaps ambiguous to extend the concept from the usual fixed distribution to the linear regression case. I seem to recall some papers on this, perhaps in JASA, in the past few years. As always, I welcome correction of any errors or misunderstandings herein.
Cheers to all,
Bert Gunter
Re: [R] prediction interval for new value
A Google search gave me this:
http://ewr.cee.vt.edu/environmental/teach/smprimer/intervals/interval.html

TIA
Sachin

Peter Dalgaard <[EMAIL PROTECTED]> wrote:

Sachin J writes:

> RUsers:
>
> Just confirming, does the predict function with the interval="prediction"
> option give a prediction interval or a tolerance interval? Sorry for
> reposting this question.

Is there any definition of tolerance interval that is different from prediction interval? (Tolerance intervals in the medical sense mean intervals that are designed to detect patients with abnormal levels of serum cholesterol, say.)

--
Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen
Ă˜ster Farimagsgade 5, Entr.B, PO Box 2099, 1014 Cph. K, Denmark
Ph: (+45) 35327918  FAX: (+45) 35327907
Re: [R] prediction interval for new value
RUsers:

Just confirming, does the predict function with the interval="prediction" option give a prediction interval or a tolerance interval? Sorry for reposting this question.

Thanks in advance
Sachin

David Barron <[EMAIL PROTECTED]> wrote:

Sorry, I think I may have misled you; the documentation describes these rather ambiguously as "prediction (tolerance) intervals", but having done some comparisons with other software I believe they are what most of us call prediction intervals after all!

On 15/09/06, Sachin J <[EMAIL PROTECTED]> wrote:

If that's true, then how do I find the prediction interval? Thanx in advance.
Sachin

David Barron <[EMAIL PROTECTED]> wrote:

I believe it is a tolerance interval.

On 15/09/06, Sachin J <[EMAIL PROTECTED]> wrote:

David,

Thanks for the quick reply. Just confirming, does predict(s.lm, data.frame(x=3), interval="prediction") give a prediction interval or a tolerance interval?

Thanks
Sachin

David Barron <[EMAIL PROTECTED]> wrote:

> predict(s.lm, data.frame(x=3), interval="prediction")
          fit      lwr      upr
[1,] 16073985 -9981352 42129323
> predict(s.lm, data.frame(x=3), interval="confidence")
          fit     lwr      upr
[1,] 16073985 5978125 26169846

On 15/09/06, Sachin J <[EMAIL PROTECTED]> wrote:

Hi,

1. How do I construct a 95% prediction interval for new x values, for example x = 3?
2. How do I construct a 95% confidence interval?

My data frame is as follows:

> dt
structure(list(y = c(2610, 6050, 1620, 3070, 7010, 5770, 4670,
860, 1000, 6180, 3020, 5220, 7190, 5500, 1270), x = c(108000,
136000, 35000, 77000, 178000, 15, 126000, 24000, 28000, 214000,
108000, 19, 308000, 252000, 71000)), .Names = c("y", "x"),
class = "data.frame", row.names = c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10", "11", "12", "13", "14", "15"))

My regression is fitted as below:

> s.lm <- lm(y ~ x, data = dt)

Thanks in advance.
--
David Barron
Said Business School, University of Oxford
Park End Street, Oxford OX1 1HP
Re: [R] prediction interval for new value
David,

Thanks for the quick reply. Just confirming, does predict(s.lm, data.frame(x=3), interval="prediction") give a prediction interval or a tolerance interval?

Thanks
Sachin

David Barron <[EMAIL PROTECTED]> wrote:

> predict(s.lm, data.frame(x=3), interval="prediction")
          fit      lwr      upr
[1,] 16073985 -9981352 42129323
> predict(s.lm, data.frame(x=3), interval="confidence")
          fit     lwr      upr
[1,] 16073985 5978125 26169846

On 15/09/06, Sachin J <[EMAIL PROTECTED]> wrote:

Hi,

1. How do I construct a 95% prediction interval for new x values, for example x = 3?
2. How do I construct a 95% confidence interval?

My data frame is as follows:

> dt
structure(list(y = c(2610, 6050, 1620, 3070, 7010, 5770, 4670,
860, 1000, 6180, 3020, 5220, 7190, 5500, 1270), x = c(108000,
136000, 35000, 77000, 178000, 15, 126000, 24000, 28000, 214000,
108000, 19, 308000, 252000, 71000)), .Names = c("y", "x"),
class = "data.frame", row.names = c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10", "11", "12", "13", "14", "15"))

My regression is fitted as below:

> s.lm <- lm(y ~ x, data = dt)

Thanks in advance.

--
David Barron
Said Business School, University of Oxford
Park End Street, Oxford OX1 1HP
[R] prediction interval for new value
Hi,

1. How do I construct a 95% prediction interval for new x values, for example x = 3?
2. How do I construct a 95% confidence interval?

My data frame is as follows:

> dt
structure(list(y = c(2610, 6050, 1620, 3070, 7010, 5770, 4670,
860, 1000, 6180, 3020, 5220, 7190, 5500, 1270), x = c(108000,
136000, 35000, 77000, 178000, 15, 126000, 24000, 28000, 214000,
108000, 19, 308000, 252000, 71000)), .Names = c("y", "x"),
class = "data.frame", row.names = c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10", "11", "12", "13", "14", "15"))

My regression is fitted as below:

> s.lm <- lm(y ~ x, data = dt)

Thanks in advance.
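Both intervals can be obtained directly from predict.lm(); a minimal sketch, assuming the dt data frame from the post:

```R
# Fit the regression and request both interval types at the new point x = 3.
dt <- data.frame(y = c(2610, 6050, 1620, 3070, 7010, 5770, 4670, 860, 1000,
                       6180, 3020, 5220, 7190, 5500, 1270),
                 x = c(108000, 136000, 35000, 77000, 178000, 15, 126000,
                       24000, 28000, 214000, 108000, 19, 308000, 252000, 71000))
s.lm <- lm(y ~ x, data = dt)

# 1. 95% prediction interval for a new observation at x = 3
predict(s.lm, newdata = data.frame(x = 3), interval = "prediction", level = 0.95)

# 2. 95% confidence interval for the mean response at x = 3
predict(s.lm, newdata = data.frame(x = 3), interval = "confidence", level = 0.95)
```

The prediction interval is always wider, since it accounts for the residual variance of a single new observation as well as the uncertainty in the fitted mean.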
Re: [R] Quickie : unload library
try

detach("package:zoo")

Sachin

Horace Tso <[EMAIL PROTECTED]> wrote:

Sachin, I did try that, e.g.

> detach(zoo)
Error in detach(zoo) : invalid name
> detach("zoo")
Error in detach("zoo") : invalid name

But zoo has been loaded:

> sessionInfo()
Version 2.3.1 (2006-06-01)
i386-pc-mingw32

attached base packages:
[1] "methods"   "datasets"  "stats"     "tcltk"     "utils"     "graphics"
[7] "grDevices" "base"

other attached packages:
   tseries  quadprog       zoo      MASS      Rpad
  "0.10-1"   "1.4-8"   "1.2-0" "7.2-27.1"  "1.1.1"

Thks, H.

>>> Sachin J 8/25/2006 12:56 PM >>>
see ?detach

Horace Tso wrote:

Dear list,

I know it must be obvious and I did my homework. (In fact I RSiteSearched with keyword "remove AND library" but got timed out. Why?) How do I unload a library? I don't mean getting rid of it permanently, but just unloading it for the time being.

A related problem: I have some libraries loaded at startup in .First(), which I have in .Rprofile. Now, I exited R and commented out the lines in .First(). The next time I launch R, the same libraries are loaded again. I.e., there seems to be a memory of the old .First() somewhere which refuses to die.

Thanks in adv.
Horace
Re: [R] Quickie : unload library
see ?detach

Horace Tso <[EMAIL PROTECTED]> wrote:

Dear list,

I know it must be obvious and I did my homework. (In fact I RSiteSearched with keyword "remove AND library" but got timed out. Why?) How do I unload a library? I don't mean getting rid of it permanently, but just unloading it for the time being.

A related problem: I have some libraries loaded at startup in .First(), which I have in .Rprofile. Now, I exited R and commented out the lines in .First(). The next time I launch R, the same libraries are loaded again. I.e., there seems to be a memory of the old .First() somewhere which refuses to die.

Thanks in adv.
Horace
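The "invalid name" errors in the thread come from passing the bare package name: attached packages sit on the search path under the name "package:<name>". A minimal sketch:

```R
# Attach a package, then remove it from the search path again.
library(zoo)              # attaches as "package:zoo"
detach("package:zoo")     # this form works; detach(zoo) / detach("zoo") do not

# Optionally also release the loaded namespace from memory:
unloadNamespace("zoo")
```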
Re: [R] Dataframe modification
Hi Gabor,

Thanx for the help. I forgot to mention this: column A is something like

A <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)

i.e., it repeats. The rest is all the same. How can I modify your solution to take care of this issue?

Thanx in advance.
Sachin

Gabor Grothendieck <[EMAIL PROTECTED]> wrote:

Here are two solutions:

A <- 1:8
B <- c(1, 2, 4, 7, 8)
C <- c(5, 3, 10, 12, 17)

# solution 1 - assignment with subscripting
DF <- data.frame(A, B = A, C = 0)
DF[A %in% B, "C"] <- C

# solution 2 - merge
DF <- with(merge(data.frame(A), data.frame(B, C), by = 1, all = TRUE),
           data.frame(A, B = A, C = ifelse(is.na(C), 0, C)))

On 8/21/06, Sachin J wrote:
> Hi,
>
> How can I accomplish this in R?
>
> I have a data frame with 3 columns. Columns B and C have the same number of
> elements, but column A has more elements than B and C. I want to compare
> column A with B and do the following:
>
> If a value of A is not in B, then insert a new row in B and C and fill these
> new rows with B = A and C = 0. Finally I will have a balanced data frame with
> an equal number of rows (entries) in all the columns.
>
> For example: A[3] = 3 but 3 is not in B, so insert a new row and set B[3] = 3
> (new row) and C[3] = 0. The final result would look like:
>
> A B  C
> 1 1  5
> 2 2  3
> 3 3  0
> 4 4 10
> 5 5  0
> 6 6  0
> 7 7 12
> 8 8 17
>
> These are the columns of DF:
>
> a <- c(1, 2, 3, 4, 5, 6, 7, 8)
> b <- c(1, 2, 4, 7, 8)
> c <- c(5, 3, 10, 12, 17)
>
> Thanx in advance for the help.
>
> Sachin
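For the follow-up (repeating A), a hedged sketch: split A wherever it restarts at 1, apply the fill-with-zero step within each block, and rbind the pieces. It assumes, purely as an illustration, that the same B/C pairs apply within every repetition:

```R
# Group index increments each time A restarts at 1.
A <- c(1:12, 1:12)
B <- c(1, 2, 4, 7, 8)      # positions within one block that carry a C value
C <- c(5, 3, 10, 12, 17)

fill_block <- function(a) {
  out <- data.frame(A = a, B = a, C = 0)
  out$C[a %in% B] <- C[match(a[a %in% B], B)]
  out
}
DF <- do.call(rbind, lapply(split(A, cumsum(A == 1)), fill_block))
head(DF, 8)
```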
[R] Dataframe modification
Hi,

How can I accomplish this in R?

I have a data frame with 3 columns. Columns B and C have the same number of elements, but column A has more elements than B and C. I want to compare column A with B and do the following:

If a value of A is not in B, then insert a new row in B and C and fill these new rows with B = A and C = 0. Finally I will have a balanced data frame with an equal number of rows (entries) in all the columns.

For example: A[3] = 3 but 3 is not in B. So insert a new row and set B[3] = 3 (new row) and C[3] = 0. The final result would look like:

A B  C
1 1  5
2 2  3
3 3  0
4 4 10
5 5  0
6 6  0
7 7 12
8 8 17

These are the columns of DF:

> a <- c(1, 2, 3, 4, 5, 6, 7, 8)
> b <- c(1, 2, 4, 7, 8)
> c <- c(5, 3, 10, 12, 17)

Thanx in advance for the help.

Sachin
Re: [R] dataframe of unequal rows
Bert,

I tried readLines. It reads the data as is, but I can't access individual columns. I still can't figure out how to accomplish this. An example would be of great help.

PS: How do you indicate which fields are present in a record with less than the full number? - Via known delimiters for all fields.

TIA
Sachin

Berton Gunter <[EMAIL PROTECTED]> wrote:

How do you indicate which fields are present in a record with less than the full number? Via known delimiters for all fields? Via the order of values (fields are filled in order, and only the last fields in a record can therefore be missing)?

If the former, see the "sep" parameter in read.table() and friends. If the latter, one way is to open the file as a connection and use readLines() (you would check how many values were present and fill in the NAs as needed). There may be better ways, though. ?connections will get you started.

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
"The business of the statistician is to catalyze the scientific learning process." - George E. P. Box

> -----Original Message-----
> From: [EMAIL PROTECTED] On Behalf Of Sachin J
> Sent: Friday, August 18, 2006 9:14 AM
> To: R-help@stat.math.ethz.ch
> Subject: [R] dataframe of unequal rows
>
> Hi,
>
> How can I read data with an unequal number of observations (rows) as is,
> i.e. without introducing NA for columns with fewer observations than the
> maximum? Example:
>
> A  B C  D
> 1 10 1 12
> 2 10 3 12
> 3 10 4 12
> 4 10
> 5 10
>
> Thanks in advance.
>
> Sachin
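Following Bert's readLines() suggestion, a hedged sketch: splitting each line on whitespace keeps ragged records as a list, so short rows stay short and no NA padding is introduced. Individual fields are then reachable per record:

```R
# Ragged input as it would appear in the file (illustrative values from the post).
lines <- c("1 10 1 12",
           "2 10 3 12",
           "3 10 4 12",
           "4 10",
           "5 10")
fields <- lapply(strsplit(trimws(lines), "[[:space:]]+"), as.numeric)
fields[[4]]           # the short record keeps only its own values
# [1]  4 10
length(fields[[1]])   # full records carry all four fields
# [1] 4
```

In practice `lines <- readLines("myfile.txt")` would replace the literal vector.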
[R] dataframe of unequal rows
Hi,

How can I read data with an unequal number of observations (rows) as is, i.e. without introducing NA for columns with fewer observations than the maximum? Example:

A  B C  D
1 10 1 12
2 10 3 12
3 10 4 12
4 10
5 10

Thanks in advance.

Sachin
Re: [R] Insert rows - how can I accomplish this in R
Gabor,

Thanks a lot for the help. The 1st method works fine. With the 2nd method I am getting the following error:

> do.call(rbind, by(DF, cumsum(DF$A == 1), f))
Error in zoo(, time(as.ts(z)), z, fill = 0) : unused argument(s) (fill ...)

Unable to figure out the cause.

Thanks,
Sachin

Gabor Grothendieck <[EMAIL PROTECTED]> wrote:

Here are two solutions. In both we break up DF into rows which start with 1. In solution #1 we create a new data frame with the required sequence for A and zeros for B, and then fill it in. In solution #2 we convert each set of rows to a zoo object z where column A is the times and B is the data. We convert that zoo object to a ts object (which has the effect of filling in the missing times), create a zoo object with no data from its times, and merge that zoo object with z using a fill of 0. Finally, in both solutions, we reconstruct the rows by rbind'ing everything together.

# 1
f <- function(x) {
  DF <- data.frame(A = 1:max(x$A), B = 0)
  DF[x$A, "B"] <- x$B
  DF
}
do.call(rbind, by(DF, cumsum(DF$A == 1), f))

# 2
library(zoo)
f <- function(x) {
  z <- zoo(x$B, x$A)
  ser <- merge(zoo(, time(as.ts(z)), z, fill = 0)
  data.frame(A = time(ser), B = coredata(ser))
}
do.call(rbind, by(DF, cumsum(DF$A == 1), f)

On 8/18/06, Sachin J wrote:
> Hi,
>
> I have the following data frame. Column A indicates months.
>
> DF <- structure(list(A = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1,
> 2, 3, 4, 5, 7, 8, 11, 12, 1, 2, 3, 4, 5, 8), B = c(0, 0, 0, 8,
> 0, 19, 5, 19, 0, 0, 0, 11, 0, 8, 5, 11, 19, 8, 11, 10, 0, 8,
> 36, 10, 16, 10, 22)), .Names = c("A", "B"), class = "data.frame",
> row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11",
> "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23",
> "24", "25", "26", "27"))
>
> There is some discontinuity in the data. For example, month 6, 9, 10 data
> (2nd year) and month 6 data (3rd year) are absent.
> I want to insert rows in place of these missing months and set the
> corresponding B column to zero, i.e., the result should look like:
>
> DFNEW <- structure(list(A = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1,
> 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8),
> B = c(0, 0, 0, 8, 0, 19, 5, 19, 0, 0, 0, 11, 0, 8, 5, 11,
> 19, 0, 8, 11, 0, 0, 10, 0, 8, 36, 10, 16, 10, 0, 0, 22)), .Names = c("A",
> "B"), class = "data.frame", row.names = c("1", "2", "3", "4",
> "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
> "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26",
> "27", "28", "29", "30", "31", "32"))
>
> Thanks in advance.
>
> Sachin
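The reported error most likely comes from an unbalanced parenthesis in solution #2 as posted: the call `zoo(, time(as.ts(z))` is never closed, so `z` and `fill = 0` are parsed as extra arguments to zoo() instead of merge(). A corrected sketch, using the DF from the original post:

```R
library(zoo)

DF <- structure(list(A = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1,
                           2, 3, 4, 5, 7, 8, 11, 12, 1, 2, 3, 4, 5, 8),
                     B = c(0, 0, 0, 8, 0, 19, 5, 19, 0, 0, 0, 11, 0, 8, 5,
                           11, 19, 8, 11, 10, 0, 8, 36, 10, 16, 10, 22)),
                .Names = c("A", "B"), class = "data.frame",
                row.names = as.character(1:27))

f <- function(x) {
  z <- zoo(x$B, x$A)
  # Parentheses balanced: the empty zoo supplies the full time grid for the
  # block, and merge() fills the months missing from z with 0.
  ser <- merge(zoo(, time(as.ts(z))), z, fill = 0)
  data.frame(A = time(ser), B = coredata(ser))
}
DFNEW <- do.call(rbind, by(DF, cumsum(DF$A == 1), f))
```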
[R] Insert rows - how can I accomplish this in R
Hi,

I have the following data frame. Column A indicates months.

DF <- structure(list(A = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1,
2, 3, 4, 5, 7, 8, 11, 12, 1, 2, 3, 4, 5, 8), B = c(0, 0, 0, 8,
0, 19, 5, 19, 0, 0, 0, 11, 0, 8, 5, 11, 19, 8, 11, 10, 0, 8,
36, 10, 16, 10, 22)), .Names = c("A", "B"), class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11",
"12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23",
"24", "25", "26", "27"))

There is some discontinuity in the data. For example, month 6, 9, 10 data (2nd year) and month 6 data (3rd year) are absent. I want to insert rows in place of these missing months and set the corresponding B column to zero, i.e., the result should look like:

DFNEW <- structure(list(A = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1,
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8),
B = c(0, 0, 0, 8, 0, 19, 5, 19, 0, 0, 0, 11, 0, 8, 5, 11,
19, 0, 8, 11, 0, 0, 10, 0, 8, 36, 10, 16, 10, 0, 0, 22)), .Names = c("A",
"B"), class = "data.frame", row.names = c("1", "2", "3", "4",
"5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
"16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26",
"27", "28", "29", "30", "31", "32"))

Thanks in advance.

Sachin
[R] arima() function - issues
Hi,

My query is related to the ARIMA function in the stats package. While looking through the time series literature I found the following link, which highlights a discrepancy in the "arima" function when dealing with differenced time series. Is there a substitute function, similar to the "sarima" mentioned on that page, implemented in R? Any pointers would be of great help.

http://lib.stat.cmu.edu/general/stoffer/tsa2/Rissues.htm

Thanx in advance.

Sachin
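For context, the issues page concerns how arima() handles the constant/drift term once differencing is requested. Base R can still fit the seasonal models that sarima covers; a hedged sketch on simulated data, with illustrative orders:

```R
# Seasonal ARIMA(1,1,1)(0,1,1)[12] fit with base R's arima().
set.seed(1)
y <- ts(cumsum(rnorm(120)), frequency = 12)   # simulated monthly series
fit <- arima(y, order = c(1, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
fit$aic                              # model fit criterion
predict(fit, n.ahead = 12)$pred      # 12-month point forecasts
```

Note that with d > 0, arima() fits no intercept; if a drift term is wanted, it has to be supplied via xreg, which is exactly the point the linked page discusses.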
Re: [R] AICc vs AIC for model selection
Hi Spencer,

I did go through the previous postings on the mailing list, but couldn't find a satisfactory answer to my question. I am dealing with a univariate time series. I suspect that my data may contain some trend and seasonal components. Hence, rather than fitting just an AR(1) model, I am trying to find the right model which fits the data well and then use that model to forecast. To achieve this I am using the best.arima model. If you have any other thoughts on this, please let me know.

Thanx in advance for your help.

Regards
Sachin

Spencer Graves <[EMAIL PROTECTED]> wrote:

Regarding AIC.c, have you tried RSiteSearch("AICc") and RSiteSearch("AIC.c")? This produced several comments that looked to me like they might help answer your question.

Beyond that, I've never heard of the "forecast" package, and I got zero hits for RSiteSearch("best.arima"), so I can't comment directly on your question. Do you have only one series or multiple? If you have only one, I think it would be hard to justify more than a simple AR(1) model. Almost anything else would likely be overfitting. If you have multiple series, have you considered using 'lme' in the 'nlme' package? Are you familiar with Pinheiro and Bates (2000) Mixed-Effects Models in S and S-Plus (Springer)? If not, I encourage you to spend some quality time with this book. My study of it has been amply rewarded, and I believe yours will likely also be.

Best Wishes,
Spencer Graves

Sachin J wrote:
> Hi,
>
> I am using the 'best.arima' function from the forecast package to obtain point
> forecasts for a time series data set. The documentation says it utilizes the
> AIC value to select the best ARIMA model. But in my case the sample size is
> very small - 26 observations (demand data). Is it right to use the AIC value
> for model selection in this case? Should I use AICc instead of AIC? If so,
> how can I modify the best.arima function to change the selection criteria?
> Any pointers would be of great help.
>
> Thanx in advance.
> Sachin
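For small samples (here n = 26) the corrected criterion is AICc = AIC + 2k(k+1)/(n - k - 1), where k is the number of estimated parameters. A hedged sketch of applying it to a fitted ARIMA model (modifying best.arima's internals is not shown):

```R
# Small-sample corrected AIC for an ARIMA fit.
aicc <- function(fit) {
  k <- length(coef(fit)) + 1      # estimated parameters, incl. innovation variance
  n <- length(residuals(fit))     # simple proxy for the effective sample size
  AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
}

set.seed(1)
y   <- ts(rnorm(26))              # 26 observations, as in the post
fit <- arima(y, order = c(1, 0, 0))
aicc(fit)                         # compare candidate models on this value
```

The correction term grows as k approaches n, so with 26 observations it penalizes richly parameterized seasonal models much more heavily than plain AIC does.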
[R] Recreate new dataframe based on condition
Hi,

How can I achieve this in R? The dataset is as follows:

> df
  x
1 2
2 4
3 1
4 3
5 3
6 2

structure(list(x = c(2, 4, 1, 3, 3, 2)), .Names = "x",
row.names = c("1", "2", "3", "4", "5", "6"), class = "data.frame")

I want to create a new data frame whose rows are the sums of rows (1&2, 3&4, 5&6) of the original df. For example:

> newdf
  x
1 6
2 4
3 5

Thanx in advance for the help.

Sachin
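A minimal sketch of the pairwise row sums, using a grouping index with tapply():

```R
df <- data.frame(x = c(2, 4, 1, 3, 3, 2))
# rep(1:3, each = 2) labels rows 1&2, 3&4, 5&6 with groups 1, 2, 3
newdf <- data.frame(x = as.vector(tapply(df$x, rep(1:3, each = 2), sum)))
newdf
#   x
# 1 6
# 2 4
# 3 5
```

For a data frame of arbitrary even length, the grouping index generalizes to `rep(seq_len(nrow(df) / 2), each = 2)`.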
[R] AICc vs AIC for model selection
Hi,

I am using the 'best.arima' function from the forecast package to obtain point forecasts for a time series data set. The documentation says it utilizes the AIC value to select the best ARIMA model. But in my case the sample size is very small - 26 observations (demand data). Is it right to use the AIC value for model selection in this case? Should I use AICc instead of AIC? If so, how can I modify the best.arima function to change the selection criteria? Any pointers would be of great help.

Thanx in advance.

Sachin
Re: [R] KPSS test
Hi Mark,

Thanx for the help. I will verify my results with the PP and DF tests. Also, as suggested, I will take a look at the references pointed out. One small doubt: how do I decide what terms (trend, constant, seasonality) to include while using these stationarity tests? Any references would be of great help.

Thanx,
Sachin

[EMAIL PROTECTED] wrote:

Sachin:

I think your interpretations are right given the data, but KPSS is quite a different test from the usual tests because it assumes that the null is stationarity, while Dickey-Fuller (DF) and Phillips-Perron (PP) assume that the null is a unit root. Therefore, you should check whether the conclusions you get from KPSS are consistent with what you would get from DF or PP; the results often are not consistent. Also, DF depends on what terms (trend, constant) you used in your estimation of the model; I'm not sure if KPSS does also. People generally report Dickey-Fuller results, but they are a little biased towards accepting a unit root (lower power), so maybe that's why you are using KPSS? Eric Zivot has a nice explanation of a lot of the stationarity tests in his S+FinMetrics book. Testing for cyclical variation is pretty complex because that's basically the same as testing for seasonality; check Ord's or Enders' book for relatively simple ways of doing that.

> Sachin J wrote:
>
> Hi,
>
> Am I interpreting the results properly? Are my conclusions correct?
>
> > KPSS.test(df)
>
> KPSS test
>
> Null hypotheses: Level stationarity and stationarity around a linear trend.
> Alternative hypothesis: Unit root.
> Statistic for the null hypothesis of level stationarity: 1.089
> Critical values:
>   0.10  0.05 0.025  0.01
>  0.347 0.463 0.574 0.739
>
> Statistic for the null hypothesis of trend stationarity: 0.13
> Critical values:
>   0.10  0.05 0.025  0.01
>  0.119 0.146 0.176 0.216
>
> Lag truncation parameter: 1
>
> CONCLUSION: Reject Ho (level stationarity) at the 0.05 sig level.
>             Fail to reject Ho (trend stationarity) at the 0.05 sig level.
>
> > kpss.test(df, null = c("Trend"))
>
> KPSS Test for Trend Stationarity
> data: tsdata[, 6]
> KPSS Trend = 0.1298, Truncation lag parameter = 1, p-value = 0.07999
>
> CONCLUSION: Fail to reject Ho (trend stationarity), as p-value (0.08) >
> sig. level (0.05).
>
> > kpss.test(df, null = c("Level"))
>
> KPSS Test for Level Stationarity
> data: tsdata[, 6]
> KPSS Level = 1.0891, Truncation lag parameter = 1, p-value = 0.01
> Warning message:
> p-value smaller than printed p-value in: kpss.test(tsdata[, 6], null = c("Level"))
>
> CONCLUSION: Reject Ho (level stationarity), as p-value (<= 0.01) <
> sig. level (0.05).
>
> Following is my data set:
>
> structure(c(11.08, 7.08, 7.08, 6.08, 6.08, 6.08, 23.08, 32.08,
> 8.08, 11.08, 6.08, 13.08, 13.83, 16.83, 19.83, 8.83, 20.83, 17.83,
> 9.83, 20.83, 10.83, 12.83, 15.83, 11.83), .Tsp = c(2004, 2005.917,
> 12), class = "ts")
>
> Also, how do I test this time series for cyclical variations?
>
> Thanks in advance.
>
> Sachin
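Cross-checking KPSS (null: stationarity) against ADF and PP (null: unit root), as Mark suggests, can be sketched with the tseries package on the posted series:

```R
library(tseries)

y <- ts(c(11.08, 7.08, 7.08, 6.08, 6.08, 6.08, 23.08, 32.08, 8.08, 11.08,
          6.08, 13.08, 13.83, 16.83, 19.83, 8.83, 20.83, 17.83, 9.83, 20.83,
          10.83, 12.83, 15.83, 11.83), start = c(2004, 1), frequency = 12)

kpss.test(y, null = "Level")   # H0: level stationarity
kpss.test(y, null = "Trend")   # H0: trend stationarity
adf.test(y)                    # H0: unit root
pp.test(y)                     # H0: unit root
```

If KPSS rejects stationarity while ADF/PP fail to reject a unit root, the evidence points the same way; conflicting outcomes (common on 24 observations) mean the series is too short to decide firmly.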
[R] Access values in kpssstat-class
Hi,

How can I access the values stored in the kpssstat-class object returned by the KPSS.test function and store them in a variable? For example:

> x <- rnorm(1000)
> test <- KPSS.test(ts(x))
> test

KPSS test

Null hypotheses: Level stationarity and stationarity around a linear trend.
Alternative hypothesis: Unit root.

Statistic for the null hypothesis of level stationarity: 0.138
Critical values:
  0.10  0.05 0.025  0.01
 0.347 0.463 0.574 0.739

Statistic for the null hypothesis of trend stationarity: 0.038
Critical values:
  0.10  0.05 0.025  0.01
 0.119 0.146 0.176 0.216

Lag truncation parameter: 7

I then want to store the test statistic values in some variable, say - result.

Thanx in advance.
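KPSS.test() (it appears to come from the uroot package) returns a formal S4 object, so its pieces live in slots rather than list elements. A hedged sketch: the actual slot names are not guessed here but read off slotNames()/str():

```R
library(uroot)   # assumed to provide KPSS.test()

x    <- rnorm(1000)
test <- KPSS.test(ts(x))

slotNames(test)   # lists the slots the object actually carries
str(test)         # shows each slot's name and contents

# After identifying the slot holding the statistics (its real name appears
# in the slotNames() output), extract and store it, e.g.:
# result <- slot(test, "statistic")   # "statistic" is a hypothetical name
# result <- test@statistic            # equivalent @-syntax
```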
[R] KPSS test
Hi, Am I interpreting the results properly? Are my conclusions correct?

> KPSS.test(df)
KPSS test
Null hypotheses: Level stationarity and stationarity around a linear trend.
Alternative hypothesis: Unit root.
Statistic for the null hypothesis of level stationarity: 1.089
Critical values: 0.10  0.05  0.025 0.01
                 0.347 0.463 0.574 0.739
Statistic for the null hypothesis of trend stationarity: 0.13
Critical values: 0.10  0.05  0.025 0.01
                 0.119 0.146 0.176 0.216
Lag truncation parameter: 1

CONCLUSION: Reject Ho of level stationarity at the 0.05 sig. level (1.089 > 0.463), i.e. not level stationary. Fail to reject Ho of trend stationarity at the 0.05 sig. level (0.13 < 0.146), i.e. trend stationary.

> kpss.test(df,null = c("Trend"))
KPSS Test for Trend Stationarity
data: tsdata[, 6]
KPSS Trend = 0.1298, Truncation lag parameter = 1, p-value = 0.07999

CONCLUSION: Fail to reject Ho - trend stationary, as p-value > sig. level (0.05).

> kpss.test(df,null = c("Level"))
KPSS Test for Level Stationarity
data: tsdata[, 6]
KPSS Level = 1.0891, Truncation lag parameter = 1, p-value = 0.01
Warning message:
p-value smaller than printed p-value in: kpss.test(tsdata[, 6], null = c("Level"))

CONCLUSION: Reject Ho - not level stationary, as p-value < sig. level (0.05).

Following is my data set:

structure(c(11.08, 7.08, 7.08, 6.08, 6.08, 6.08, 23.08, 32.08, 8.08, 11.08, 6.08, 13.08, 13.83, 16.83, 19.83, 8.83, 20.83, 17.83, 9.83, 20.83, 10.83, 12.83, 15.83, 11.83), .Tsp = c(2004, 2005.917, 12), class = "ts")

Also, how do I test this time series for cyclical variations?

Thanks in advance.

Sachin
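A sketch of the decision rule and a quick cyclical check, using kpss.test() from the tseries package (reject the stationarity null when the p-value is below alpha, or equivalently when the statistic exceeds the critical value):

```r
# KPSS decision rule plus a quick look for cyclical variation,
# using the data set from the post.
library(tseries)

tsdata <- ts(c(11.08, 7.08, 7.08, 6.08, 6.08, 6.08, 23.08, 32.08,
               8.08, 11.08, 6.08, 13.08, 13.83, 16.83, 19.83, 8.83,
               20.83, 17.83, 9.83, 20.83, 10.83, 12.83, 15.83, 11.83),
             frequency = 12, start = c(2004, 1))

# Reject H0 (stationarity) when p-value < alpha:
kpss.test(tsdata, null = "Level")   # small p-value -> reject level stationarity
kpss.test(tsdata, null = "Trend")   # p-value above 0.05 -> fail to reject

# Periodic/cyclical behaviour shows up as regularly spaced spikes in the
# autocorrelation function and as peaks in the periodogram:
acf(tsdata)
spectrum(tsdata)
```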
[R] Run-Sequence Plot
Hi, How can I get a run-sequence plot and an autocorrelation plot (to visually test for stationarity of time series data) in R? Thanks in advance. Sachin

This is my df:

>df
structure(list(V1 = c(11.08, 7.08, 7.08, 6.08, 6.08, 6.08, 23.08, 32.08, 8.08, 11.08, 6.08, 13.08, 13.83, 16.83, 19.83, 8.83, 20.83, 17.83, 9.83, 20.83, 10.83, 12.83, 15.83, 11.83)), .Names = "V1", class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24"))
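Both plots need only base graphics: a run-sequence plot is just the series plotted against observation order, and acf() draws the autocorrelation plot. A sketch with the data frame from the post:

```r
# Run-sequence plot and autocorrelation plot for a single column.
df <- data.frame(V1 = c(11.08, 7.08, 7.08, 6.08, 6.08, 6.08, 23.08,
                        32.08, 8.08, 11.08, 6.08, 13.08, 13.83, 16.83,
                        19.83, 8.83, 20.83, 17.83, 9.83, 20.83, 10.83,
                        12.83, 15.83, 11.83))

plot(df$V1, type = "b", xlab = "Observation order", ylab = "V1",
     main = "Run-sequence plot")
acf(df$V1, main = "Autocorrelation plot")
```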
[R] write.table & csv help
Hi, How can I produce the following output in .csv format using the write.table function?

for(i in seq(1:2)) {
  df <- rnorm(4, mean=0, sd=1)
  write.table(df,"C:/output.csv", append = TRUE, quote = FALSE, sep = ",", row.names = FALSE, col.names = TRUE)
}

Current output:

x
0.287816
-0.81803
-0.15231
-0.25849
x
2.26831
0.863174
0.269914
0.181486

Desired output:

x1 x2
0.287816 2.26831
-0.81803 0.863174
-0.15231 0.269914
-0.25849 0.181486

Thanks in advance
Sachin
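One way to get side-by-side columns is to build them all first and write the file once, instead of appending one column per loop iteration. A sketch (the file path is the one from the post):

```r
# Collect each iteration's vector as a column of a matrix, then write once.
out <- sapply(1:2, function(i) rnorm(4, mean = 0, sd = 1))  # 4 x 2 matrix
colnames(out) <- paste("x", 1:2, sep = "")                  # x1, x2

write.table(out, "C:/output.csv", quote = FALSE, sep = ",",
            row.names = FALSE, col.names = TRUE)
```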
Re: [R] converting to time series object : ts - package:stats
Hi Gabor, You are correct. The real problem is with read.csv. I am not sure why? My data looks V1,V2,V3 11.08,21.73,13.08 7.08,37.73,6.08 7.08,11.73,21.08 I never had this problem earlier. Anyway I did >df <- read.csv("Data.csv") >tsdata <- ts((df),frequency = 12, start = c(1999, 1)) it works fine. But still puzzled with read.csv behavior. Any thoughts? Thanx Gabor, Achim and Brian for your help. Sachin Gabor Grothendieck <[EMAIL PROTECTED]> wrote: df[] <- sapply(format(df), as.numeric) will convert it to numeric but I think the real problem is the read.csv statement. Do commas represent separators or decimals since you have specified comma for both? Assuming it looks like: A,B,C 1,2,3 4,5,6 just do: DF <- read.csv("Data.csv") str(DF) On 6/26/06, Sachin J wrote: > > > It seems I have problem in reading the data as dataframe. It is reading it > as factors. Here is the df > > df <- > read.csv("C:/Data.csv",header=TRUE,sep=",",na.strings="NA", > dec=",", strip.white=TRUE) > > > dput(df) > > > df <- structure(list(V1 = structure(c(2, 15, 15, 14, 14, 14, 12, 13, > + 16, 2, 14, 5, 6, 8, 10, 17, 11, 9, 18, 11, 1, 4, 7, 3), .Label = > c("10.83", > + "11.08", "11.83", "12.83", "13.08", "13.83", "15.83", "16.83", > + "17.83", "19.83", "20.83", "23.08", "32.08", "6.08", "7.08", > + "8.08", "8.83", "9.83"), class = "factor"), V2 = structure(c(8, > + 15, 2, 10, 9, 18, 1, 4, 10, 2, 8, 6, 17, 5, 16, 13, 5, 14, 3, > + 11, 3, 12, 7, 7), .Label = c("10.73", "11.73", "11.75", "12.73", > + "15.75", "19.73", "19.75", "21.73", "25.73", "26.73", "26.75", > + "27.75", "32.75", "33.75", "37.73", "42.75", "61.75", "9.73"), class = > "factor"), > + V3 = structure(c(3, 8, 7, 9, 11, 9, 3, 8, 10, 9, 11, 10, > + 2, 1, 12, 12, 6, 5, 4, 6, 2, 5, 5, 1), .Label = c("10.33", > + "12.33", "13.08", "13.33", "14.33", "15.33", "21.08", "6.08", > + "7.08", "8.08", "9.08", "9.33"), class = "factor")), .Names = c("V1", > + "V2", "V3"), class = "data.frame", row.names = c("1", "2", "3", > + "4", "5", "6", 
"7", "8", "9", "10", "11", "12", "13", "14", "15", > + "16", "17", "18", "19", "20", "21", "22", "23", "24")) > > TIA > > Sachin > > > > > Gabor Grothendieck wrote: > > Sorry I meant issue dput(df) and > post > > df <- ...the output your got from dput(df)... > ...rest of your code... > > Now its reproducible. > > > On 6/26/06, Gabor Grothendieck wrote: > > We don't have data.csv so its still not ***reproducible*** by anyone > > else. To be reproducible it means that anyone can copy the code > > in your post, paste it into R and get the same answer. > > > > Suggest you post the output of > > dput(df) > > > > and then post > > dput <- ...the output you got from dput(df)... > > > > Now its reproducible. > > > > On 6/26/06, Sachin J wrote: > > > Hi Achim, > > > > > > I did the following: > > > > > > >df <- read.csv("C:/data.csv", header=TRUE,sep=",",na.strings="NA", > dec=",", strip.white=TRUE) > > > > > > Note: data.csv has 10 (V1...V10) columns. > > > > > > >df[1] > > > V1 > > > 1 11.08 > > > 2 7.08 > > > 3 7.08 > > > 4 6.08 > > > 5 6.08 > > > 6 6.08 > > > 7 23.08 > > > 8 32.08 > > > 9 8.08 > > > 10 11.08 > > > 11 6.08 > > > 12 13.08 > > > 13 13.83 > > > 14 16.83 > > > 15 19.83 > > > 16 8.83 > > > 17 20.83 > > > 18 17.83 > > > 19 9.83 > > > 20 20.83 > > > 21 10.83 > > > 22 12.83 > > > 23 15.83 > > > 24 11.83 > > > > > > >tsdata <- ts((df[1]),frequency = 12, start = c(2005, 1)) > > > > > > The resulting
Re: [R] converting to time series object : ts - package:stats
It seems I have problem in reading the data as dataframe. It is reading it as factors. Here is the df df <- read.csv("C:/Data.csv",header=TRUE,sep=",",na.strings="NA", dec=",", strip.white=TRUE) > dput(df) > df <- structure(list(V1 = structure(c(2, 15, 15, 14, 14, 14, 12, 13, + 16, 2, 14, 5, 6, 8, 10, 17, 11, 9, 18, 11, 1, 4, 7, 3), .Label = c("10.83", + "11.08", "11.83", "12.83", "13.08", "13.83", "15.83", "16.83", + "17.83", "19.83", "20.83", "23.08", "32.08", "6.08", "7.08", + "8.08", "8.83", "9.83"), class = "factor"), V2 = structure(c(8, + 15, 2, 10, 9, 18, 1, 4, 10, 2, 8, 6, 17, 5, 16, 13, 5, 14, 3, + 11, 3, 12, 7, 7), .Label = c("10.73", "11.73", "11.75", "12.73", + "15.75", "19.73", "19.75", "21.73", "25.73", "26.73", "26.75", + "27.75", "32.75", "33.75", "37.73", "42.75", "61.75", "9.73"), class = "factor"), + V3 = structure(c(3, 8, 7, 9, 11, 9, 3, 8, 10, 9, 11, 10, + 2, 1, 12, 12, 6, 5, 4, 6, 2, 5, 5, 1), .Label = c("10.33", + "12.33", "13.08", "13.33", "14.33", "15.33", "21.08", "6.08", + "7.08", "8.08", "9.08", "9.33"), class = "factor")), .Names = c("V1", + "V2", "V3"), class = "data.frame", row.names = c("1", "2", "3", + "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", + "16", "17", "18", "19", "20", "21", "22", "23", "24")) TIA Sachin Gabor Grothendieck <[EMAIL PROTECTED]> wrote: Sorry I meant issue dput(df) and post df <- ...the output your got from dput(df)... ...rest of your code... Now its reproducible. On 6/26/06, Gabor Grothendieck wrote: > We don't have data.csv so its still not ***reproducible*** by anyone > else. To be reproducible it means that anyone can copy the code > in your post, paste it into R and get the same answer. > > Suggest you post the output of > dput(df) > > and then post > dput <- ...the output you got from dput(df)... > > Now its reproducible. 
> > On 6/26/06, Sachin J wrote: > > Hi Achim, > > > > I did the following: > > > > >df <- read.csv("C:/data.csv", header=TRUE,sep=",",na.strings="NA", > > >dec=",", strip.white=TRUE) > > > > Note: data.csv has 10 (V1...V10) columns. > > > > >df[1] > > V1 > > 1 11.08 > > 2 7.08 > > 3 7.08 > > 4 6.08 > > 5 6.08 > > 6 6.08 > > 7 23.08 > > 8 32.08 > > 9 8.08 > > 10 11.08 > > 11 6.08 > > 12 13.08 > > 13 13.83 > > 14 16.83 > > 15 19.83 > > 16 8.83 > > 17 20.83 > > 18 17.83 > > 19 9.83 > > 20 20.83 > > 21 10.83 > > 22 12.83 > > 23 15.83 > > 24 11.83 > > > > >tsdata <- ts((df[1]),frequency = 12, start = c(2005, 1)) > > > > The resulting time series is different from the df. I don't know why? I > > think I am doing something silly. > > > > TIA > > > > Sachin > > > > > > Achim Zeileis wrote: > > On Mon, 26 Jun 2006, Sachin J wrote: > > > > > Hi, > > > > > > I am trying to convert a dataset (dataframe) into time series object > > > using ts function in stats package. My dataset is as follows: > > > > > > >df > > > [1] 11.08 7.08 7.08 6.08 6.08 6.08 23.08 32.08 8.08 11.08 6.08 13.08 > > > 13.83 16.83 19.83 8.83 20.83 17.83 > > > [19] 9.83 20.83 10.83 12.83 15.83 11.83 > > > > Please provide a reproducible example. You just showed us the print output > > for an object, claiming that it is an object of class "data.frame" which > > is rather unlikely given the print output. > > > > > I converted this into time series object as follows > > > > > > >tsdata <- ts((df),frequency = 12, start = c(1999, 1)) > > > > which produces the right result for me if `df' is a vector or a > > data.frame: > > > > df <- c(11.08, 7.08,
Re: [R] converting to time series object : ts - package:stats
You are right. The df is as follows:

>df[1]
   V1
1  11.08
2   7.08
3   7.08
4   6.08
5   6.08
6   6.08
7  23.08
8  32.08
9   8.08
10 11.08
11  6.08
12 13.08
13 13.83
14 16.83
15 19.83
16  8.83
17 20.83
18 17.83
19  9.83
20 20.83
21 10.83
22 12.83
23 15.83
24 11.83

But when I provide df[,1] it prints as earlier in factor form. How do I take care of this (factor) issue?

TIA Sachin

Prof Brian Ripley <[EMAIL PROTECTED]> wrote: On Mon, 26 Jun 2006, Sachin J wrote: > I am trying to convert a dataset (dataframe) into time series object > using ts function in stats package. My dataset is as follows: > > >df > [1] 11.08 7.08 7.08 6.08 6.08 6.08 23.08 32.08 8.08 11.08 6.08 13.08 13.83 > 16.83 19.83 8.83 20.83 17.83 > [19] 9.83 20.83 10.83 12.83 15.83 11.83 No data frame will print like that, so it seems that your description and printout do not match. > I converted this into time series object as follows > > >tsdata <- ts((df),frequency = 12, start = c(1999, 1)) >From the help page for ts: data: a numeric vector or matrix of the observed time-series values. A data frame will be coerced to a numeric matrix via 'data.matrix'. I suspect you have a single-column data frame with a factor column. Look up what data.matrix does for factors. > The resulting time series is as follows: > > Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec > 1999 2 15 15 14 14 14 12 13 16 2 14 5 > 2000 6 8 10 17 11 9 18 11 1 4 7 3 > > I am unable to understand why the values of df and tsdata does not > match. I looked at ts function and I couldn't find any data > transformation. Am I missing something here? Any pointers would be of > great help. > > Thanks in advance. > > Sachin > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Brian D.
Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
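To take care of the factor issue directly: convert via as.character() first, because as.numeric() applied to a factor returns the internal level codes (which is exactly what data.matrix() did to produce the 2 15 15 14 ... series). A small sketch:

```r
# Recovering the displayed numbers from a factor column.
f <- factor(c("11.08", "7.08", "23.08"))

as.numeric(f)                 # 1 3 2 -- the level codes, not the data
as.numeric(as.character(f))   # 11.08 7.08 23.08 -- the actual values
```

(Better still, fix the read.csv call so the column is read as numeric in the first place.)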
Re: [R] converting to time series object : ts - package:stats
Hi Achim, I did the following:

>df <- read.csv("C:/data.csv", header=TRUE,sep=",",na.strings="NA", dec=",", strip.white=TRUE)

Note: data.csv has 10 (V1...V10) columns.

>df[1]
   V1
1  11.08
2   7.08
3   7.08
4   6.08
5   6.08
6   6.08
7  23.08
8  32.08
9   8.08
10 11.08
11  6.08
12 13.08
13 13.83
14 16.83
15 19.83
16  8.83
17 20.83
18 17.83
19  9.83
20 20.83
21 10.83
22 12.83
23 15.83
24 11.83

>tsdata <- ts((df[1]),frequency = 12, start = c(2005, 1))

The resulting time series is different from the df, and I don't know why. I think I am doing something silly.

TIA Sachin

Achim Zeileis <[EMAIL PROTECTED]> wrote: On Mon, 26 Jun 2006, Sachin J wrote: > Hi, > > I am trying to convert a dataset (dataframe) into time series object > using ts function in stats package. My dataset is as follows: > > >df > [1] 11.08 7.08 7.08 6.08 6.08 6.08 23.08 32.08 8.08 11.08 6.08 13.08 13.83 > 16.83 19.83 8.83 20.83 17.83 > [19] 9.83 20.83 10.83 12.83 15.83 11.83 Please provide a reproducible example. You just showed us the print output for an object, claiming that it is an object of class "data.frame" which is rather unlikely given the print output. > I converted this into time series object as follows > > >tsdata <- ts((df),frequency = 12, start = c(1999, 1)) which produces the right result for me if `df' is a vector or a data.frame: df <- c(11.08, 7.08, 7.08, 6.08, 6.08, 6.08, 23.08, 32.08, 8.08, 11.08, 6.08, 13.08, 13.83, 16.83, 19.83, 8.83, 20.83, 17.83, 9.83, 20.83, 10.83, 12.83, 15.83, 11.83) ts(df, frequency = 12, start = c(1999, 1)) ts(as.data.frame(df), frequency = 12, start = c(1999, 1)) > The resulting time series is as follows: > > Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec > 1999 2 15 15 14 14 14 12 13 16 2 14 5 > 2000 6 8 10 17 11 9 18 11 1 4 7 3 > > I am unable to understand why the values of df and tsdata does not match. So are we because you didn't really tell us enough about df... Best, Z > I looked at ts function and I couldn't find any data transformation. Am > I missing something here?
Any pointers would be of great help. > > Thanks in advance. > > Sachin > > > - > > [[alternative HTML version deleted]] > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html > __ [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] converting to time series object : ts - package:stats
Hi, I am trying to convert a dataset (a data frame) into a time series object using the ts function in the stats package. My dataset is as follows:

>df
[1] 11.08 7.08 7.08 6.08 6.08 6.08 23.08 32.08 8.08 11.08 6.08 13.08 13.83 16.83 19.83 8.83 20.83 17.83
[19] 9.83 20.83 10.83 12.83 15.83 11.83

I converted this into a time series object as follows:

>tsdata <- ts((df),frequency = 12, start = c(1999, 1))

The resulting time series is as follows:

     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1999   2  15  15  14  14  14  12  13  16   2  14   5
2000   6   8  10  17  11   9  18  11   1   4   7   3

I am unable to understand why the values of df and tsdata do not match. I looked at the ts function and couldn't find any data transformation. Am I missing something here? Any pointers would be of great help.

Thanks in advance.

Sachin
Re: [R] conditional replacement
Thank you Gabor,Marc,Dimitrios and Sundar. Sachin Gabor Grothendieck <[EMAIL PROTECTED]> wrote: x <- 10*1:10 pmin(pmax(x, 30), 60) # 30 30 30 40 50 60 60 60 60 60 On 5/23/06, Sachin J wrote: > Hi > > How can do this in R. > > >df > > 48 > 1 > 35 > 32 > 80 > > If df < 30 then replace it with 30 and else if df > 60 replace it with 60. I > have a large dataset so I cant afford to identify indexes and then replace. > Desired o/p: > > 48 > 30 > 35 > 32 > 60 > > Thanx in advance. > > Sachin > __ > > > > [[alternative HTML version deleted]] > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html > __ [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] conditional replacement
Hi, How can I do this in R?

>df
48
1
35
32
80

If df < 30, replace it with 30; else if df > 60, replace it with 60. I have a large dataset, so I can't afford to identify indexes and then replace. Desired output:

48
30
35
32
60

Thanks in advance.

Sachin
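Two equivalent vectorised ways to clamp the values into [30, 60], with no explicit indexing (the first is the pmin/pmax approach from the reply above in the thread):

```r
# Clamp every element of x into the interval [30, 60].
x <- c(48, 1, 35, 32, 80)

pmin(pmax(x, 30), 60)                       # 48 30 35 32 60
ifelse(x < 30, 30, ifelse(x > 60, 60, x))   # same result
```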
[R] Distribution Identification/Significance testing
Hi, What are the methods for identifying the right distribution for a dataset? As far as I know, a goodness-of-fit test (p > alpha) for statistical significance or minimizing the squared error are two criteria for deciding. What are the other alternatives (confidence intervals?), and if any, how can I accomplish them in R? Thanks in advance. Sachin
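One common workflow is to fit each candidate distribution by maximum likelihood and compare the fits, e.g. by AIC, a goodness-of-fit test, or a Q-Q plot. A sketch using MASS::fitdistr() (note the caveat in the comment about ks.test with estimated parameters):

```r
# Fit candidate distributions by ML, then compare.
library(MASS)

x <- rgamma(200, shape = 2, rate = 1)   # example data

fit.gamma <- fitdistr(x, "gamma")
fit.lnorm <- fitdistr(x, "lognormal")

AIC(fit.gamma, fit.lnorm)   # smaller AIC = better fit

# Kolmogorov-Smirnov check of the gamma fit; the p-value is optimistic
# when the parameters were estimated from the same data.
ks.test(x, "pgamma", shape = fit.gamma$estimate["shape"],
        rate = fit.gamma$estimate["rate"])

# Visual check:
qqplot(qgamma(ppoints(length(x)), shape = fit.gamma$estimate["shape"],
              rate = fit.gamma$estimate["rate"]), x)
```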
Re: [R] writing 100 files
Try this:

x <- 1:12
for (i in 1:2){
  bb8 = sample(x)
  a <- sprintf("whatever%d.txt", i)
  write.table(bb8, quote = F, sep = '\t', row.names = F, col.names = F, file = a)
}

HTH Sachin

Duncan Murdoch <[EMAIL PROTECTED]> wrote: On 5/22/2006 11:24 AM, Federico Calboli wrote: > Hi All, > > I need to write as text files 1000 ish variation of the same data frame, > once I permute a row. > > I would like to use the function write.table() to write the files, and > use a loop to do it: > > for (i in 1:1000){ > > bb8[2,] = sample(bb8[2,]) > write.table(bb8, quote = F, sep = '\t', row.names = F, col.names = F, > file = 'whatever?.txt') > } > so all the files are called whatever1: whatever1000 > > Any idea? Use the paste() function to construct the name, e.g. file = paste("whatever",i,".txt", sep="") Duncan Murdoch
[R] write.csv + appending output (FILE I/O)
Hi, How can I write the output to an Excel (csv) file without printing row names (i.e. without breaks)? Here is my code:

library(
fn <- function() {
  q <- c(1,2,3)
  write.csv(q,"C:/Temp/op.xls", append = TRUE, row.names = FALSE, quote = FALSE)
}
# Function call
for(i in 1:3) {
  fn()
}

Present output:

x
1
2
3
x
1
2
3
x
1
2
3

Desired output:

1
2
3
1
2
3
1
2
3

Also, it displays the following warning messages:

Warning messages:
1: appending column names to file in: write.table(q, "C:/Temp/op.xls",
2: appending column names to file in: write.table(q, "C:/Temp/op.xls",
3: appending column names to file in: write.table(q, "C:/Temp/op.xls",

I am using the R 2.2.1 Windows version. I tried using write.xls from the "marray" package but with no success. Thanks in advance.

Sachin
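write.csv() forces the header on, which is where the repeated "x" lines and the warnings come from. Using write.table() directly lets you switch the column names off on every append. A sketch:

```r
# Append without headers: write.table() with col.names = FALSE.
fn <- function(file) {
  q <- c(1, 2, 3)
  write.table(q, file, append = TRUE, sep = ",",
              row.names = FALSE, col.names = FALSE, quote = FALSE)
}

for (i in 1:3) fn("C:/Temp/op.csv")
# file contents: 1 2 3 1 2 3 1 2 3, one value per line, no headers
```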
[R] boxplot - labelling
Hi, How can I get the values of the mean and median (not only points but the values too) on a boxplot? I am using the boxplot function from the graphics package. Following is my data set:

> df
[1] 5 1 1 0 0 10 38 47 2 5 0 28 5 8 81 21 12 9 1 12 2 4 22 3

> mean.val <- sapply(df,mean)
> boxplot(df,las = 1,col = "light blue")
> points(seq(df), mean.val, pch = 19)

I could get the mean as a dot symbol, but I need the values too. Also, how do I print the x-axis labels vertically instead of horizontally? Is there any other function to achieve these? Thanks in advance.

Sachin
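text() can print the numeric values next to the marks, and las = 2 rotates the axis labels perpendicular to the axis. A sketch with the data from the post:

```r
# Boxplot with the mean marked and the mean/median values printed.
df <- c(5, 1, 1, 0, 0, 10, 38, 47, 2, 5, 0, 28, 5, 8, 81, 21, 12, 9,
        1, 12, 2, 4, 22, 3)

boxplot(df, las = 2, col = "light blue")       # las = 2: vertical labels
points(1, mean(df), pch = 19)                  # mark the mean
text(1.2, mean(df),   labels = round(mean(df), 2))    # print its value
text(1.2, median(df), labels = round(median(df), 2))  # and the median's
```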
[R] using parnor (lmomco package) - output
Hi, I am using the parnor function of the lmomco package. I believe it provides the mean and std. dev. for a set of data, but the std. dev. it provides does not match the actual std. dev. of the data, which is 247.9193 (using the sd function). Am I missing something here?

> lmr <- lmom.ub(c(123,34,4,654,37,78))
> parnor(lmr)
$type
[1] "nor"
$para
[1] 155.0000 210.2130

> sd(c(123,34,4,654,37,78))
[1] 247.9193
> mean(c(123,34,4,654,37,78))
[1] 155

TIA
Sachin
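The two numbers disagree because parnor() estimates sigma from L-moments rather than from the sample variance: for the normal distribution, sigma = sqrt(pi) * lambda_2 (the second L-moment). The two estimators agree only in expectation, not sample by sample. A sketch of the relationship, assuming lmom.ub() returns the second L-moment as the L2 component (check str(lmr) for the exact name):

```r
# Why parnor()'s sigma differs from sd(): it is an L-moment estimator.
library(lmomco)

x <- c(123, 34, 4, 654, 37, 78)
lmr <- lmom.ub(x)

sqrt(pi) * lmr$L2   # L-moment sigma estimate, matching parnor()'s ~210.2
sd(x)               # ordinary sample standard deviation, 247.9193
```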
[R] Rcmdr problem - SciViews R
Hi, I am getting the following error messages while using SciViews R. It displays a message saying: "Package or Bundle Rcmdr was not found in C:\Software\R-22.1.1\Library. Would you like to install now?" However, the Rcmdr package is there in the library. I reinstalled Rcmdr, but it still gives me the same error message every time I try to use one of the GUI functions. Any pointers would be of great help.

ERROR:

Loading required package: datasets
Loading required package: utils
Loading required package: grDevices
Loading required package: graphics
Loading required package: stats
Loading required package: methods
Loading required package: tcltk
Loading Tcl/Tk interface ... done
Loading required package: R2HTML
Loading required package: svMisc
Loading required package: svIO
Loading required package: svViews
Loading required package: Rcmdr
Loading required package: car
Error in .Tcl.args.objv(...) : argument "default" is missing, with no default
Error: .onLoad failed in 'loadNamespace' for 'Rcmdr'
trying URL 'http://www.sciviews.org/SciViews-R/Rcmdr_1.1-2.zip'
Content type 'application/zip' length 788628 bytes
opened URL
downloaded 770Kb
package 'Rcmdr' successfully unpacked and MD5 sums checked
Warning: cannot remove prior installation of package 'Rcmdr'
The downloaded packages are in C:\Documents and Settings\Local settings\Temp\Rtmp2g5Kpb\downloaded_packages
updating HTML package descriptions
Loading required package: Rcmdr
Error in .Tcl.args.objv(...) : argument "default" is missing, with no default
Error: .onLoad failed in 'loadNamespace' for 'Rcmdr'
Error in .Tcl.args.objv(...) : argument "default" is missing, with no default
Error: .onLoad failed in 'loadNamespace' for 'Rcmdr'

TIA. Sachin
Re: [R] Error in rm.outlier method
Thank you Marc. That was of great help. There was some problem with the environment. I closed and reopened the workspace. Works fine now. Sachin "Marc Schwartz (via MN)" <[EMAIL PROTECTED]> wrote: Sachin, I don't have a definitive thought, but some possibilities might be a conflict somewhere in your environment with a local function or with one in the searchpath. Use ls() to review the current objects in your environment to see if something looks suspicious. It did not look like 'outliers' is using a namespace, so a conflict of some nature is a little more possible here. Also use searchpaths() to get a feel for where R is searching for the function. See what is getting searched "above" the outliers package in the search order, which might provide a clue. Also, try to start R from the command line using 'R --vanilla', which should give you a clean working environment. Then use library(outliers) and your code below to see if the same behavior is present. If so, perhaps there was a corruption in the package installation. If not, it would support some type of conflict or perhaps a corruption in your default working environment. HTH, Marc On Fri, 2006-04-28 at 11:57 -0700, Sachin J wrote: > Hi Marc: > > I am using rm.outlier() function from outliers package (reference: > CRAN package help). > You are right. I too couldn't find this error message in rm.outlier > function. Thats why I am unable to understand the cause of error. Any > further thoughts? I will take a look at the robust analytic methods as > suggested. > > Thanx > Sachin > > > "Marc Schwartz (via MN)" wrote: > On Fri, 2006-04-28 at 11:17 -0700, Sachin J wrote: > > Hi, > > > > I am trying to use rm.outlier method but encountering > following error: > > > > > y <- rnorm(100) > > > rm.outlier(y) > > > > Error: > > Error in if (nrow(x) != ncol(x)) stop("x must be a square > matrix") : > > argument is of length zero > > > > Whats wrong here? 
> > > > TIA > > Sachin > > It would be helpful to know which rm.outlier() function you > are using > and from which package it comes. > > The only one that I noted in a search is in the 'outliers' > CRAN package > and it can take a vector as the 'x' argument. > > The above square matrix test and resultant error message is > not in the > tarball R code for either outlier() or rm.outlier() in that > package, so > the source of the error is unclear. > > As an aside, you may wish to consider robust analytic methods > rather > than doing post hoc outlier removal. A search of the list > archives will > provide some insights here. RSiteSearch("outlier") will get > you there. > > HTH, > > Marc Schwartz > > > > > > > __ > save big. __ [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Error in rm.outlier method
Hi Marc: I am using the rm.outlier() function from the outliers package (reference: CRAN package help). You are right: I too couldn't find this error message in the rm.outlier function, which is why I am unable to understand the cause of the error. Any further thoughts? I will take a look at the robust analytic methods as suggested.

Thanks
Sachin

"Marc Schwartz (via MN)" <[EMAIL PROTECTED]> wrote: On Fri, 2006-04-28 at 11:17 -0700, Sachin J wrote: > Hi, > > I am trying to use rm.outlier method but encountering following error: > > > y <- rnorm(100) > > rm.outlier(y) > > Error: > Error in if (nrow(x) != ncol(x)) stop("x must be a square matrix") : > argument is of length zero > > Whats wrong here? > > TIA > Sachin It would be helpful to know which rm.outlier() function you are using and from which package it comes. The only one that I noted in a search is in the 'outliers' CRAN package and it can take a vector as the 'x' argument. The above square matrix test and resultant error message is not in the tarball R code for either outlier() or rm.outlier() in that package, so the source of the error is unclear. As an aside, you may wish to consider robust analytic methods rather than doing post hoc outlier removal. A search of the list archives will provide some insights here. RSiteSearch("outlier") will get you there. HTH, Marc Schwartz
[R] Error in rm.outlier method
Hi, I am trying to use the rm.outlier method but am encountering the following error:

> y <- rnorm(100)
> rm.outlier(y)
Error in if (nrow(x) != ncol(x)) stop("x must be a square matrix") :
argument is of length zero

What's wrong here?

TIA
Sachin
[R] cdf of weibull distribution
Hi, I have a data set which is assumed to follow a Weibull distribution. How can I find the cdf for this data? For example, for normal data I used (package lmomco):

>cdfnor(15,parnor(lmom.ub(c(df$V1

Also, the lmomco package does not have functions for finding the cdf for some distributions, such as the lognormal. Is there any other package which can handle these distributions? Thanks in advance.

Sachin
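Base R's stats package already has the cdfs for both distributions: pweibull() and plnorm(). One approach is to estimate the parameters first (here by maximum likelihood via MASS::fitdistr(), as an alternative to lmomco's L-moment fits), then evaluate the cdf at the point of interest:

```r
# Weibull and lognormal cdfs from base R, with ML-fitted parameters.
library(MASS)

x <- rweibull(100, shape = 1.5, scale = 10)   # example data
fit <- fitdistr(x, "weibull")

# P(X <= 15) under the fitted Weibull:
pweibull(15, shape = fit$estimate["shape"], scale = fit$estimate["scale"])

# The lognormal cdf is also built in:
plnorm(15, meanlog = 2, sdlog = 0.5)
```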
Re: [R] Handling large dataset & dataframe
Mark: Thanks for the pointers. As suggested, I will explore the scan() method.

Andy: How can I use colClasses in my case? I tried it unsuccessfully, encountering the following error.

coltypes <- c("numeric","factor","numeric","numeric","numeric","numeric","factor", "numeric","numeric","factor","factor","numeric","numeric","numeric","numeric", "numeric","numeric","numeric")
mydf <- read.csv("C:/temp/data.csv", header=FALSE, colClasses = coltypes, strip.white=TRUE)

ERROR: Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : scan() expected 'a real', got 'V1'

Thanks again.
Sachin

"Liaw, Andy" <[EMAIL PROTECTED]> wrote: Much easier to use colClasses in read.table, and in many cases just as fast (or even faster). Andy From: Mark Stephens > > From ?scan: "the *type* of what gives the type of data to be > read". So list(integer(), integer(), double(), raw(), ...) In > your code all columns are being read as character regardless > of the contents of the character vector. > > I have to admit that I have added the *'s in *type*. I have > been caught out by this too. Its not the most convenient way > to specify the types of a large number of columns either. As > you have a lot of columns you might want to do something like > this: as.list(rep(integer(1),250)), assuming your dummies > are together, to save typing. Also storage.mode() is useful > to tell you the precise type (and therefore size) of an > object e.g. sapply(coltypes, > storage.mode) is actually the types scan() will use. Note > that 'numeric' could be 'double' or 'integer' which are > important in your case to fit inside the 1GB limit, because > 'integer' (4 bytes) is half 'double' (8 bytes). > > Perhaps someone on r-devel could enhance the documentation to > make "type" stand out in capitals in bold in help(scan)? Or > maybe scan could be clever enough to accept a character > vector 'what'. Or maybe I'm missing a good reason why this > isn't possible - anyone?
How about allowing a character > vector length one, with each character representing the type > of that column e.g. what="DDCD" would mean 4 integers > followed by 2 double's followed by a character column, > followed finally by a double column, 8 columns in total. > Probably someone somewhere has done that already, but I'm not > aware anyone has wrapped it up conveniently? > > On 25/04/06, Sachin J wrote: > > > > Mark: > > > > Here is the information I didn't provide in my earlier > post. R version > > is R2.2.1 running on Windows XP. My dataset has 16 variables with > > following data type. > > ColNumber: 1 2 3 ...16 > > Datatypes: > > > > > "numeric","numeric","numeric","numeric","numeric","numeric","character > > > ","numeric","numeric","character","character","numeric","numeric","num > > eric","numeric","numeric","numeric","numeric" > > > > Variable (2) which is numeric and variables denoted as > character are > > to be treated as dummy variables in the regression. > > > > Search in R help list suggested I can use read.csv with colClasses > > option also instead of using scan() and then converting it to > > dataframe as you suggested. I am trying both these methods > but unable > > to resolve syntactical error. > > > > >coltypes<- > > > c("numeric","factor","numeric","numeric","numeric","numeric","factor", > > > "numeric","numeric","factor","factor","numeric","numeric","numeric","n > > umeric","numeric","numeric","numeric") > > > > >mydf <- read.csv("C:/temp/data.csv", header=FALSE, colClasses = > > >coltypes, > > strip.white=TRUE) > > > > ERROR: Error in scan(file = file, what = what, sep = sep, quote = > > quote, dec = dec, : > > scan() expected 'a real', got 'V1' > > > > No idea whats the problem. > > > > AS PER YOUR SUGGESTION I TRIED scan() as follows: > > > > > > > >col
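A plausible cause of the `scan() expected 'a real', got 'V1'` error above is that the first line of the file is a header row (V1, V2, ...): with header=FALSE, read.csv tries to coerce the header text into the first numeric column. A sketch reproducing and fixing this with a small made-up file (the real file path and column layout are not known here):

```r
# Sketch: the "got 'V1'" error typically means a header row was read as data.
# A tiny made-up CSV with a header line, standing in for C:/temp/data.csv:
tf <- tempfile(fileext = ".csv")
writeLines(c("V1,V2,V3", "1,a,2.5", "2,b,3.5"), tf)

coltypes <- c("numeric", "factor", "numeric")

# header=FALSE reproduces the error: "V1" cannot be coerced to numeric
bad <- try(read.csv(tf, header = FALSE, colClasses = coltypes), silent = TRUE)
inherits(bad, "try-error")  # TRUE

# header=TRUE skips past the header, so colClasses applies to the data rows
mydf <- read.csv(tf, header = TRUE, colClasses = coltypes, strip.white = TRUE)
sapply(mydf, class)  # "numeric" "factor" "numeric"
```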
[R] NA in dummy regression coefficients
I'm running a regression model with dummy variables and getting NA for some coefficients. I believe this is due to a singularity problem. How can I exclude some of the dummy variables from the regression model in R to take care of this issue? I read in the R help that lm() takes care of this automatically, but in my case it's not happening. Any pointers would be of great help. Regression Model: reg06 <- lm(mydf$y ~ mydf$x1 + factor(mydf$x2) + factor(mydf$x3) + factor(mydf$x4) + mydf$x5, singular.ok = TRUE) Thanks in advance Sachin - [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
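For context on the NA coefficients: when hand-built 0/1 dummy columns are linearly dependent (e.g. a complete set of indicators plus an intercept), lm() signals the redundancy by reporting NA for the aliased coefficients rather than failing; passing the categorical variable as a single factor lets lm() build non-redundant contrasts itself. A minimal sketch on simulated data (variable names are made up):

```r
set.seed(1)
# Hypothetical data: x2 has 3 levels, encoded both as dummies and as a factor
n  <- 30
x2 <- sample(c("a", "b", "c"), n, replace = TRUE)
d1 <- as.numeric(x2 == "a"); d2 <- as.numeric(x2 == "b"); d3 <- as.numeric(x2 == "c")
y  <- 1 + 2 * d1 - d2 + rnorm(n)

# Full dummy set plus intercept is singular: d1 + d2 + d3 == 1 for every row,
# so one coefficient comes back NA
fit.dummies <- lm(y ~ d1 + d2 + d3)
any(is.na(coef(fit.dummies)))   # TRUE: d3 is aliased

# Letting lm() derive the contrasts from a factor avoids the NA entirely
fit.factor <- lm(y ~ factor(x2))
any(is.na(coef(fit.factor)))    # FALSE
```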
Re: [R] Handling large dataset & dataframe
Mark: Here is the information I didn't provide in my earlier post. The R version is 2.2.1 running on Windows XP. My dataset has 16 variables with the following data types.

ColNumber: 1 2 3 ... 16
Datatypes: "numeric","numeric","numeric","numeric","numeric","numeric","character","numeric","numeric","character","character","numeric","numeric","numeric","numeric","numeric","numeric","numeric"

Variable (2), which is numeric, and the variables denoted as character are to be treated as dummy variables in the regression. A search of the R-help list suggested I can also use read.csv with the colClasses option instead of using scan() and then converting to a data frame as you suggested. I am trying both methods but am unable to resolve a syntax error.

> coltypes <- c("numeric","factor","numeric","numeric","numeric","numeric","factor","numeric","numeric","factor","factor","numeric","numeric","numeric","numeric","numeric","numeric","numeric")
> mydf <- read.csv("C:/temp/data.csv", header=FALSE, colClasses = coltypes, strip.white=TRUE)

ERROR: Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : scan() expected 'a real', got 'V1'

No idea what the problem is. As per your suggestion I tried scan() as follows:

> coltypes <- c("numeric","factor","numeric","numeric","numeric","numeric","factor","numeric","numeric","factor","factor","numeric","numeric","numeric","numeric","numeric","numeric","numeric")
> x <- scan(file = "C:/temp/data.dbf", what=as.list(coltypes), sep=",", quiet=TRUE, skip=1)
> names(x) <- scan(file = "C:/temp/data.dbf", what="", nlines=1, sep=",")
> x <- as.data.frame(x)

This runs, but x has no data in it and contains:

> x
[1] X._. NA.NA..1 NA..2 NA..3 NA..4 NA..5 NA..6 NA..7 NA..8 NA..9 NA..10 NA..11
[14] NA..12 NA..13 NA..14 NA..15 NA..16
<0 rows> (or 0-length row.names)

Please let me know how to properly use the scan or colClasses option.
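A sketch of Mark's point about scan(): the 'what' argument takes *typed* list elements (integer(), double(), ...), not a character vector of type names, so what=as.list(coltypes) reads every column as character. Note also that scan() reads text; pointing it at a binary .dbf file, as in the call above, cannot work. The example below uses a hypothetical 4-column CSV with a header row:

```r
# 'what' must carry the *types* of the columns, via typed empty vectors
tf <- tempfile(fileext = ".csv")
writeLines(c("c1,c2,c3,c4", "1,a,2.5,10", "2,b,3.5,20"), tf)

what <- list(integer(), character(), double(), integer())

x <- scan(tf, what = what, sep = ",", skip = 1, quiet = TRUE)
names(x) <- scan(tf, what = "", nlines = 1, sep = ",", quiet = TRUE)
x <- as.data.frame(x, stringsAsFactors = TRUE)

# storage.mode() reveals the actual types (a factor is stored as integer)
sapply(x, storage.mode)  # integer, integer (factor), double, integer
```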
Sachin Mark Stephens <[EMAIL PROTECTED]> wrote: Sachin, With your dummies stored as integer, the size of your object would appear to be 35 * (4*250 + 8*16) bytes = 376MB. You said "PC" but did not provide R version information, assuming windows then ... With 1GB RAM you should be able to load a 376MB object into memory. If you can store the dummies as 'raw' then object size is only 126MB. You don't say how you attempted to load the data. Assuming your input data is in text file (or can be) have you tried scan()? Setup the 'what' argument with length 266 and make sure the dummy column are set to integer() or raw(). Then x = scan(...); class(x)=" data.frame". What is the result of memory.limit()? If it is 256MB or 512MB, then try starting R with --max-mem-size=800M (I forget the syntax exactly). Leave a bit of room below 1GB. Once the object is in memory R may need to copy it once, or a few times. You may need to close all other apps in memory, or send them to swap. I don't really see why your data should not fit into the memory you have. Purchasing an extra 1GB may help. Knowing the object size calculation (as above) should help you guage whether it is worth it. Have you used process monitor to see the memory growing as R loads the data? This can be useful. If all the above fails, then consider 64-bit and purchasing as much memory as you can afford. R can use over 64GB RAM+ on 64bit machines. Maybe you can hire some time on a 64-bit server farm - i heard its quite cheap but never tried it myself. You shouldn't need to go that far with this data set though. Hope this helps, Mark Hi Roger, I want to carry out regression analysis on this dataset. So I believe I can't read the dataset in chunks. Any other solution? 
TIA Sachin roger koenker < [EMAIL PROTECTED]> wrote: You can read chunks of it at a time and store it in sparse matrix form using the packages SparseM or Matrix, but then you need to think about what you want to do with it least squares sorts of things are ok, but other options are somewhat limited... url: www.econ.uiuc.edu/~roger Roger Koenker email [EMAIL PROTECTED] Department of Economics vox: 217-333-4558 University of Illinois fax: 217-244-
Re: [R] Handling large dataset & dataframe
Hi Andy: I searched through R-archive to find out how to handle large data set using readLines and other related R functions. I couldn't find any single post which elaborates the process. Can you provide me with an example or any pointers to the postings elaborating the process. Thanx in advance Sachin "Liaw, Andy" <[EMAIL PROTECTED]> wrote: Instead of reading the entire data in at once, you read a chunk at a time, and compute X'X and X'y on that chunk, and accumulate (i.e., add) them. There are examples in "S Programming", taken from independent replies by the two authors to a post on S-news, if I remember correctly. Andy From: Sachin J > > Gabor: > > Can you elaborate more. > > Thanx > Sachin > > Gabor Grothendieck wrote: > You just need the much smaller cross product matrix X'X and > vector X'Y so you can build those up as you read the data in > in chunks. > > > On 4/24/06, Sachin J wrote: > > Hi, > > > > I have a dataset consisting of 350,000 rows and 266 columns. Out of > > 266 columns 250 are dummy variable columns. I am trying to > read this > > data set into R dataframe object but unable to do it due to memory > > size limitations (object size created is too large to > handle in R). Is > > there a way to handle such a large dataset in R. > > > > My PC has 1GB of RAM, and 55 GB harddisk space running windows XP. > > > > Any pointers would be of great help. > > > > TIA > > Sachin > > > > > > - > > > > [[alternative HTML version deleted]] > > > > __ > > R-help@stat.math.ethz.ch mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide! > > http://www.R-project.org/posting-guide.html > > > > > > - > > [[alternative HTML version deleted]] > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
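Andy's and Gabor's suggestion, accumulating X'X and X'y chunk by chunk and then solving the normal equations, can be sketched as follows. The data here is simulated; in practice each chunk would come from repeated reads of the file (e.g. read.csv with skip= and nrows=):

```r
set.seed(42)
# Simulated full design matrix and response, standing in for the file on disk
X <- cbind(1, matrix(rnorm(1000 * 3), ncol = 3))
y <- X %*% c(2, 1, -1, 0.5) + rnorm(1000)

xtx <- matrix(0, ncol(X), ncol(X))
xty <- numeric(ncol(X))

# Process 100 rows at a time, accumulating the cross products
for (start in seq(1, nrow(X), by = 100)) {
  idx   <- start:min(start + 99, nrow(X))
  chunk <- X[idx, , drop = FALSE]
  xtx   <- xtx + crossprod(chunk)           # running sum of X'X
  xty   <- xty + crossprod(chunk, y[idx])   # running sum of X'y
}

# Solve the normal equations; matches lm() on the full data
beta.chunked <- solve(xtx, xty)
beta.lm      <- coef(lm(y ~ X - 1))
all.equal(as.vector(beta.chunked), as.vector(beta.lm), tolerance = 1e-6)  # TRUE
```

Only the p-by-p matrix X'X and the length-p vector X'y ever live in memory, so the 350,000-row file never has to be loaded at once.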
Re: [R] Handling large dataset & dataframe
Gabor: Can you elaborate more. Thanx Sachin Gabor Grothendieck <[EMAIL PROTECTED]> wrote: You just need the much smaller cross product matrix X'X and vector X'Y so you can build those up as you read the data in in chunks. On 4/24/06, Sachin J wrote: > Hi, > > I have a dataset consisting of 350,000 rows and 266 columns. Out of 266 > columns 250 are dummy variable columns. I am trying to read this data set > into R dataframe object but unable to do it due to memory size limitations > (object size created is too large to handle in R). Is there a way to handle > such a large dataset in R. > > My PC has 1GB of RAM, and 55 GB harddisk space running windows XP. > > Any pointers would be of great help. > > TIA > Sachin > > > - > > [[alternative HTML version deleted]] > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html > - [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Handling large dataset & dataframe
Hi Richard: Even if I dont read the dummy var columns, i.e. just read the original dataset with 350,000 rows and 16 columns, when I try to run the regression - using >lm(y ~ c1 + factor(c2) + factor(c3) ) ; where c2, c3 are dummy variables, The procedure fails saying not enough memory. But, > lm(y ~ c1 + factor(c2) ) works fine. Any thoughts. Thanks Sachin "Richard M. Heiberger" <[EMAIL PROTECTED]> wrote: Where is the excess size being identified? Is it the read? or in the lm(). If it is in the reading of the data, then why are you reading the dummy variables? Would it make sense to read a single column of a factor instead of 80 columns of dummy variables? - [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Handling large dataset & dataframe
Hi Roger, I want to carry out regression analysis on this dataset. So I believe I can't read the dataset in chunks. Any other solution? TIA Sachin roger koenker <[EMAIL PROTECTED]> wrote: You can read chunks of it at a time and store it in sparse matrix form using the packages SparseM or Matrix, but then you need to think about what you want to do with it least squares sorts of things are ok, but other options are somewhat limited... url: www.econ.uiuc.edu/~roger Roger Koenker email [EMAIL PROTECTED] Department of Economics vox: 217-333-4558 University of Illinois fax: 217-244-6678 Champaign, IL 61820 On Apr 24, 2006, at 12:41 PM, Sachin J wrote: > Hi, > > I have a dataset consisting of 350,000 rows and 266 columns. Out > of 266 columns 250 are dummy variable columns. I am trying to read > this data set into R dataframe object but unable to do it due to > memory size limitations (object size created is too large to handle > in R). Is there a way to handle such a large dataset in R. > > My PC has 1GB of RAM, and 55 GB harddisk space running windows XP. > > Any pointers would be of great help. > > TIA > Sachin > > > - > > [[alternative HTML version deleted]] > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting- > guide.html - [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Handling large dataset & dataframe
Hi, I have a dataset consisting of 350,000 rows and 266 columns. Out of the 266 columns, 250 are dummy-variable columns. I am trying to read this data set into an R data frame but am unable to do so due to memory limitations (the object created is too large for R to handle). Is there a way to handle such a large dataset in R? My PC has 1GB of RAM and 55GB of hard disk space, running Windows XP. Any pointers would be of great help. TIA Sachin
Re: [R] Creat new column based on condition
Hi Gabor, The first one works fine. Just out of curiosity, in second solution: I dont want to create a matrix. I want to add a new column to the existing dataframe (i.e. V2 based on the values in V1). Is there a way to do it? TIA Sachin Gabor Grothendieck <[EMAIL PROTECTED]> wrote: Try: V1 <- matrix(c(10, 20, 30, 10, 10, 20), nc = 1) V2 <- 4 * (V1 == 10) + 6 * (V1 == 20) + 10 * (V1 == 30) or V2 <- matrix(c(4, 6, 10)[V1/10], nc = 1) On 4/21/06, Sachin J wrote: > Hi, > > How can I accomplish this task in R? > > V1 > 10 > 20 > 30 > 10 > 10 > 20 > > Create a new column V2 such that: > If V1 = 10 then V2 = 4 > If V1 = 20 then V2 = 6 > V1 = 30 then V2 = 10 > > So the O/P looks like this > > V1 V2 > 10 4 > 20 6 > 30 10 > 10 4 > 10 4 > 20 6 > > Thanks in advance. > > Sachin > > __ > > > > [[alternative HTML version deleted]] > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html > - [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
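On the follow-up question, the lookup-vector idea also adds the column directly to an existing data frame rather than building a matrix; a minimal sketch with the posted values:

```r
# Add V2 to an existing data frame via a named lookup vector
mydf   <- data.frame(V1 = c(10, 20, 30, 10, 10, 20))
lookup <- c("10" = 4, "20" = 6, "30" = 10)

mydf$V2 <- unname(lookup[as.character(mydf$V1)])
mydf$V2  # 4 6 10 4 4 6
```

Indexing by as.character(V1) works for any set of V1 values, not just multiples of 10 as in Gabor's V1/10 trick.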
[R] Creat new column based on condition
Hi, How can I accomplish this task in R?

V1
10
20
30
10
10
20

Create a new column V2 such that:
If V1 = 10 then V2 = 4
If V1 = 20 then V2 = 6
If V1 = 30 then V2 = 10

So the output looks like this:

V1 V2
10 4
20 6
30 10
10 4
10 4
20 6

Thanks in advance. Sachin
Re: [R] Conditional Row Sum
Thanx Marc and Gabor for your help. Sachin "Marc Schwartz (via MN)" <[EMAIL PROTECTED]> wrote: On Thu, 2006-04-20 at 11:46 -0700, Sachin J wrote: > Hi, > > How can I accomplish this in R. Example: > > R1 R2 > 3 101 > 4 102 > 3 102 > 18 102 > 11 101 > > I want to find Sum(101) = 14 - i.e SUM(R1) where R2 = 101 > Sum(102) = 25 - SUM(R2) where R2 = 102 > > TIA > Sachin Presuming that your data is in a data frame called DF: > DF R1 R2 1 3 101 2 4 102 3 3 102 4 18 102 5 11 101 At least three options: > with(DF, tapply(R1, R2, sum)) 101 102 14 25 > aggregate(DF$R1, list(R2 = DF$R2), sum) R2 x 1 101 14 2 102 25 > by(DF$R1, DF$R2, sum) INDICES: 101 [1] 14 -- INDICES: 102 [1] 25 See ?by, ?aggregate and ?tapply and ?with. HTH, Marc Schwartz - [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Conditional Row Sum
Hi, How can I accomplish this in R? Example:

R1 R2
3  101
4  102
3  102
18 102
11 101

I want to find Sum(101) = 14, i.e. SUM(R1) where R2 = 101, and Sum(102) = 25, i.e. SUM(R1) where R2 = 102. TIA Sachin
Re: [R] Count Unique Rows/Values
x.unique$V1 gives the list of individual column's unique values. Thank you again Andy. Sachin "Liaw, Andy" <[EMAIL PROTECTED]> wrote: From:Sachin J > > Hi, > > This one is not working for me. It is listing all the rows > instead of unique ones. My dataset has 30 odd rows and > following is the resulting o/p > > [[308313]] > [1] 126 > [[308314]] > [1] 126 > [[308315]] > [1] 126 > [[308316]] > [1] 126 > [[308317]] > [1] 126 > [[308318]] > [1] 126 > [[308319]] > [1] 126 > [[308320]] > [1] 126 > [[308321]] > [1] 126 > > I used following set of commands. > > > (x.unique <- lapply(x$V1, unique)) You want "x" instead of "x$V1" as the first argument to lapply(), so that it runs unique() on all columns of "x". Andy > > sapply(x.unique, length) > > x$V1 is numeric field. > where x is my data frame already read (therefore i ignored > your first step). Am I missing something. ? > > Thanks > Sachin > > "Liaw, Andy" wrote: > This might help: > > > x <- read.table("clipboard", colClasses=c("numeric", "character")) > > (x.unique <- lapply(x, unique)) > $V1 > [1] 155 138 126 123 103 143 111 156 > > $V2 > [1] "A" "B" "C" "D" > > > sapply(x.unique, length) > V1 V2 > 8 4 > > Andy > > From: Sachin J > > > > Hi, > > > > I have a dataset which has both numeric and character > > values with dupllicates. For example: > > > > 155 A > > 138 A > > 138 B > > 126 C > > 126 D > > 123 A > > 103 A > > 103 B > > 143 D > > 111 C > > 111 D > > 156 C > > > > How can I count the number of unqiue entries without > > counting duplicate entries. Also can I extract the list in a > > object. What I mean is > > Col1 unique count = 8 Unique Elements are : > > 103,111,123,126,138,143,155,156 > > Col2 unique count = 4 Unique Elements are : A,B,C,D. > > > > Any pointers would be of great help. 
> > > > TIA > > Sachin
Re: [R] Count Unique Rows/Values
But it is not giving me the list of unique elements. Count works fine. Sachin "Liaw, Andy" <[EMAIL PROTECTED]> wrote: From:Sachin J > > Hi, > > This one is not working for me. It is listing all the rows > instead of unique ones. My dataset has 30 odd rows and > following is the resulting o/p > > [[308313]] > [1] 126 > [[308314]] > [1] 126 > [[308315]] > [1] 126 > [[308316]] > [1] 126 > [[308317]] > [1] 126 > [[308318]] > [1] 126 > [[308319]] > [1] 126 > [[308320]] > [1] 126 > [[308321]] > [1] 126 > > I used following set of commands. > > > (x.unique <- lapply(x$V1, unique)) You want "x" instead of "x$V1" as the first argument to lapply(), so that it runs unique() on all columns of "x". Andy > > sapply(x.unique, length) > > x$V1 is numeric field. > where x is my data frame already read (therefore i ignored > your first step). Am I missing something. ? > > Thanks > Sachin > > "Liaw, Andy" wrote: > This might help: > > > x <- read.table("clipboard", colClasses=c("numeric", "character")) > > (x.unique <- lapply(x, unique)) > $V1 > [1] 155 138 126 123 103 143 111 156 > > $V2 > [1] "A" "B" "C" "D" > > > sapply(x.unique, length) > V1 V2 > 8 4 > > Andy > > From: Sachin J > > > > Hi, > > > > I have a dataset which has both numeric and character > > values with dupllicates. For example: > > > > 155 A > > 138 A > > 138 B > > 126 C > > 126 D > > 123 A > > 103 A > > 103 B > > 143 D > > 111 C > > 111 D > > 156 C > > > > How can I count the number of unqiue entries without > > counting duplicate entries. Also can I extract the list in a > > object. What I mean is > > Col1 unique count = 8 Unique Elements are : > > 103,111,123,126,138,143,155,156 > > Col2 unique count = 4 Unique Elements are : A,B,C,D. > > > > Any pointers would be of great help. > > > > TIA > > Sachin > > > > > > > > - > > > > [[alternative HTML version deleted]] > > > > __ > > R-help@stat.math.ethz.ch mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide! 
> > http://www.R-project.org/posting-guide.html
Re: [R] Count Unique Rows/Values
Thanks Andy. That works. Sachin "Liaw, Andy" <[EMAIL PROTECTED]> wrote: From:Sachin J > > Hi, > > This one is not working for me. It is listing all the rows > instead of unique ones. My dataset has 30 odd rows and > following is the resulting o/p > > [[308313]] > [1] 126 > [[308314]] > [1] 126 > [[308315]] > [1] 126 > [[308316]] > [1] 126 > [[308317]] > [1] 126 > [[308318]] > [1] 126 > [[308319]] > [1] 126 > [[308320]] > [1] 126 > [[308321]] > [1] 126 > > I used following set of commands. > > > (x.unique <- lapply(x$V1, unique)) You want "x" instead of "x$V1" as the first argument to lapply(), so that it runs unique() on all columns of "x". Andy > > sapply(x.unique, length) > > x$V1 is numeric field. > where x is my data frame already read (therefore i ignored > your first step). Am I missing something. ? > > Thanks > Sachin > > "Liaw, Andy" wrote: > This might help: > > > x <- read.table("clipboard", colClasses=c("numeric", "character")) > > (x.unique <- lapply(x, unique)) > $V1 > [1] 155 138 126 123 103 143 111 156 > > $V2 > [1] "A" "B" "C" "D" > > > sapply(x.unique, length) > V1 V2 > 8 4 > > Andy > > From: Sachin J > > > > Hi, > > > > I have a dataset which has both numeric and character > > values with dupllicates. For example: > > > > 155 A > > 138 A > > 138 B > > 126 C > > 126 D > > 123 A > > 103 A > > 103 B > > 143 D > > 111 C > > 111 D > > 156 C > > > > How can I count the number of unqiue entries without > > counting duplicate entries. Also can I extract the list in a > > object. What I mean is > > Col1 unique count = 8 Unique Elements are : > > 103,111,123,126,138,143,155,156 > > Col2 unique count = 4 Unique Elements are : A,B,C,D. > > > > Any pointers would be of great help. > > > > TIA > > Sachin > > > > > > > > - > > > > [[alternative HTML version deleted]] > > > > __ > > R-help@stat.math.ethz.ch mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide! 
> > http://www.R-project.org/posting-guide.html
Re: [R] Count Unique Rows/Values
Hi, This one is not working for me. It is listing all the rows instead of unique ones. My dataset has 30 odd rows and following is the resulting o/p [[308313]] [1] 126 [[308314]] [1] 126 [[308315]] [1] 126 [[308316]] [1] 126 [[308317]] [1] 126 [[308318]] [1] 126 [[308319]] [1] 126 [[308320]] [1] 126 [[308321]] [1] 126 I used following set of commands. > (x.unique <- lapply(x$V1, unique)) > sapply(x.unique, length) x$V1 is numeric field. where x is my data frame already read (therefore i ignored your first step). Am I missing something. ? Thanks Sachin "Liaw, Andy" <[EMAIL PROTECTED]> wrote: This might help: > x <- read.table("clipboard", colClasses=c("numeric", "character")) > (x.unique <- lapply(x, unique)) $V1 [1] 155 138 126 123 103 143 111 156 $V2 [1] "A" "B" "C" "D" > sapply(x.unique, length) V1 V2 8 4 Andy From: Sachin J > > Hi, > > I have a dataset which has both numeric and character > values with dupllicates. For example: > > 155 A > 138 A > 138 B > 126 C > 126 D > 123 A > 103 A > 103 B > 143 D > 111 C > 111 D > 156 C > > How can I count the number of unqiue entries without > counting duplicate entries. Also can I extract the list in a > object. What I mean is > Col1 unique count = 8 Unique Elements are : > 103,111,123,126,138,143,155,156 > Col2 unique count = 4 Unique Elements are : A,B,C,D. > > Any pointers would be of great help. > > TIA > Sachin > > > > - > > [[alternative HTML version deleted]] > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > > -- -- - [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Count Unique Rows/Values
Hi, I have a dataset which has both numeric and character values, with duplicates. For example:

155 A
138 A
138 B
126 C
126 D
123 A
103 A
103 B
143 D
111 C
111 D
156 C

How can I count the number of unique entries without counting duplicates? Also, can I extract the list into an object? What I mean is:
Col1 unique count = 8; unique elements are 103, 111, 123, 126, 138, 143, 155, 156
Col2 unique count = 4; unique elements are A, B, C, D

Any pointers would be of great help. TIA Sachin
[R] Nonlinear Regression model: Diagnostics
Hi, I am trying to run the following nonlinear regression model. > nreg <- nls(y ~ exp(-b*x), data = mydf, start = list(b = 0), alg = "default", trace = TRUE) OUTPUT: 24619327 : 0 24593178 : 0.0001166910 24555219 : 0.0005019005 24521810 : 0.001341571 24500774 : 0.002705402 24490713 : 0.004401078 24486658 : 0.00607728 24485115 : 0.007484372 24484526 : 0.008552635 24484298 : 0.009314779 24484208 : 0.009837009 24484172 : 0.01018542 24484158 : 0.01041381 24484152 : 0.01056181 24484150 : 0.01065700 24484149 : 0.01071794 24484148 : 0.01075683 24484148 : 0.01078161 24484148 : 0.01079736 24484148 : 0.01080738 24484148 : 0.01081374 Nonlinear regression model model: y ~ exp(-b * x) data: mydf b 0.01081374 residual sum-of-squares: 24484148 My question is how do I interpret the results of this model. > profile(nreg) 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : 24484156 : Error in prof$getProfile() : number of iterations exceeded maximum of 50 I am unable to understand the error cause. Any pointers would be of great help. Regards, Sachin - [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
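One plausible reason profile() fails here, and the residual sum-of-squares is so large, is that y ~ exp(-b*x) has no scale parameter: the fitted curve is forced through 1 at x = 0, while the data is evidently orders of magnitude larger. A hedged sketch on simulated data (the real data is unknown; the values below are made up) adding a scale parameter a:

```r
set.seed(7)
# Simulated exponential-decay data with a large scale, roughly matching
# the magnitude implied by the posted RSS (purely illustrative numbers)
x <- seq(0, 400, by = 4)
y <- 5000 * exp(-0.01 * x) + rnorm(length(x), sd = 50)
mydf <- data.frame(x = x, y = y)

# With a scale parameter 'a', nls converges and profiling behaves sensibly
nreg <- nls(y ~ a * exp(-b * x), data = mydf,
            start = list(a = max(y), b = 0.005))

summary(nreg)   # standard errors and t-values for a and b
confint(nreg)   # profile-likelihood confidence intervals
```

The coefficient table from summary(), together with confint(), is the usual way to interpret an nls fit: point estimates for a and b plus their uncertainty.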
Re: [R] Subset dataframe based on condition
Thanks Steve and Tony for your help.

Sachin

Tony Plate <[EMAIL PROTECTED]> wrote:

Works OK for me:

> x <- data.frame(a=10^(-2:7), b=10^(10:1))
> subset(x, a > 1)
       a     b
4  1e+01 1e+07
5  1e+02 1e+06
6  1e+03 1e+05
7  1e+04 1e+04
8  1e+05 1e+03
9  1e+06 1e+02
10 1e+07 1e+01
> subset(x, a > 1 & b < a)
       a    b
8  1e+05 1000
9  1e+06  100
10 1e+07   10

Do you get all "numeric" for the following?

> sapply(x, class)
        a         b
"numeric" "numeric"

If not, then your data frame is probably encoding the information in some way that you don't want (though if it was as factors, I would have expected a warning from the comparison operator). You might get more help by distilling your problem to a simple example that can be tried out by others.

-- Tony Plate

Sachin J wrote:
> Hi,
>
> I am trying to extract a subset of data from my original data frame based on some condition. For example (mydf - original data frame, submydf - subset data frame):
>
> >submydf = subset(mydf, a > 1 & b <= a)
>
> here column a contains values ranging from 0.01 to 10. I want to extract only those rows matching condition 1, i.e. a > 1. But when I execute this command it is not giving me the appropriate result. The subset df - submydf contains rows with 0.01 also. Please help me to resolve this problem.
>
> Thanks in advance.
>
> Sachin
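A short sketch following Tony's diagnosis. If a numeric-looking column was read in as a factor, comparisons operate on the factor's internal codes rather than the printed values, which would produce exactly the symptom described. The data frame and column names below are assumed from the thread:

```r
## Hypothetical reproduction: column 'a' accidentally stored as a factor.
mydf <- data.frame(a = factor(c(0.01, 0.5, 2, 10)), b = c(5, 0.1, 1, 3))

sapply(mydf, class)   # reveals the problem: 'a' is "factor", not "numeric"

## Safe factor-to-numeric conversion: go through the level labels,
## never as.numeric(factor) directly (that returns the level codes).
mydf$a <- as.numeric(as.character(mydf$a))

submydf <- subset(mydf, a > 1 & b <= a)   # now behaves as expected
```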
[R] Subset dataframe based on condition
Hi,

I am trying to extract a subset of data from my original data frame based on some condition. For example (mydf - original data frame, submydf - subset data frame):

>submydf = subset(mydf, a > 1 & b <= a)

here column a contains values ranging from 0.01 to 10. I want to extract only those rows matching condition 1, i.e. a > 1. But when I execute this command it is not giving me the appropriate result. The subset df - submydf contains rows with 0.01 also. Please help me to resolve this problem.

Thanks in advance.

Sachin