[R] Readjusting frequencies
Dear Forum,

I have the following data.frame:

fraud_data = data.frame(no_of_frauds = c(1, 2, 4, 6, 7, 9, 10),
                        frequency    = c(3, 1, 7, 11, 13, 1, 4))

fraud_data
  no_of_frauds frequency
1            1         3
2            2         1
3            4         7
4            6        11
5            7        13
6            9         1
7           10         4

I need to regroup the data so that if a frequency is less than 5, the corresponding class is merged into the next class, i.e. the frequencies are added until the running total exceeds 5. Thus, in the above data.frame, since the frequencies for no_of_frauds 1 and 2 are 3 and 1 respectively, these are added to class 4, whose frequency becomes 3 + 1 + 7 = 11. Likewise, the frequencies of classes 9 and 10 are 1 and 4; added together they give 5, which still doesn't exceed 5. So these should be added to the previous class, i.e. 7. Thus I need:

no_of_frauds frequency
           4        11   # (3 + 1 + 7)
           6        11
           7        18   # (13 + 1 + 4)

Kindly guide.

Regards
Katherine

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
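A minimal base-R sketch of the regrouping described above. It assumes "exceed 5" means a strictly greater running total (use `>=` if a group totalling exactly 5 should stand on its own); `regroup_freq()` and its `threshold` argument are hypothetical names introduced here. A trailing remainder that never exceeds the threshold is folded into the previous group, as in the 9/10 example.

```r
# Hypothetical helper: walk through the classes in order, accumulating
# frequencies; a group is closed (and labelled with the current class)
# as soon as the running total exceeds `threshold`.
regroup_freq <- function(classes, freq, threshold = 5) {
  out_class <- numeric(0)
  out_freq  <- numeric(0)
  running   <- 0
  for (i in seq_along(freq)) {
    running <- running + freq[i]
    if (running > threshold) {
      out_class <- c(out_class, classes[i])
      out_freq  <- c(out_freq, running)
      running   <- 0
    }
  }
  # A tail that never exceeded the threshold is merged into the previous group
  if (running > 0) {
    if (length(out_freq) == 0) {
      # nothing was ever closed: everything forms a single group
      out_class <- classes[length(classes)]
      out_freq  <- running
    } else {
      out_freq[length(out_freq)] <- out_freq[length(out_freq)] + running
    }
  }
  data.frame(no_of_frauds = out_class, frequency = out_freq)
}

fraud_data <- data.frame(no_of_frauds = c(1, 2, 4, 6, 7, 9, 10),
                         frequency    = c(3, 1, 7, 11, 13, 1, 4))
regroup_freq(fraud_data$no_of_frauds, fraud_data$frequency)
```

On fraud_data this returns classes 4, 6 and 7 with frequencies 11, 11 and 18, i.e. the table Katherine asked for. A plain loop is used because each group boundary depends on the running total so far.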
Re: [R] create Geotiff
Hi Karren,

I'm not sure whether this is a problem with the software you are using to view the image after writing it. I would first check the color scaling of the image in that software. I would interpret the black background as no data.

Regards,
Ludwig

Karren wrote:
> Hi
> I am trying to export a raster as a GeoTIFF using
>
> writeRaster(grazedmasstot, paste(pad, "grazedmass_total.tif"), "GTiff", overwrite=TRUE)
>
> But the resulting image is incorrect: the image is tiny and shows up as a white object with a black background. Does anyone have any suggestions on how I can rectify this?
> Thanks

-
Dipl. Geogr. Ludwig Hilger
Wiss. MA
Lehrstuhl für Physische Geographie
Katholische Universität Eichstätt-Ingolstadt
Ostenstraße 18
85072 Eichstätt

--
View this message in context: http://r.789695.n4.nabble.com/create-Geotiff-tp4680188p4680203.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Date handling in R is hard to understand
Hi

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Alemu Tadesse
> Sent: Friday, November 08, 2013 8:41 PM
> To: r-help@r-project.org
> Subject: [R] Date handling in R is hard to understand
>
> Dear All,
>
> I usually work with time series data. The data may come in AM/PM format or on a 24-hour basis. R cannot recognize the difference between the two automatically, at least not for me; I have to tell R explicitly which time format the data is in. It seems that pandas knows how to handle dates without being told the format. The problem arises when I try to shift times by a certain amount, say adding 3600 to shift forward by one hour. In that case I have to use something like:
>
> Measured_data$Date <- as.POSIXct(as.character(Measured_data$Date), tz = "", format = "%m/%d/%Y %I:%M %p") + 3600
>
> or
>
> Measured_data$Date <- as.POSIXct(as.character(Measured_data$Date), tz = "", format = "%m/%d/%Y %H:%M") + 3600
>
> depending on the format. The date also carries MDT or MST and so on. When merging two data frames with dates in different formats, that may create a problem (I think). When I get data from Excel it could be in any format, and I need to convert the dates to one of the above formats before using them in R. Any tips for automatic processing without having to specify the date format?
>
> Another problem I saw: when using rbind to bind data frames, if a column of one of the data frames is character data (say "none", coming from MySQL), R doesn't know how to concatenate the numeric column from the other data frame to it. I needed to

rbind/cbind can use the data.frame method, which keeps each column's own type. With the default method, however, the result is a matrix, which must have one common type for all columns (a matrix is only a vector with dimensions).

str(cbind(airquality, 1:153))
'data.frame':   153 obs. of  7 variables:
 $ ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
 $ solar.r: int  190 118 149 313 NA NA 299 99 19 194 ...
 $ wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
 $ temp   : int  67 72 74 62 56 66 65 59 61 69 ...
 $ month  : int  5 5 5 5 5 5 5 5 5 5 ...
 $ day    : int  1 2 3 4 5 6 7 8 9 10 ...
 $ 1:153  : int  1 2 3 4 5 6 7 8 9 10 ...

Regards
Petr

> change the numeric column to character and, after the binding takes place, re-convert it to numeric. But this causes problems in an automated environment. Any suggestions?
>
> Thanks
> Mihretu
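R cannot guess the format in general, but one way to automate it is to try a list of candidate formats in turn and keep, for each element, the first one that parses. A sketch under stated assumptions: `parse_datetime_multi()` is a hypothetical helper, the two formats are the ones from the question, and `tz = "UTC"` is an assumption (a zone such as "America/Denver" would handle MST/MDT). Because `strptime()` ignores trailing characters, the 12-hour `%p` format must be tried first; `%p` matching is also locale-dependent, so `Sys.setlocale("LC_TIME", "C")` may be needed in a non-English locale.

```r
# Hypothetical helper: try each format in turn and keep, per element, the
# first one that parses. Order matters because strptime() ignores trailing
# characters: "%m/%d/%Y %H:%M" would read "11/08/2013 08:41 PM" as 08:41.
parse_datetime_multi <- function(x,
                                 formats = c("%m/%d/%Y %I:%M %p",  # 12-hour, AM/PM
                                             "%m/%d/%Y %H:%M"),    # 24-hour
                                 tz = "UTC") {
  x   <- as.character(x)                          # also handles factors
  out <- .POSIXct(rep(NA_real_, length(x)), tz = tz)
  for (f in formats) {
    todo <- is.na(out) & !is.na(x)                # elements not parsed yet
    if (!any(todo)) break
    out[todo] <- as.POSIXct(x[todo], tz = tz, format = f)
  }
  out                                             # unparseable entries stay NA
}

dates <- parse_datetime_multi(c("11/08/2013 08:41 PM",   # AM/PM style
                                "11/08/2013 20:41"))     # 24-hour style
dates + 3600   # shift both forward by one hour
```

Both inputs parse to 20:41 on the same day, so shifting by 3600 seconds works on a mixed column without specifying the format per row. Entries matching none of the formats come back as NA, which makes unexpected formats easy to spot.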
[R] Show time in x-axis
Hi,

I am trying to show time (HH:MM:SS) on my x-axis. I have two questions.

1. The code fails with

Error in axis(1, at = data$Time, labels = data$Time, las = 2, cex.axis = 1.2) :
  (list) object cannot be coerced to type 'double'

Should I use POSIXct or strptime?

2. Some times are repeated because they belong to the next or previous day, but I want to show the times and data points in sequence along the axis, in the order they appear in the data frame.

Thanks,
Mohan

      X1     X2     X3     X4 OldGenAfterFullGC      X6     X7 PermGenAfterFullGC     Time
1      0   3285 873856   3456              3285 1256128  12862              12862 19:36:16
2   3285  30437 873856  31324             30437 1256128  39212              39212 19:36:26
3 312755 313565 873856 313843            313565 1214080 182327             182327 20:36:27
4 313565 281379 873856 313789            281379 1213248 182338             147729 21:36:29
5      0   3285 873856   3456              3285 1256128  12862              12862 19:36:16

plot(data$Time, levels(data$PermGenAfterFullGC)[data$PermGenAfterFullGC], col="darkblue", pch=2, type="b",
     ylab="Megabytes", xlab="Time", las=2, lwd=2, cex.lab=1, cex.axis=1, xaxt="n")
axis(1, at = data$Time, labels = data$Time, las = 2, cex.axis=1.2)
text(data$Time, data$Time, data$Time, 2, cex=1.45)
[R] MM robust
Given the model

y = x1 / (1 + b1*x2^b2)

and the data

y  <- c(2, 3, 4, 5, 6)
x1 <- c(0.23, 0.32, 0.43, 0.54, 0.65)
x2 <- c(0.11, 0.21, 0.31, 0.41, 0.33)

with initial parameters b1 = 0.023 and b2 = 0.045, I am able to find the parameters of the model using the nls method. Can you please give a hint on how I can fit the same model using an MM robust estimator to obtain the parameters? An illustration using the information above would help me extend it to what I am doing.

Thanks
[R] package ‘build-essential’ is not available (for R version 3.0.2)
Hello,

I have searched the R Project site, the R-help archives, and the Internet at large, and I cannot find a solution to my problem. I am running R version 3.0.2 (2013-09-25) -- "Frisbee Sailing" on Ubuntu 13.04.

When I try to install several packages, including quantmod, with dependencies=T set, I keep getting long lists of packages that end with "installation of package 'X' had non-zero exit status". When I try to install X, I get another list of packages that failed to install. After a few iterations of this, the package that I am trying to install is listed among the packages that failed to install.

I found a reference online to build-essential, but when I tried to install it, I got "package ‘build-essential’ is not available (for R version 3.0.2)".

Any hints or follow-up questions would be greatly appreciated.

C.Evans
Re: [R] Show time in x-axis
On 11/11/2013 09:07 PM, mohan.radhakrish...@polarisft.com wrote:
> 1. The error in the code is Error in axis(1, at = data$Time, labels = data$Time, las = 2, cex.axis = 1.2) : (list) object cannot be coerced to type 'double'. Should I use 'POSIXct' or 'strptime'?
> 2. I have times that are repeated because it is the next or previous day. But I want to show the times and data points in sequence - as they are in the data frame - along the axis.

Hi Mohan,

Yes, you probably want to convert the Time variable. However, to answer both questions at once, you also probably want to attach a starting date to your times, incrementing it whenever a time is less than the previous one:

# this will produce times for the current date
data$Time1 <- as.POSIXct(strptime(data$Time, "%H:%M:%S"))
offset <- 0
lasttime <- 0
for (timedate in seq_along(data$Time1)) {
  thistime <- as.numeric(data$Time1[timedate])
  # an earlier clock time than the previous row means midnight has passed
  if (thistime < lasttime) offset <- offset + 86400
  data$Time1[timedate] <- data$Time1[timedate] + offset
  # compare against the unshifted time, so later rows of the same day
  # do not trigger another day offset
  lasttime <- thistime
}

Then you can use Time1 as the at argument, and Time as the labels argument, to axis.

Jim
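The same idea as Jim's loop can be written without a loop: a time smaller than its predecessor marks a day rollover, so `cumsum()` over `diff() < 0` gives a day index. A minimal sketch, assuming Time is a character (or factor) column in HH:MM:SS form and PermGenAfterFullGC is numeric; the small data frame only copies the Time and PermGenAfterFullGC values from Mohan's five rows, and plain seconds are used as x positions instead of POSIXct.

```r
# Sample data: Time and PermGenAfterFullGC values from the five posted rows
data <- data.frame(
  PermGenAfterFullGC = c(12862, 39212, 182327, 147729, 12862),
  Time = c("19:36:16", "19:36:26", "20:36:27", "21:36:29", "19:36:16"),
  stringsAsFactors = FALSE)

# Seconds since midnight for each HH:MM:SS string
secs <- vapply(strsplit(as.character(data$Time), ":", fixed = TRUE),
               function(p) sum(as.numeric(p) * c(3600, 60, 1)),
               numeric(1))

# A time smaller than its predecessor means the clock wrapped to the next day
day     <- cumsum(c(0, diff(secs) < 0))
elapsed <- secs + 86400 * day      # strictly increasing x positions, in seconds

plot(elapsed, data$PermGenAfterFullGC, type = "b", col = "darkblue", pch = 2,
     xaxt = "n", xlab = "Time", ylab = "Megabytes")
axis(1, at = elapsed, labels = data$Time, las = 2)
```

Because the x positions are numeric seconds, the repeated 19:36:16 label appears twice, once at the start and once a day later, and the points follow data-frame order.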
[R] repeating values in an index two by two
Hi All,

I am trying to create an index that returns something like

1,2,1,2,3,4,3,4,5,6,5,6,7,8,7,8

and so on, up to a predetermined value (which is obviously even). I am trying very hard to avoid for loops or for-loop front ends. I'd be obliged if anybody could offer a suggestion.

BW

F
[R] graphics or table
Hi,

I have this code for a cross-validation:

res <- as.data.frame(CV_Pb_var)$residual
sqrt(mean(res^2))
mean(res)
mean(res^2/as.data.frame(CV_Pb_var)$var1.var)

I can't seem to export all of these results in one table. Can I also export them graphically?

Thanks
Enzo

--
Enzo Cocca (PhD Candidate)
Research Fellow
Università di Napoli L'Orientale
mail: enzo@gmail.com
cell: +393495087014
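One way to get everything into one table is to collect the statistics in a data frame, which can then be printed, written with `write.csv()`, or plotted. A sketch, assuming `CV_Pb_var` converts to a data frame with the `residual` and `var1.var` columns used in the code above (as produced by something like gstat's `krige.cv()`); the five-row `cv` data frame is made up purely so the example runs, and the residual-versus-variance plot is just one possible graphical summary.

```r
# Made-up stand-in for as.data.frame(CV_Pb_var): only the 'residual' and
# 'var1.var' columns used in the question are assumed to exist.
cv <- data.frame(residual = c(0.5, -1.0, 0.25, 1.5, -0.75),
                 var1.var = c(1, 2, 0.5, 2, 1))

res <- cv$residual
cv_summary <- data.frame(
  statistic = c("RMSE", "mean error", "MSDR"),
  value     = c(sqrt(mean(res^2)),               # root mean squared error
                mean(res),                       # mean error (bias)
                mean(res^2 / cv$var1.var)))      # mean squared deviation ratio
print(cv_summary)

# Export the table (the file location is just an example)
write.csv(cv_summary, file.path(tempdir(), "cv_summary.csv"), row.names = FALSE)

# One possible graphical summary: residuals against their predicted variance
plot(cv$var1.var, res, xlab = "var1.var", ylab = "cross-validation residual")
abline(h = 0, lty = 2)
```

Keeping the statistics as rows of one data frame means the same object can feed `write.csv()`, a report table, or a barplot without recomputing anything.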
[R] grnn input format usage?
I'm trying the grnn package and reproduced the example (http://cran.r-project.org/web/packages/grnn/grnn.pdf). I then tried the example with an additional x input column in the dataset (see below), but I get the error "Error in Ya * patterns1 : non-conformable arrays", even though I took care to pass an input of length 2:

n <- 100
set.seed(1)
x1 <- runif(n, -2, 2)
x2 = x1^2
y0 <- x1 * x2
epsilon <- rnorm(n, 0, .1)
y <- y0 + epsilon
grnn <- learn(data.frame(y, x1, x2))
grnn <- smooth(grnn, sigma=0.1)

guess(grnn, c(2,4))
Error in Ya * patterns1 : non-conformable arrays

guess(grnn, data.frame(x1=c(2), x2=c(4)))
Error in (X - Xa) %*% t(X - Xa) :
  requires numeric/complex matrix/vector arguments
[R] Apply a function with multiple argument on each column of matrix
Hi there!

I have a function like

fun <- function(x, y) {
  loe <- loess(y ~ x, span=0.9, family="gaussian")
  pre <- predict(loe, data.frame(x=x))
  return(pre)
}

Now I have defined:

x <- 1:500
y <- matrix(rnorm(1000, 3), ncol=2)

I can run fun(x, y[,1]), but I want to apply the function to each column of the matrix y. Any suggestion will be appreciated.

Thanks.

Best regards ...
Tanvir Ahamed
Göteborg, Sweden
Re: [R] repeating values in an index two by two
Hi.

Here are two approaches:

c(mapply(function(x,y) rep(c(x,y), 2), (1:10)[c(T,F)], (1:10)[c(F,T)]))
c(tapply(1:10, rep(1:(10/2), each=2), rep, 2), recursive=T)

Andrija

On Mon, Nov 11, 2013 at 1:11 PM, Federico Calboli <f.calb...@imperial.ac.uk> wrote:
> I am trying to create an index that returns something like 1,2,1,2,3,4,3,4,5,6,5,6,7,8,7,8 and so on and so forth until a predetermined value (which is obviously even).
Re: [R] MM robust
> Given the model y = x1 / (1 + b1*x2^b2)
> I am able to find the parameters of the above model using the nls method. Can you please give a hint on how I can fit the same model using an MM robust estimator to obtain the parameters? An illustration using the information above would help me extend it to what I am doing.

Have a look at ?nlrob in the robustbase package.

S Ellison
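Following that pointer, a minimal sketch of an MM fit with `nlrob()`, assuming robustbase is installed and recent enough to support `method = "MM"`. As far as I understand its documentation, the MM method does not use start values but needs named `lower` and `upper` bounds for the parameters; the bounds used here (0 to 5) are an arbitrary assumption. The data are simulated from the model with b1 = 0.5 and b2 = 1.5 plus five gross outliers, because five observations are too few to show the effect of robust fitting.

```r
library(robustbase)   # assumed installed; provides nlrob()

# Simulated data (illustration only) from y = x1 / (1 + b1 * x2^b2)
# with b1 = 0.5 and b2 = 1.5, plus five gross outliers.
set.seed(42)
n  <- 60
x1 <- runif(n, 0.2, 0.7)
x2 <- runif(n, 0.1, 2)
y  <- x1 / (1 + 0.5 * x2^1.5) + rnorm(n, sd = 0.01)
y[1:5] <- y[1:5] + 0.5                 # contaminate five observations
d  <- data.frame(y, x1, x2)

# method = "MM" takes named lower/upper bounds rather than start values;
# the parameter names b1 and b2 come from those bounds.
fit_mm <- nlrob(y ~ x1 / (1 + b1 * x2^b2), data = d, method = "MM",
                lower = c(b1 = 0, b2 = 0), upper = c(b1 = 5, b2 = 5))
coef(fit_mm)
```

With real data, the bounds should cover every plausible parameter value; the remaining `nlrob()` arguments (psi function, control settings) are left at their defaults here.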
Re: [R] Apply a function with multiple argument on each column of matrix
On 11.11.2013 13:31, Mohammad Tanvir Ahamed wrote:
> I can manipulate fun(x,y[,1]). But i want to apply the function on each column of matrix y.

apply(y, 2, function(i) fun(x, i))

Uwe Ligges
Re: [R] Apply a function with multiple argument on each column of matrix
Thanks!!

Best regards ...
Tanvir Ahamed
Göteborg, Sweden

On Monday, 11 November 2013, 13:49, Uwe Ligges <lig...@statistik.tu-dortmund.de> wrote:
> apply(y, 2, function(i) fun(x, i))
Re: [R] repeating values in an index two by two
Hi,

First off, thanks for the suggestion. I managed to solve it by doing:

IND = rep(c(T,T,F,F), 5)
X = rep(NA, 20)
X[IND] = 1:10
X[!IND] = 1:10

which avoids any function -- I think mapply, apply, etc. call a for loop internally, which I'd rather avoid.

BW

F

On 11 Nov 2013, at 12:35, andrija djurovic <djandr...@gmail.com> wrote:
> Hi. Here are two approaches:
> c(mapply(function(x,y) rep(c(x,y), 2), (1:10)[c(T,F)], (1:10)[c(F,T)]))
> c(tapply(1:10, rep(1:(10/2), each=2), rep, 2), recursive=T)
Re: [R] package ‘build-essential’ is not available (for R version 3.0.2)
Have you read these instructions?
http://cran.r-project.org/bin/linux/ubuntu/README.html

They say to run

sudo apt-get install r-base-dev

which should install 'build-essential' (which is an Ubuntu package, not an R package).

--
Joshua Ulrich | about.me/joshuaulrich
FOSS Trading | www.fosstrading.com

On Mon, Nov 11, 2013 at 5:02 AM, Charles Evans <cev...@chyden.net> wrote:
> I found a reference online to build-essential, but when I tried to install that, I got package ‘build-essential’ is not available (for R version 3.0.2).
[R] r package to solve for Nash equilibrium
Is there an R package that solves for the pure-strategy Nash equilibrium of a two-person game? A search for "Nash equilibrium" in R turns up the GNE package, which solves for generalized Nash equilibria, but what I would like to find is a pure-strategy Nash equilibrium.
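No specific package is assumed here: for a finite two-player game in normal form, the pure-strategy equilibria can be found directly. A cell (i, j) is an equilibrium when row i is a best response to column j for player 1 and column j is a best response to row i for player 2. In the sketch below, `pure_nash()` is a hypothetical helper, A and B are the row and column players' payoff matrices, and payoffs are compared with exact equality (fine for integer payoffs; use a tolerance otherwise).

```r
# Hypothetical helper: pure-strategy Nash equilibria of a bimatrix game.
# A[i, j] = payoff to the row player, B[i, j] = payoff to the column player,
# when row plays strategy i and column plays strategy j.
pure_nash <- function(A, B) {
  stopifnot(identical(dim(A), dim(B)))
  # row i is a best response to column j if A[i, j] is the maximum of column j
  row_best <- A == matrix(apply(A, 2, max), nrow(A), ncol(A), byrow = TRUE)
  # column j is a best response to row i if B[i, j] is the maximum of row i
  col_best <- B == matrix(apply(B, 1, max), nrow(B), ncol(B))
  which(row_best & col_best, arr.ind = TRUE)   # one (row, col) pair per equilibrium
}

# Prisoner's dilemma: strategy 1 = cooperate, 2 = defect
A <- matrix(c(-1, 0, -3, -2), nrow = 2)   # row player's payoffs
pure_nash(A, t(A))                        # symmetric game, so B = t(A)
```

For the prisoner's dilemma this returns the single equilibrium (defect, defect), i.e. row 2 and column 2. A game with several pure equilibria returns several rows, and a game with none (such as matching pennies) returns zero rows.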
Re: [R] how to introduce missing data for complete data
1. You need to define more explicitly exactly what you mean by "randomly".

2. You need to make an honest effort to learn basic R, e.g. by spending time with the "Introduction to R" document that ships with R or an online tutorial (there are many good ones).

Cheers,
Bert

On Sun, Nov 10, 2013 at 10:31 PM, dila radi <dilarad...@gmail.com> wrote:
> Hi,
> I'm a new R user. In my research I use rainfall data and I'm interested in estimating missing data. I would like to use the Normal Ratio Method to estimate missing data. My problem is: how do I introduce missing data randomly within my complete data set?
>
> Stn ID  Year  Mth  Day  Amount
>  48603    71    1    1     1
>  48603    71    1    2     0.5
>  48603    71    1    3     1.3
>  48603    71    1    4     0.8
>  48603    71    1    5     0
>  48603    71    1    6     0
>  48603    71    1    7     0
> ...
>
> Thank you so much for your attention and help.
> Regards,
> Dila

--
Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374
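If "randomly" means missing completely at random (MCAR), i.e. every row is equally likely to be deleted, one sketch is to sample row indices without replacement and set those values to NA, keeping the complete data for comparison with the Normal Ratio estimates later. `make_mcar()`, the `StnID` column name and the 30% proportion are assumptions for illustration; the seven rows are the ones Dila posted.

```r
# The seven rows posted in the question (StnID is an assumed column name)
rain <- data.frame(StnID = 48603, Year = 71, Mth = 1, Day = 1:7,
                   Amount = c(1, 0.5, 1.3, 0.8, 0, 0, 0))

# Hypothetical helper: set a fraction of one column to NA, choosing rows
# uniformly at random without replacement (missing completely at random).
make_mcar <- function(df, column, prop_missing = 0.2) {
  n_missing <- round(prop_missing * nrow(df))
  idx <- sample(nrow(df), n_missing)
  df[idx, column] <- NA
  df
}

set.seed(123)
rain_miss <- make_mcar(rain, "Amount", prop_missing = 0.3)
rain_miss
which(is.na(rain_miss$Amount))   # rows whose true values are still in 'rain'
```

Other missingness mechanisms (for example whole runs of consecutive days, or dropping values more often on wet days) would need a different sampling step, which is why the definition of "randomly" matters.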
Re: [R] repeating values in an index two by two
f1 <- function(x) {
   one <- matrix(1:x, nrow=2)
   as.vector(rbind(one, one))
}

f1(8)
 [1] 1 2 1 2 3 4 3 4 5 6 5 6 7 8 7 8

Pat

On 11/11/2013 12:11, Federico Calboli wrote:
> I am trying to create an index that returns something like 1,2,1,2,3,4,3,4,5,6,5,6,7,8,7,8 and so on and so forth until a predetermined value (which is obviously even).

--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @burnsstat @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of: 'Impatient R' 'The R Inferno' 'Tao Te Programming')
Re: [R] Cross Tabulation
OK. Then using aggregate():

data$yes <- ifelse(data$response=="yes", 1, 0)
data$no <- ifelse(data$response=="no", 1, 0)
dataresp <- aggregate(cbind(no, yes)~region+district, data, sum)
dataresp[,3:4] <- dataresp[,3:4]/rowSums(dataresp[,3:4])
# or
dataresp[,3:4] <- prop.table(as.matrix(dataresp[,3:4]), 1)
dataresp
  region district  no yes
1      A        d 0.5 0.5
2      A        e 0.0 1.0
3      B        f 0.5 0.5
4      B        g 0.5 0.5
5      C        h 0.5 0.5
6      C        i 0.0 1.0
7      C        j 1.0 0.0

David

From: Peter Maclean [mailto:pmaclean2...@yahoo.com]
Sent: Sunday, November 10, 2013 12:52 PM
To: dcarl...@tamu.edu
Subject: Re: [R] Cross Tabulation

Thanks. But I am creating lots of tables and I need Regions and Districts to appear, so as to avoid too much editing.

Peter Maclean
Department of Economics
UDSM

On Sunday, November 10, 2013 12:32 PM, David Carlson <dcarl...@tamu.edu> wrote:
The simplest would be to create a variable combining region and district:

data$region_district <- with(data, paste(region, district))
prop.table(xtabs(~region_district+response, data), 1)
               response
region_district  no yes
            A d 0.5 0.5
            A e 0.0 1.0
            B f 0.5 0.5
            B g 0.5 0.5
            C h 0.5 0.5
            C i 0.0 1.0
            C j 1.0 0.0

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Peter Maclean
Sent: Sunday, November 10, 2013 12:06 AM
To: r-help@r-project.org
Subject: Re: [R] Cross Tabulation

# Would like to create a cross-table (Region, district, response) and
# (Region, district, cost). The flat table function does not look so good.
region <- c("A","A","A","A","B","B","B","B","C","C","C","C")
district <- c("d","d","e","e","f","f","g","g","h","h","i","j")
response <- c("yes","no","yes","yes","no","yes","yes","no","yes","no","yes","no")
cost <- runif(12, 5.0, 9)
var <- c("region", "response", "district")
data <- data.frame(region, district, response, cost)
var1 <- c("region", "district", "response")
var2 <- c("region", "district", "cost")
data1 <- data[var1]
# This looks okay
with(data, aggregate(x=cost, by=list(region, district), FUN=mean))
# This does not look good
# How do I remove the NaN, or create a better table?
prop.table(ftable(data1, exclude = c(NA, NaN)), 1)
prop.table(ftable(xtabs(~region + district + response, data=data)), 1)

Peter Maclean
Department of Economics
UDSM
Re: [R] repeating values in an index two by two
Here's a rather extreme solution:

Rgames> foo <- rep(1:6, each=2)
Rgames> foo
 [1] 1 1 2 2 3 3 4 4 5 5 6 6
Rgames> foo[rep(c(1,3,2,4),3)+rep(c(0,4,8),each=4)]
 [1] 1 2 1 2 3 4 3 4 5 6 5 6

In the general case, then, it would be something like

foo <- rep(1:N, each = 2)   # foo is of length 2*N
foo[rep(c(1,3,2,4), N/2) + rep(seq(0, 2*N - 4, by = 4), each = 4)]

Note that the refolding requires the sequence to have a length that is a multiple of 4 (i.e. N must be even).

Patrick Burns wrote:
> f1(8)
>  [1] 1 2 1 2 3 4 3 4 5 6 5 6 7 8 7 8
Re: [R] repeating values in an index two by two
Here is another solution that is a bit more flexible:

tmp <- seq(8)
# split into your desired groups
max.groups <- 2
tmp.g <- split(tmp, ceiling(seq_along(tmp)/max.groups))
# do repeats, unlist, numeric index
as.numeric(unlist(rep(tmp.g, each = 2)))

Hope this works for you,

Charles

On Mon, Nov 11, 2013 at 10:16 AM, Carl Witthoft <c...@witthoft.com> wrote:
> Here's a rather extreme solution:

--
Charles Determan
Integrated Biosciences PhD Candidate
University of Minnesota
Re: [R] repeating values in an index two by two
n <- 7
rep(seq(1,n,2), each=4)+c(0,1,0,1)
 [1] 1 2 1 2 3 4 3 4 5 6 5 6 7 8 7 8
rep(), seq(), rbind(), apply() ... whatever: internally there will always be iteration via some loop :-)
Ia.
Re: [R] repeating values in an index two by two
Or you can use the integer divide and remainder operators:
n <- 30
x <- seq(0, len=n)
(x %% 2) + (x %/% 4)*2 + 1 # period 2 oscillator + jump by 2 every fourth
 [1]  1  2  1  2  3  4  3  4  5  6  5  6  7  8  7
[16]  8  9 10  9 10 11 12 11 12 13 14 13 14 15 16
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
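The vectorised answers in this thread can be wrapped as functions of n, the predetermined (even) maximum, and checked against each other. This is a sketch; the function names idx_rbind, idx_rep and idx_arith are made up here, and each is rewritten so that it takes the largest value n and returns 2*n indices.

```r
# Patrick Burns's idea: a 2-row matrix of 1:n, stacked on itself, read column-wise
idx_rbind <- function(n) {
  stopifnot(n %% 2 == 0)
  one <- matrix(seq_len(n), nrow = 2)
  as.vector(rbind(one, one))
}

# Iakub Henschen's idea: odd starting values repeated 4 times, plus a 0,1,0,1 offset
idx_rep <- function(n) {
  stopifnot(n %% 2 == 0)
  rep(seq(1, n, by = 2), each = 4) + c(0, 1, 0, 1)
}

# Bill Dunlap's idea: period-2 oscillator plus a jump of 2 every fourth position
idx_arith <- function(n) {
  x <- seq(0, length.out = 2 * n)
  (x %% 2) + (x %/% 4) * 2 + 1
}

idx_rbind(8)
idx_rep(8)
idx_arith(8)
```

All three avoid explicit for loops; as Iakub notes, the looping still happens internally.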
Re: [R] graphics or table
Your code is messed up because you posted in HTML. Also, it is not reproducible (e.g. no sample data, incomplete analysis code). (See http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example for more on reproducibility.) Also, this looks very much like homework, and the Posting Guide (mentioned in the footer of this email) indicates that homework is off topic here and that you should rely on resources provided by your educational institution.
If you want to put all of these various values in one table, you will need to write code to do so, since you have to specify how you want that table laid out. E.g.
resultdf <- data.frame(Mean=mean(res), StdDev=sd(res))
but mixing single-valued measures such as the mean with vector-valued measures such as the residuals in a single table usually requires repeating each single-valued measure in many rows, which is why this is often avoided.
Your unnecessary and inappropriate use of as.data.frame also suggests that you need to spend some time studying the Introduction to R document that comes with the software, learning the difference between vectors, lists and data frames.
---------------------------------------------------------------------------
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.

Enzo Cocca enzo@gmail.com wrote:
hi, I have this code for a cross validation:
res <- as.data.frame(CV_Pb_var)$residual
sqrt(mean(res^2))
mean(res)
mean(res^2/as.data.frame(CV_Pb_var)$var1.var)
I cannot manage to export everything in one table. Can I also export it graphically?
thanks
enzo
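Following the advice above, one way is to collect the single-valued statistics into a one-row data frame and to show the vector-valued residuals graphically. This is a minimal sketch: the cv data frame below is made-up stand-in data for CV_Pb_var, and the column names (RMSE, MeanError, MSDR for the mean squared deviation ratio) are chosen here for illustration.

```r
# Hypothetical stand-in for the cross-validation output; in the original post
# these columns would come from CV_Pb_var$residual and CV_Pb_var$var1.var
set.seed(1)
cv <- data.frame(residual = rnorm(40), var1.var = runif(40, 0.5, 1.5))

res <- cv$residual

# One row of single-valued summaries
cv_summary <- data.frame(
  RMSE      = sqrt(mean(res^2)),           # root mean squared error
  MeanError = mean(res),                   # average (signed) error
  MSDR      = mean(res^2 / cv$var1.var),   # mean squared deviation ratio
  StdDev    = sd(res)
)
print(cv_summary)

# Export the table as CSV
out_file <- file.path(tempdir(), "cv_summary.csv")
write.csv(cv_summary, out_file, row.names = FALSE)

# Export the residuals graphically (vector-valued, so a plot suits them better)
plot_file <- file.path(tempdir(), "cv_residuals.pdf")
pdf(plot_file)
hist(res, main = "Cross-validation residuals", xlab = "Residual")
invisible(dev.off())
```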
[R] Generating bootstrap samples from a panel data frame
With a data frame (call it *d*) composed of 2000 individuals and *n* observations for each individual (thus *2000n* observations in total), I would like to generate *k* bootstrap samples with replacement from *d*. Amongst other variables, *d* has a numeric variable *id* that takes the same value for all observations belonging to the same individual. Given the panel nature of the data, I want to generate many bootstrap samples with replacement and store each bootstrap sample data frame for further use. Sampling (selection into the bootstrap sample) should be based on individuals (on unique values of *id*), so that if an individual is in a particular bootstrap sample, all observations belonging to that individual are too. How can I do this in R?
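A minimal sketch of such a cluster (individual-level) bootstrap, assuming the id column is called id as in the question. The data frame is split into one block per individual once; each replicate then samples whole blocks with replacement. The toy panel and the names make_cluster_bootstrap and boot_id are hypothetical.

```r
set.seed(123)

# Returns a function that draws one individual-level bootstrap sample of d
make_cluster_bootstrap <- function(d, id_col = "id") {
  blocks <- split(d, d[[id_col]])  # one data frame per individual, split once
  function() {
    idx <- sample(length(blocks), length(blocks), replace = TRUE)
    picked <- blocks[idx]
    out <- do.call(rbind, unname(picked))
    # An individual drawn twice appears twice; label each copy separately
    out$boot_id <- rep(seq_along(picked),
                       times = vapply(picked, nrow, integer(1)))
    rownames(out) <- NULL
    out
  }
}

# Toy panel (hypothetical): 5 individuals, 3 observations each
d <- data.frame(id = rep(1:5, each = 3),
                t  = rep(1:3, times = 5),
                y  = round(rnorm(15), 2))

draw <- make_cluster_bootstrap(d)
k <- 10
boot_samples <- replicate(k, draw(), simplify = FALSE)  # list of k data frames
head(boot_samples[[1]])
```

The boot_id column matters if a later step (e.g. a fixed-effects model) must treat repeated draws of the same individual as distinct individuals.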
[R] Bar Graph
I am using R 3.0.2 on a 64-bit machine. I have a data set from 1989-2002. The data has five variables: serialno, date, admission ward, temperature and BCG scar.
serialno admin_ward date_admn bcg_scar temp_axilla   yr
   70162      Ward2 11-Oct-89        y        38.9 1989
   70163      Ward1 11-Oct-91        y        37.2 1991
   70164      Ward2 11-Oct-92        n        37.3 1992
   70165      Ward1 11-Oct-93        y        38.9 1993
   70166      Ward1 11-Oct-94        y        37.7 1994
   70167      Ward1 11-Oct-95        y        40   1995
I want to draw a bar graph of the total number of records (serialno) versus the records available for one of the variables, to show available data vs total data over the years. I am using
ggplot(dta, aes(temp_axilla, fill=admin_ward)) + geom_bar() + facet_grid(. ~ yr, scales = "free", margins=F) + geom_histogram(binwidth=300)
but I cannot include serialno, which shows the total data. How can I achieve this?
-- 
Mega Six Solutions
Web Designer and Research Consultant
Kennedy Mwai
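One approach is to count first and plot the counts, rather than asking the plotting function to do both. The sketch below uses made-up records (with some missing temperatures) and base graphics: per year it counts all records (one per serialno) and the records where temp_axilla is not missing, then draws them side by side.

```r
# Hypothetical toy records; NA marks a missing temperature
dta <- data.frame(
  serialno    = 70162:70171,
  yr          = c(1989, 1989, 1991, 1991, 1991, 1992, 1992, 1993, 1993, 1993),
  temp_axilla = c(38.9, NA, 37.2, 37.9, NA, 37.3, NA, 38.9, 39.1, 37.5)
)

total     <- tapply(dta$serialno, dta$yr, length)           # all records per year
available <- tapply(!is.na(dta$temp_axilla), dta$yr, sum)   # non-missing per year

counts <- rbind(Total = total, Available = available)
counts

barplot(counts, beside = TRUE, legend.text = TRUE,
        xlab = "Year", ylab = "Number of records",
        main = "temp_axilla: available vs total records")
```

With ggplot2, the same counts reshaped into a long table (year, type, n) could be drawn with geom_col(position = "dodge").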
[R] (no subject)
Dear Sir/Madam,
I am Viarti Eminita, a master's (magister) student in Statistics, fifth level, at Bogor Agriculture University. I am applying artificial neural networks (ANN) to time series data and am learning the kohonen package for this, but when I predict, the predicted values are still on the scaled (pattern) scale. How can I transform the predicted values back to the original data scale?
Example:
data <- read.table("D:/THESIS/Data/data.txt", header=TRUE)
Ytraining <- scale(data[1:168,3])
Xtraining <- scale(data[1:168,4:6])
Xtest <- scale(data[168:180,4:6])
xyf <- xyf(Xtraining, Ytraining, grid = somgrid(5, 5, "hexagonal"))
xyf.prediction <- predict(xyf, newdata=Xtest)
Thank you.
Best regards,
Viarti
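scale() stores the centre and spread it used as the attributes "scaled:center" and "scaled:scale", so predictions on the scaled axis can be mapped back with pred * scale + center. The sketch below is independent of kohonen: the data are made up, and pred_scaled stands in for whatever numeric vector of Y predictions predict() returns. It also scales the test predictors with the training centre and spread, since scaling Xtest by its own mean and standard deviation puts it on a different scale from Xtraining.

```r
set.seed(7)
# Hypothetical stand-ins for data[1:168, 3] and data[1:168, 4:6]
y_raw <- rnorm(20, mean = 50, sd = 10)
X_raw <- matrix(rnorm(60, mean = 5), ncol = 3)

Ytraining <- scale(y_raw)
Xtraining <- scale(X_raw)

# Scale new predictors with the *training* centre and spread
Xtest_raw <- matrix(rnorm(9, mean = 5), ncol = 3)
Xtest <- scale(Xtest_raw,
               center = attr(Xtraining, "scaled:center"),
               scale  = attr(Xtraining, "scaled:scale"))

# Map values on the scaled axis back to the original units of Y
unscale_y <- function(pred_scaled, scaled_y) {
  pred_scaled * attr(scaled_y, "scaled:scale") + attr(scaled_y, "scaled:center")
}

pred_scaled <- c(-0.5, 0, 1.2)  # hypothetical predictions on the scaled axis
unscale_y(pred_scaled, Ytraining)
```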
Re: [R] Earth (MARS) package with categorical predictors
Steve, thanks for your reply. Here is what I get. pkg is a 4-level categorical vector.
is.factor(pkg)
[1] TRUE
summary(pkg)
BGA PGA QCC QFP
225  36  19 178
dat <- earth(lifetime ~ pkg+pins+volts+temp+doi+logspd, degree=3) ## The other vars are continuous.
s <- 243
pr <- c(pkg[s],pins[s],volts[s],temp[s],doi[s],logspd[s])
pkg[s]
[1] BGA
Levels: BGA PGA QCC QFP
pr
[1]    1.00  256.00    3.30  125.00 2002.258105    4.890349
pred <- predict(dat, newdata=pr)
Error : variable 'pkg' was fitted with type "factor" but type "numeric" was supplied
Forging on regardless, first few rows of x are
  pkg pins volts temp      doi   logspd
1   1  256   3.3  125 2002.258 4.890349
Error: get.earth.x from model.matrix.earth from predict.earth: the number 6 of columns of x (after factor expansion) does not match the number 8 of columns of the earth object
    expanded x:  pkg pins volts temp doi logspd
    object$dirs: pkgPGA pkgQCC pkgQFP pins volts temp doi logspd
Possible remedy: check factors in the input data
pkg is being passed as numeric 1. I'm unsure how to specify pkg correctly for predict. In the example you gave, does the data include a categorical variable?
Chris

-----Original Message-----
From: Stephen Milborrow [mailto:mi...@sonic.net]
Sent: Monday, November 11, 2013 7:21 AM
To: kins...@verizon.net
Subject: [R] Earth (MARS) package with categorical predictors

See if you can provide a simple reproducible example. It's not clear exactly what the issue is from your question.
The following simple example gives the correct response:
data(etitanic)
a <- earth(survived~., data=etitanic)
predict(a, newdata=etitanic[1,])
Regards,
Steve

On Thu, 07 Nov 2013 23:16:18 -0500, Chris Wilkinson kins...@verizon.net wrote:
It appears to be legitimate to include multi-level categorical and continuous variables in defining the model for earth (e.g. y ~ cat + cont1 + cont2), but is it then also possible to use categoricals in the predict method with the earth result? I tried, but it returns an error which is not very informative.
Thanks
Chris
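The error arises because c() drops the factor class, so pkg reaches predict() as its integer code. The error text matches the type check R's model-frame machinery (also used by predict.lm) performs, so the sketch below uses lm() as a stand-in for earth, on made-up data. The fix is to pass newdata as a data frame whose factor column carries the training levels.

```r
set.seed(1)
n <- 40
# Hypothetical training data with a 4-level factor like pkg
train <- data.frame(
  pkg  = factor(rep(c("BGA", "PGA", "QCC", "QFP"), each = n / 4)),
  pins = sample(c(64, 128, 256), n, replace = TRUE),
  temp = runif(n, 25, 150)
)
train$lifetime <- 100 - 0.3 * train$temp + as.integer(train$pkg) + rnorm(n)
fit <- lm(lifetime ~ pkg + pins + temp, data = train)

# newdata as a one-row data frame; pkg keeps the training levels
new_obs <- data.frame(
  pkg  = factor("BGA", levels = levels(train$pkg)),
  pins = 256,
  temp = 125
)
predict(fit, newdata = new_obs)

# Row-subsetting the training data also keeps the factor intact
predict(fit, newdata = train[5, ])
```

For the earth model above, the analogous (untested) call would be something like predict(dat, newdata = data.frame(pkg = pkg[s], pins = pins[s], volts = volts[s], temp = temp[s], doi = doi[s], logspd = logspd[s])); indexing a factor with pkg[s] keeps its levels.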
Re: [R] SOLVED: Count number of consecutive zeros by group
Thanks to all of you. All solutions work fine. I'm running S Ellison's version with William's comment. Perfect for what I'm doing. And sorry for using a name that is the same as a base R function (twice) ;-)
Cheers,
Carlos

2013/11/1 PIKAL Petr petr.pi...@precheza.cz
Hi
Yes, you are right. This gives the number of zeroes, not the maximum number of consecutive zeroes.
Regards
Petr

-----Original Message-----
From: arun [mailto:smartpink...@yahoo.com]
Sent: Friday, November 01, 2013 2:17 PM
To: R help
Cc: PIKAL Petr; Carlos Nasher
Subject: Re: [R] Count number of consecutive zeros by group

I think this gives a different result than the one OP asked for:
df1 <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), x = c(1, 0, 0, 1, 0, 0, 0, 1, 2, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0)), .Names = c("ID", "x"), row.names = c(NA, -22L), class = "data.frame")
with(df1, sapply(split(x, ID), function(x) sum(x==0)))
with(df1, tapply(x, list(ID), function(y) {rl <- rle(!y); max(c(0, rl$lengths[rl$values]))}))
A.K.

On Friday, November 1, 2013 6:01 AM, PIKAL Petr petr.pi...@precheza.cz wrote:
Hi
Another option is the sapply/split/sum construction:
with(data, sapply(split(x, ID), function(x) sum(x==0)))
Regards
Petr

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Carlos Nasher
Sent: Thursday, October 31, 2013 6:46 PM
To: S Ellison
Cc: r-help@r-project.org
Subject: Re: [R] Count number of consecutive zeros by group

If I apply your function to my test data:
ID <- c(1,1,1,2,2,3,3,3,3)
x <- c(1,0,0,0,0,1,1,0,1)
data <- data.frame(ID=ID, x=x)
rm(ID, x)
f2 <- function(x) { max( rle(x == 0)$lengths ) }
with(data, tapply(x, ID, f2))
the result is
1 2 3
2 2 2
which is not what I'm aiming for. It should be
1 2 3
2 2 1
I think f2 does not return the max of consecutive zeros, but the max of any consecutive number... Any idea how to fix this?

2013/10/31 S Ellison s.elli...@lgcgroup.com
-----Original Message-----
So I want to get the max number of consecutive zeros of variable x for each ID. I found rle() to be helpful for this task; so I did:
FUN <- function(x) { rles <- rle(x == 0) }
consec <- lapply(split(df[,2], df[,1]), FUN)

You're probably better off with tapply and a function that returns what you want. You're probably also better off with a data frame name that isn't a function name, so I'll use dfr instead of df...
dfr <- data.frame(x=rpois(500, 1.5), ID=gl(5,100)) #5 ID groups numbered 1-5, equal size but that doesn't matter for tapply
f2 <- function(x) { max( rle(x == 0)$lengths ) }
with(dfr, tapply(x, ID, f2))
S Ellison

-- 
Carlos Nasher
carlos.nas...@gmail.com
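As Carlos observed, f2 takes the maximum over all runs, both runs of zeros and runs of non-zeros. Keeping only the run lengths whose value is TRUE (i.e. zero runs), as in arun's version, fixes this. The sketch below packages that idea as a named function (max_zero_run is a name chosen here) and checks it on Carlos's test data.

```r
# Longest run of zeros in a numeric vector (0 if there are no zeros)
max_zero_run <- function(v) {
  r <- rle(v == 0)                  # runs of TRUE (zero) and FALSE (non-zero)
  max(c(0L, r$lengths[r$values]))   # keep only the zero runs; 0L guards the empty case
}

ID <- c(1, 1, 1, 2, 2, 3, 3, 3, 3)
x  <- c(1, 0, 0, 0, 0, 1, 1, 0, 1)
dat <- data.frame(ID = ID, x = x)

with(dat, tapply(x, ID, max_zero_run))  # 2 2 1, as expected
```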
Re: [R] (no subject)
Hi Viarti, can you clarify your question slightly?
(1) When you say "the predict value still on pattern scale", what do you mean? Are you saying that the predicted values are on the scale of Ytraining specifically, or that you expect the scale to differ?
(2) When you say "how to change the predict value to the real data value", do you mean changing the scale?
Perhaps if you gave some examples of the desired outputs it would be easier.
Best,
Collin.
[R] ensemble methods
Dear Sir/Madam,
I am Iut, a graduate student at Bogor Agriculture Institute. I read the book Ensemble Methods in Data Mining by Seni and Elder and found R code for bagging in it. I am confused about how to call these functions and how to aggregate them with majority votes. I think there is code missing here. What if the tree function is replaced with SVM?
Example:
genPredictors <- function(seed = 123, N = 30) {
  # Load package with random number generation
  # for the multivariate normal distribution
  library(mnormt)
  # 5 features each having a standard Normal
  # distribution with pairwise correlation 0.95
  Rho <- matrix(c(  1,.95,.95,.95,.95,
                  .95,  1,.95,.95,.95,
                  .95,.95,  1,.95,.95,
                  .95,.95,.95,  1,.95,
                  .95,.95,.95,.95,  1), 5, 5)
  mu <- c(rep(0,5))
  set.seed(seed)
  x <- rmnorm(N, mu, Rho)
  colnames(x) <- c("x1", "x2", "x3", "x4", "x5")
  return(x)
}
genTarget <- function(x, N, seed = 123) {
  # Response Y is generated according to:
  # Pr(Y = 1 | x1 <= 0.5) = 0.2,
  # Pr(Y = 1 | x1 > 0.5) = 0.8
  y <- c(rep(-1, N))
  set.seed(seed)
  for (i in 1:N) {
    if ( x[i,1] <= 0.5 ) {
      if ( runif(1) <= 0.2 ) { y[i] <- 1 } else { y[i] <- 0 }
    } else {
      if ( runif(1) <= 0.8 ) { y[i] <- 1 } else { y[i] <- 0 }
    }
  }
  return(y)
}
genBStrapSamp <- function(seed = 123, N = 200, Size = 30) {
  set.seed(seed)
  sampleList <- vector(mode = "list", length = N)
  for (i in 1:N) {
    sampleList[[i]] <- sample(1:Size, replace=TRUE)
  }
  return(sampleList)
}
fitBStrapTrees <- function(data, sampleList, N) {
  treeList <- vector(mode = "list", length = N)
  for (i in 1:N) {
    tree.params <- list(minsplit = 4, minbucket = 2, maxdepth = 7)
    treeList[[i]] <- fitClassTree(data[sampleList[[i]],], tree.params)
  }
  return(treeList)
}
fitClassTree <- function(x, params, w = NULL, seed = 123) {
  library(rpart)
  set.seed(seed)
  # list elements are accessed with $, e.g. params$minsplit
  tree <- rpart(y ~ ., method = "class", data = x, weights = w, cp = 0,
                minsplit = params$minsplit, minbucket = params$minbucket,
                maxdepth = params$maxdepth)
  return(tree)
}
Thank you very much.
Best regards,
Iut
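The posted functions stop after fitting one tree per bootstrap sample; the aggregation step is not shown. A minimal sketch of that missing step, assuming a two-class factor response: every tree predicts a class for each observation, and the most frequent class wins. To stay self-contained it draws independent normal predictors instead of using mnormt, but keeps the same response rule as genTarget(). The names trees and bagged_predict are made up here.

```r
library(rpart)  # recommended package, ships with R

set.seed(123)
N <- 200
train <- data.frame(matrix(rnorm(N * 5), N, 5,
                           dimnames = list(NULL, paste0("x", 1:5))))
# Same rule as genTarget(): Pr(Y = 1) is 0.2 if x1 <= 0.5, else 0.8
p1 <- ifelse(train$x1 <= 0.5, 0.2, 0.8)
train$y <- factor(ifelse(runif(N) <= p1, 1, 0))

# Bagging: one deep tree per bootstrap sample
B <- 25  # odd, so a two-class vote cannot tie
trees <- lapply(seq_len(B), function(b) {
  idx <- sample(N, replace = TRUE)
  rpart(y ~ ., data = train[idx, ], method = "class",
        control = rpart.control(minsplit = 4, minbucket = 2,
                                maxdepth = 7, cp = 0))
})

# Aggregation: each tree votes; the most common class is the prediction
bagged_predict <- function(trees, newdata) {
  votes <- vapply(trees,
                  function(t) as.character(predict(t, newdata, type = "class")),
                  character(nrow(newdata)))
  votes <- matrix(votes, nrow = nrow(newdata))  # rows = observations, cols = trees
  apply(votes, 1, function(v) names(which.max(table(v))))
}

pred <- bagged_predict(trees, train)
mean(pred == train$y)  # training accuracy of the bagged ensemble
```

To try an SVM instead, the rpart() call could be replaced by another classifier whose predict() returns class labels, for example e1071::svm(y ~ ., data = train[idx, ]); the voting step stays the same.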
Re: [R] ensemble methods
See the R randomForest package. It already does ensemble classification and regression.
-- Bert

-- 
Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374
Re: [R] Date handling in R is hard to understand
Thank you all for taking the time to look at this problem. Yes, date handling is a problem in many languages. For this specific problem, I worked around rbind not being able to handle different data types in a column by converting the column to character and later converting it back to numeric.
Thank you again

On Mon, Nov 11, 2013 at 3:06 AM, PIKAL Petr petr.pi...@precheza.cz wrote:
Hi

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Alemu Tadesse
Sent: Friday, November 08, 2013 8:41 PM
To: r-help@r-project.org
Subject: [R] Date handling in R is hard to understand

Dear All,
I usually work with time series data. The data may come in AM/PM format or on a 24-hour basis. R cannot tell the two apart automatically, at least not for me; I have to tell R explicitly which time format the data are in. It seems that pandas knows how to handle dates without being told the format. The problem arises when I try to shift times, say adding 3600 seconds to shift them forward by one hour. In that case I have to use something like:
Measured_data$Date <- as.POSIXct(as.character(Measured_data$Date), tz="", format = "%m/%d/%Y %I:%M %p")+3600
or
Measured_data$Date <- as.POSIXct(as.character(Measured_data$Date), tz="", format = "%m/%d/%Y %H:%M")+3600
depending on the format. The dates also get a time zone such as MDT or MST attached. When merging two data frames with dates in different formats, that may create a problem (I think). When I get data from Excel it could be in any format, and I have to convert the dates to one of the above formats before using them in R. Any tips for automatic processing without having to specify the date format?
Another problem I saw: when using rbind to bind data frames, if one column of one of the data frames contains character data (for example "none", coming from MySQL), R doesn't know how to concatenate the numeric column from the other data frame to it. I needed to change the numeric column to character and, after binding, convert it back to numeric. But this causes problems in an automated environment. Any suggestion?
Thanks
Mihretu

rbind/cbind can use the data.frame method, which keeps each column's own type. With the default method, however, the result is a matrix, which must have a common type of data in all columns (a matrix is only a vector with dimensions).
str(cbind(airquality, 1:153))
'data.frame':	153 obs. of  7 variables:
 $ ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
 $ solar.r: int  190 118 149 313 NA NA 299 99 19 194 ...
 $ wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
 $ temp   : int  67 72 74 62 56 66 65 59 61 69 ...
 $ month  : int  5 5 5 5 5 5 5 5 5 5 ...
 $ day    : int  1 2 3 4 5 6 7 8 9 10 ...
 $ 1:153  : int  1 2 3 4 5 6 7 8 9 10 ...
Regards
Petr
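For mixed AM/PM and 24-hour timestamps, one option is to try a list of candidate formats in turn and let each later format fill only the values the earlier ones could not parse. For the rbind problem, the text column can be converted to numeric before binding, so that "none" becomes NA. This is a sketch: parse_datetime_multi is a made-up helper, the timestamps are invented, and %p matching depends on the locale (it works with English AM/PM strings).

```r
# Try each format in order; later formats only fill values still NA
parse_datetime_multi <- function(x, formats, tz = "") {
  x <- as.character(x)
  out <- .POSIXct(rep(NA_real_, length(x)), tz = tz)
  for (f in formats) {
    todo <- is.na(out) & !is.na(x)
    if (!any(todo)) break
    out[todo] <- as.POSIXct(x[todo], format = f, tz = tz)
  }
  out
}

stamps <- c("11/08/2013 08:41 PM", "11/08/2013 20:41", "11/08/2013 09:05 AM")
# AM/PM format first: strptime ignores trailing text, so "%H:%M" alone
# would read "08:41 PM" as 08:41
parsed <- parse_datetime_multi(stamps,
                               formats = c("%m/%d/%Y %I:%M %p", "%m/%d/%Y %H:%M"),
                               tz = "UTC")
format(parsed + 3600, "%Y-%m-%d %H:%M")  # shifted forward one hour

# rbind: make the column numeric *before* binding
a <- data.frame(val = c("1.5", "none"), stringsAsFactors = FALSE)
b <- data.frame(val = c(2.5, 3))
a$val <- suppressWarnings(as.numeric(a$val))  # "none" becomes NA
combined <- rbind(a, b)
str(combined)
```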
Re: [R] how to introduce missing data for complete data
Here's a suggestion. The sample() function takes random samples of sets. See ?sample
The set you want to take a random sample from is the rows of your data. Represent the rows by their row numbers. To get a vector of row numbers, you can use the seq() function. See ?seq
Let's suppose your data is in a data frame named 'mydat', and you want to introduce 10 instances of missing data.
nr <- nrow(mydat)
set.to.missing <- sample( seq(nr) , 10)
mydat$Amount[set.to.missing] <- NA
A simplified example of the core idea is:
foo <- seq(10)
foo
 [1]  1  2  3  4  5  6  7  8  9 10
foo[3] <- NA
foo
 [1]  1  2 NA  4  5  6  7  8  9 10
-- 
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

On 11/10/13 10:31 PM, dila radi dilarad...@gmail.com wrote:
Hi, I'm a new R user. In my research I use rainfall data, and I'm interested in estimating missing data. I would like to use the Normal Ratio Method to estimate missing data. My problem is: how do I introduce missing data randomly into my complete data set?
Stn ID  Year  Mth  Day  Amount
48603     71    1    1     1
48603     71    1    2     0.5
48603     71    1    3     1.3
48603     71    1    4     0.8
48603     71    1    5     0
48603     71    1    6     0
48603     71    1    7     0
...
Thank you so much for your attention and help.
Regards,
Dila
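A variant of Don's suggestion that removes a chosen fraction of the rows and keeps the true values, so the Normal Ratio estimates can later be compared against them. The one-month toy data set and the names frac_missing and true_values are hypothetical.

```r
set.seed(2013)
# Hypothetical one-month station record in the layout shown above
mydat <- data.frame(StnID = 48603, Year = 71, Mth = 1, Day = 1:31,
                    Amount = round(rexp(31, rate = 1), 1))

frac_missing <- 0.10
n_missing <- round(frac_missing * nrow(mydat))        # 3 of 31 rows

set.to.missing <- sample(seq_len(nrow(mydat)), n_missing)
true_values <- mydat$Amount[set.to.missing]           # keep for validation
mydat$Amount[set.to.missing] <- NA

mydat[set.to.missing, ]
```

After estimating the gaps, comparing the estimates with true_values (e.g. via the root mean squared error) gives a check on the method.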
[R] How do I derive a logical variable in a dataframe based on another row in the same dataframe?
Hi R Experts,
How do I mark rows in a data frame based on a condition that depends on another row in the same data frame? I want to mark with a 1 every row with TT=='HC' for which a row with the same FY and ID and TT=='TER' exists. In my example below these are rows 4, 7 and 11. My data looks something like this:
     FY ID  TT
1  FY09  1  HC
2  FY10  1  HC
3  FY11  1  HC
4  FY12  1  HC
5  FY12  1 TER
6  FY09  2  HC
7  FY10  2  HC
8  FY10  2 TER
9  FY11  2  HC
10 FY12  2  HC
11 FY13  2  HC
12 FY13  2 TER
I know for this specific example I can use:
HTDF$EXCL3 <- 1*duplicated(HTDF[,1:2], fromLast=T)
However, my actual data set is NOT sorted by FY, ID and TT. TT is a binary factor variable. I want to know if there is another way of doing the same thing without sorting the data. I tried the last line of code below, but it gave me unexpected results: it marks the first three rows with 0 and everything else with 1. Judging by the warning messages, it has something to do with "longer object length is not a multiple of shorter object length", but I am now stumped.
#REPRODUCIBLE EXAMPLE
FY <- factor(c("FY09","FY10","FY11","FY12","FY12","FY09","FY10","FY10","FY11","FY12","FY13","FY13"))
ID <- c(rep(1,5), rep(2,7))
TT <- factor(c(rep("HC",4),"TER","HC","HC","TER","HC","HC","HC","TER"))
HTDF <- data.frame(FY, ID, TT)
#Summarize data and get max TT. TT is a binary factor variable
library(sqldf)
HTDF.MAX <- sqldf('SELECT ID,FY,Max(TT) MAXTT FROM HTDF GROUP BY ID,FY')
# Initiate new variable and assign 0 or 1
HTDF$EXCL <- 0
# THIS IS WHERE I AM GETTING UNEXPECTED RESULTS
HTDF$EXCL <- ifelse(HTDF$FY==HTDF.MAX$FY & HTDF$ID==HTDF.MAX$ID & HTDF$TT==HTDF.MAX$MAXTT, 0, 1)
Dan Lopez
Workforce Analyst
LLNL HRIM - Workforce Analytics Metrics
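The ifelse() attempt compares HTDF (12 rows) element by element with HTDF.MAX (9 rows, one per FY/ID group), so R recycles the shorter vectors and pairs up unrelated rows; hence the warning. An order-independent alternative is to build an FY/ID key for every row and test whether each HC row's key occurs among the TER rows' keys. This is a sketch; the shuffle is there only to show that row order does not matter.

```r
FY <- factor(c("FY09","FY10","FY11","FY12","FY12","FY09",
               "FY10","FY10","FY11","FY12","FY13","FY13"))
ID <- c(rep(1, 5), rep(2, 7))
TT <- factor(c(rep("HC", 4), "TER", "HC", "HC", "TER", "HC", "HC", "HC", "TER"))
HTDF <- data.frame(FY, ID, TT)

set.seed(42)
HTDF <- HTDF[sample(nrow(HTDF)), ]        # unsorted on purpose

key      <- paste(HTDF$FY, HTDF$ID, sep = "_")  # one key per FY/ID combination
ter_keys <- key[HTDF$TT == "TER"]               # combinations that have a TER row
HTDF$EXCL <- as.integer(HTDF$TT == "HC" & key %in% ter_keys)

HTDF[HTDF$EXCL == 1, ]                    # original rows 4, 7 and 11
```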
Re: [R] select .txt from .txt in a directory
Thanks, AK. The three codes worked as expected. Again, thanks so much for understanding my problem and providing the right solutions.
Atem.

On Saturday, November 9, 2013 6:27 PM, arun smartpink...@yahoo.com wrote:
HI,
The code could be shortened by using ?merge or ?join().
library(plyr)
##Using the output from `lst6`
lst7 <- lapply(lst6, function(x) {
  x1 <- data.frame(Year=rep(1961:2005, each=12), Mo=rep(1:12, 45))
  x2 <- join(x1, x, type="left", by=c("Year","Mo"))
})
##rest are the same (only change in object names)
sapply(lst7, nrow)
lst8 <- lapply(lst7, function(x) data.frame(col1=unlist(data.frame(t(x)[-c(1:2),]), use.names=FALSE)))
lst9 <- lapply(seq_along(lst8), function(i){
  x <- lst8[[i]]
  colnames(x) <- lstf1[i]
  row.names(x) <- 1:nrow(x)
  x
})
sapply(lst9, nrow)
res2New <- do.call(cbind, lst9)
dim(res2New)
#[1] 16740 98
res2New[res2New == -.9] <- NA # change missing value identifier as in your data set
which(res2New == -.9)
#integer(0)
dates1 <- seq.Date(as.Date('1Jan1961', format="%d%b%Y"), as.Date('31Dec2005', format="%d%b%Y"), by="day")
dates2 <- as.character(dates1)
sldat <- split(dates2, list(gsub("-.*", "", dates2)))
lst12 <- lapply(sldat, function(x) lapply(split(x, gsub(".*-(.*)-.*", "\\1", x)), function(y){
  x1 <- as.numeric(gsub(".*-.*-(.*)", "\\1", y))
  if((31-max(x1)) > 0) {
    x2 <- seq(max(x1)+1, 31, 1)
    x3 <- paste0(unique(gsub("(.*-.*-).*", "\\1", y)), x2)
    c(y, x3)
  } else y
}))
any(sapply(lst12, function(x) any(lapply(x, length) != 31)))
#[1] FALSE
lst22 <- lapply(lst12, function(x) unlist(x, use.names=FALSE))
sapply(lst22, length)
dates3 <- unlist(lst22, use.names=FALSE)
length(dates3)
res3New <- data.frame(dates=dates3, res2New, stringsAsFactors=FALSE)
str(res3New)
res3New$dates <- as.Date(res3New$dates)
res4New <- res3New[!is.na(res3New$dates),]
res4New[1:3,1:3]
dim(res4New)
colnames(res4) <- colnames(res4New)
identical(res4, res4New)
#[1] TRUE
A.K.
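For comparison, base R's merge() with all.x = TRUE performs the same left join as the plyr::join(type = "left") call above: every row of the complete Year x Month grid is kept, and months missing from a station file get NA. A small sketch on a made-up station table (the Day_1 and Day_2 columns are hypothetical):

```r
# Hypothetical station file with only three of the 540 months present
x <- data.frame(Year = c(1961, 1961, 1962), Mo = c(1, 3, 12),
                Day_1 = c(0.5, 1.2, 0), Day_2 = c(0, 0, 3.1))

# Complete grid: 45 years x 12 months = 540 rows
grid <- data.frame(Year = rep(1961:2005, each = 12), Mo = rep(1:12, times = 45))

# Left join: keep all grid rows, fill absent months with NA
padded <- merge(grid, x, by = c("Year", "Mo"), all.x = TRUE)
padded <- padded[order(padded$Year, padded$Mo), ]
rownames(padded) <- NULL
dim(padded)
head(padded, 4)
```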
On Saturday, November 9, 2013 5:46 PM, arun <smartpink...@yahoo.com> wrote:
Hi,
Try:

library(stringr)
# Created the selected files (98) in a separate working folder (SubsetFiles1) (refer to my previous mail)
filelst <- list.files()
#Sublst <- filelst[1:2]
res <- lapply(filelst,function(x) {
  con <- file(x)
  Lines1 <- readLines(con)
  close(con)
  Lines2 <- Lines1[-1]
  Lines3 <- str_split(Lines2,"-.9M")
  Lines4 <- str_trim(unlist(lapply(Lines3,function(x) {
    x[x==""] <- NA
    paste(x,collapse=" ")
  })))
  Lines5 <- gsub("(\\d+)[A-Za-z]","\\1",Lines4)
  res1 <- read.table(text=Lines5,sep="",header=FALSE,fill=TRUE)
  res1
})
##Created another folder "Modified" to store the res files
lapply(seq_along(res),function(i) write.table(res[[i]],paste("/home/arunksa111/Zl/Modified",paste0("Mod_",filelst[i]),sep="/"),row.names=FALSE,quote=FALSE))
lstf1 <- list.files(path="/home/arunksa111/Zl/Modified")
lst1 <- lapply(lstf1,function(x) readLines(paste("/home/arunksa111/Zl/Modified",x,sep="/")))
which(lapply(lst1,function(x) length(grep("\\d+-.9",x)))>0)
#[1]  7 11 14 15 30 32 39 40 42 45 46 53 60 65 66 68 69 70 73 74 75 78 80 82 83
#[26] 86 87 90 91 93
lst2 <- lapply(lst1,function(x) gsub("(\\d+)(-.9)","\\1 \\2",x))
#lapply(lst2,function(x) x[grep("\\d+-.9",x)]) ##checking for the pattern
lst3 <- lapply(lst2,function(x) {x <- gsub("(-.9)(-.9)","\\1 \\2",x)})
#lapply(lst3,function(x) x[grep("\\d+-.9",x)]) ##checking for the pattern
#lapply(lst3,function(x) x[grep("-.9",x)]) ###second check
lst4 <- lapply(lst3,function(x) gsub("(Day) (\\d+)","\\1_\\2", x[-1])) #removed the additional header V1, V2, etc.
#sapply(lst4,function(x) length(strsplit(x[1]," ")[[1]])) #checking the number of columns that should be present
lst5 <- lapply(lst4,function(x) unlist(lapply(x, function(y) word(y,1,33))))
lst6 <- lapply(lst5,function(x) read.table(text=x,header=TRUE,stringsAsFactors=FALSE,sep="",fill=TRUE))
# head(lst6[[94]],3)
lst7 <- lapply(lst6,function(x) x[x$Year >= 1961 & x$Year <= 2005,])
#head(lst7[[45]],3)
lst8 <- lapply(lst7,function(x) x[!is.na(x$Year),])
lst9 <- lapply(lst8,function(x) {
  if((min(x$Year) > 1961) | (max(x$Year) < 2005)){
    n1 <- (min(x$Year)-1961)*12
    x1 <- as.data.frame(matrix(NA,ncol=ncol(x),nrow=n1))
    n2 <- (2005-max(x$Year))*12
    x2 <- as.data.frame(matrix(NA,ncol=ncol(x),nrow=n2))
    colnames(x1) <- colnames(x)
    colnames(x2) <- colnames(x)
    x3 <- rbind(x1,x,x2)
  } else if((min(x$Year)==1961) & (max(x$Year)==2005)) {
    if((min(x$Mo[x$Year==1961]) > 1) | (max(x$Mo[x$Year==2005]) < 12)){
      n1 <- min(x$Mo[x$Year==1961])-1
      x1 <- as.data.frame(matrix(NA,ncol=ncol(x),nrow=n1))
      n2 <- (12-max(x$Mo[x$Year==2005]))
      x2 <- as.data.frame(matrix(NA,ncol=ncol(x),nrow=n2))
      colnames(x1) <- colnames(x)
      colnames(x2) <- colnames(x)
      x3 <- rbind(x1,x,x2)
    } else {
      x
    }
  }
})
which(sapply(lst9,nrow)!=540)
#[1] 45 46 54 64 65 66 70 75 97
lst10 <- lapply(lst9,function(x) {x1 <- x[!is.na(x$Year),]
hx1 <- head(x1,1)
tx1 <- tail(x1,1)
x2 <- as.data.frame(matrix(NA, ncol=ncol(x),
Re: [R] How do I derive a logical variable in a dataframe based on another row in the same dataframe?
If you have an algorithm that only works on sorted data, it is easy to write a function that sorts [a copy of] the data, applies the algorithm, then puts the result back in the order of the original data. E.g.,

f <- function (data) {
  ord <- with(data, order(TT, ID, FY)) # data[ord,] will be sorted in your required order
  data$EXCL3 <- 1 * duplicated(data[ord, 1:2], fromLast = TRUE)[order(ord)] # [order(ord)] puts it back in original order
  data
}

E.g.,

i <- c(12, 5, 10, 6, 4, 2, 1, 3, 7, 11, 9, 8)
scrambled <- HTDF[i,]
f(scrambled)
     FY ID  TT EXCL3
12 FY13  2 TER     0
5  FY12  1 TER     0
10 FY12  2  HC     0
6  FY09  2  HC     0
4  FY12  1  HC     1
2  FY10  1  HC     0
1  FY09  1  HC     0
3  FY11  1  HC     0
7  FY10  2  HC     1
11 FY13  2  HC     1
9  FY11  2  HC     0
8  FY10  2 TER     0

Or is your dataset so large that this sorting and unsorting would take too long or too much space? (There are faster ways of doing this than duplicated(), but which one fits depends on details such as whether there may be more than 2 FY/ID duplicates.)

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Lopez, Dan
Sent: Monday, November 11, 2013 12:50 PM
To: R help (r-help@r-project.org)
Subject: [R] How do I derive a logical variable in a dataframe based on another row in the same dataframe?

[quoted text omitted]
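The key trick in Bill's f() is that order(ord) is the inverse of the permutation ord, so indexing a result computed on the sorted copy with [order(ord)] returns it to the original row order. A small self-contained illustration on a toy vector (not the thread's data):

```r
x <- c(30, 10, 20, 10)            # unsorted data
ord <- order(x)                   # c(2, 4, 3, 1): permutation that sorts x
sorted <- x[ord]                  # c(10, 10, 20, 30)

# Any per-element result computed on the sorted copy, e.g. position in sorted order
pos_sorted <- seq_along(sorted)
# Indexing with order(ord), the inverse permutation, returns it to x's original order
pos_original <- pos_sorted[order(ord)]
pos_original                      # 4 1 3 2

identical(sorted[order(ord)], x)  # TRUE: order(ord) undoes the sort
```

Because order() is stable, ties (the two 10s here) keep their relative order, so the round trip is exact.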
[R] Data Security when using R
Hello. At the company I work for, I recently requested having R installed on my desktop and on those of some of my colleagues. My company's IT/Security groups are having trouble assessing whether R meets their standards. Can anyone point me to a source where I can read about how R uses data? Does it store the data somewhere? Does data ever actually leave the company's environment? Etc.?

Thanks.
Sean
Re: [R] How do I derive a logical variable in a dataframe based on another row in the same dataframe?
Hi,
You may try:

fun1 <- function(dat){
  dat$EXCL3 <- 0
  dat$EXCL3[dat$TT=="HC"] <- 1*as.character(interaction(dat[,1:2]))[dat$TT=="HC"] %in% as.character(interaction(dat[,1:2]))[dat$TT=="TER"]
  dat
}
fun1(HTDF)
set.seed(14)
indx <- sample(1:nrow(HTDF),12)
HTDF1 <- HTDF[indx,]
fun1(HTDF1)

A.K.

On Monday, November 11, 2013 4:49 PM, Lopez, Dan <lopez...@llnl.gov> wrote:
[quoted text omitted]
[R] colours legend, for loop,density plot
Hi, thanks in advance. I have the following code:

normal <- sort(rnorm(1000))
cauchy <- sort(rcauchy(1000))
t3 <- sort(rt(1000,3))
t10 <- sort(rt(1000, 10))
col <- c("green","blue","orange","purple")
v <- list(normal,cauchy,t3,t10)
names(v) <- c("Normal", "Cauchy", "T-stud 3 df", "T-stud 10 df")
par(mfrow=c(1,2))
plot(density(normal),col=col[[1]],main="Funciones de densidad")
for ( i in 2:4) {
  lines(density(v[[i]]),col=col[[i]],lty=i+2)
}
legend(x=-4,y=0.3,names(v),col=col,cex=0.6)

The problem is that the colours don't appear in the legend, so I cannot tell which curve is which. Could you please tell me what I need to change to solve it?

Thanks a lot,
Tere
Re: [R] How do I derive a logical variable in a dataframe based on another row in the same dataframe?
On Mon, Nov 11, 2013 at 3:50 PM, Lopez, Dan <lopez...@llnl.gov> wrote:
[quoted text omitted]

For each FY, ID group, ave applies f to TT == 'TER', returning a logical vector that is TRUE for each HC row if TER is in the group and FALSE otherwise. Finally we add 0 to convert from TRUE/FALSE to 1/0. The rows of HTDF need not be in any specific order, and their order will be preserved.

f <- function(x) any(x) & !x
transform(HTDF, EXCL = ave(TT == 'TER', FY, ID, FUN = f) + 0)

     FY ID  TT EXCL
1  FY09  1  HC    0
2  FY10  1  HC    0
3  FY11  1  HC    0
4  FY12  1  HC    1
5  FY12  1 TER    0
6  FY09  2  HC    0
7  FY10  2  HC    1
8  FY10  2 TER    0
9  FY11  2  HC    0
10 FY12  2  HC    0
11 FY13  2  HC    1
12 FY13  2 TER    0

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [R] colours legend, for loop,density plot
On 11/12/2013 09:52 AM, Mª Teresa Martinez Soriano wrote:
[quoted code omitted]

Hi Tere,
Try this:

legend(x=-2,y=0.04,names(v),col=col,cex=0.6,lty=c(1,4:6))

Jim
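Jim's fix works because legend() only draws line samples when lty (or lwd) is supplied; with col alone there is nothing to colour, so only the text appears. A sketch that keeps the colours and line types in vectors shared by the plotting calls and legend(), so they cannot drift apart. The "topright" position and the off-screen pdf(NULL) device are choices made for this example, not part of the thread.

```r
set.seed(123)
v <- list(Normal = rnorm(1000), Cauchy = rcauchy(1000),
          `T-stud 3 df` = rt(1000, 3), `T-stud 10 df` = rt(1000, 10))
cols <- c("green", "blue", "orange", "purple")
ltys <- c(1, 4, 5, 6)   # same as the original: lty 1 for plot(), then i + 2 for i = 2:4

pdf(NULL)               # draw off-screen; drop this line to plot interactively
plot(density(v[[1]]), col = cols[1], lty = ltys[1], main = "Funciones de densidad")
for (i in 2:4) lines(density(v[[i]]), col = cols[i], lty = ltys[i])
# Supplying lty makes legend() draw a coloured line sample next to each label
lg <- legend("topright", legend = names(v), col = cols, lty = ltys, cex = 0.6)
invisible(dev.off())
```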
Re: [R] Data Security when using R
As a starting point for answering this question, you might search Google for "The RAppArmor Package: Enforcing Security Policies in R Using Dynamic Sandboxing on Linux".
kw

On Mon, Nov 11, 2013 at 4:01 PM, <seanstcl...@verizon.net> wrote:
[quoted text omitted]

--
Kevin Wright
Re: [R] Data Security when using R
See below.

--
Don MacQueen
Lawrence Livermore National Laboratory

On 11/11/13 2:01 PM, seanstcl...@verizon.net <seanstcl...@verizon.net> wrote:
> Hello. At the company I work for, I recently requested having R loaded onto my desktop and some of my colleagues. My company's IT/Security groups are having trouble assessing whether R software meets their standards. Can anyone point me to a source where I can read about how R uses data?

I would start by downloading "An Introduction to R" from CRAN and searching on "save" and ".RData".

> Does it store the data somewhere?

Yes. In memory to start, and optionally to disk, normally somewhere in the user's home directory or working directory.

> Does data ever actually leave the company's environment?

Not unless the user does something explicit to make it happen.

> etc...?

No less secure than, say, MS Excel, I would think. Others with a deeper understanding than I may point out exceptions or special cases worth knowing about ... I hope.

> Thanks.
> Sean
Re: [R] How do I derive a logical variable in a dataframe based on another row in the same dataframe?
Thanks.
Dan

-----Original Message-----
From: arun [mailto:smartpink...@yahoo.com]
Sent: Monday, November 11, 2013 2:26 PM
To: R help (r-help@r-project.org)
Cc: Lopez, Dan
Subject: Re: [R] How do I derive a logical variable in a dataframe based on another row in the same dataframe?

[quoted text omitted]
Re: [R] How do I derive a logical variable in a dataframe based on another row in the same dataframe?
Great advice! Thank you.
Dan

-----Original Message-----
From: William Dunlap [mailto:wdun...@tibco.com]
Sent: Monday, November 11, 2013 1:18 PM
To: Lopez, Dan; R help (r-help@r-project.org)
Subject: RE: [R] How do I derive a logical variable in a dataframe based on another row in the same dataframe?

[quoted text omitted]
Re: [R] How do I derive a logical variable in a dataframe based on another row in the same dataframe?
Hi Gabor,
This is a great solution! I will use it. Thank you!
Dan

-----Original Message-----
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
Sent: Monday, November 11, 2013 3:02 PM
To: Lopez, Dan
Cc: R help (r-help@r-project.org)
Subject: Re: [R] How do I derive a logical variable in a dataframe based on another row in the same dataframe?

[quoted text omitted]
[R] Update a variable in a dataframe based on variables in another dataframe of a different size
Below is how I am currently doing this. Is there a more efficient way? The scenario is that I have two data frames of different sizes. I need to update one binary factor variable in one of those data frames by matching on two variables: if there is no match, keep the value as is; otherwise, update it. Also, the variable being updated (TT in this case) should remain a binary factor variable (levels = 'HC', 'TER').

HTDF2 <- merge(H_DF,T_DF,by=c("FY","ID"),all.x=T)
HTDF2$TT <- factor(ifelse(is.na(HTDF2$TT.y),HTDF2$TT.x,HTDF2$TT.y),labels=c("HC","TER"))
HTDF2 <- HTDF2[,-(3:4)]

# REPRODUCIBLE EXAMPLE DATA FOR ABOVE..
dput(H_DF)
structure(list(FY = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 5L), .Label = c("FY09", "FY10", "FY11", "FY12", "FY13"), class = "factor"), ID = c(1, 1, 1, 1, 2, 2, 2, 2, 2), TT = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("HC", "TER"), class = "factor")), .Names = c("FY", "ID", "TT"), class = "data.frame", row.names = c(1L, 2L, 3L, 4L, 6L, 7L, 9L, 10L, 11L))
dput(T_DF)
structure(list(FY = structure(c(4L, 2L, 5L), .Label = c("FY09", "FY10", "FY11", "FY12", "FY13"), class = "factor"), ID = c(1, 2, 2), TT = structure(c(2L, 2L, 2L), .Label = c("HC", "TER"), class = "factor")), .Names = c("FY", "ID", "TT"), row.names = c(5L, 8L, 12L), class = "data.frame")

Dan Lopez
LLNL, HRIM - Workforce Analytics Metrics
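The labels= argument above is needed because ifelse() on factors returns their integer codes, not the labels. A minimal base-R alternative (not from the thread) that avoids merge() altogether: match each H_DF row's FY/ID key against T_DF and overwrite TT in place. Assigning a factor into a subset of a factor keeps the levels c("HC", "TER"), and the rows of H_DF keep their original order. The helper name update_tt is hypothetical, and the data frames are rebuilt from the dput() output in a more readable form.

```r
# Data from the post's dput() output, rebuilt with data.frame()
H_DF <- data.frame(
  FY = factor(c("FY09","FY10","FY11","FY12","FY09","FY10","FY11","FY12","FY13")),
  ID = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
  TT = factor(rep("HC", 9), levels = c("HC", "TER")),
  row.names = c(1L, 2L, 3L, 4L, 6L, 7L, 9L, 10L, 11L))
T_DF <- data.frame(
  FY = factor(c("FY12","FY10","FY13"), levels = levels(H_DF$FY)),
  ID = c(1, 2, 2),
  TT = factor(rep("TER", 3), levels = c("HC", "TER")),
  row.names = c(5L, 8L, 12L))

# Hypothetical helper: overwrite h$TT with t$TT wherever FY and ID match
update_tt <- function(h, t) {
  idx <- match(paste(h$FY, h$ID), paste(t$FY, t$ID))  # NA where there is no match
  hit <- !is.na(idx)
  h$TT[hit] <- t$TT[idx[hit]]   # factor-into-factor assignment keeps the levels
  h
}

res <- update_tt(H_DF, T_DF)
as.character(res$TT)
# "HC" "HC" "HC" "TER" "HC" "TER" "HC" "HC" "TER"
```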
Re: [R] Update a variable in a dataframe based on variables in another dataframe of a different size
On Mon, Nov 11, 2013 at 8:04 PM, Lopez, Dan <lopez...@llnl.gov> wrote:
[quoted text omitted]

Here is an sqldf solution:

library(sqldf)
sqldf("select FY, ID, coalesce(t.TT, h.TT) TT from H_DF h left join T_DF t using(FY, ID)")

    FY ID  TT
1 FY09  1  HC
2 FY10  1  HC
3 FY11  1  HC
4 FY12  1 TER
5 FY09  2  HC
6 FY10  2 TER
7 FY11  2  HC
8 FY12  2  HC
9 FY13  2 TER

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
[R] Saving then Loading Objects/Models into existing workspace.
Hi R Experts,

I need some advice on how to manage the number of models/objects I have in one workspace. Below is typically how I get started each time I begin or resume an analysis. But now I am storing multiple models, each built from data frames with dimensions of 30,000 x 60, and I expect to run into RAM issues. I am running 64-bit R, version 2.15.1, on a Windows 7 PC with 8 GB of RAM and an Intel Core 2 Duo CPU E8400 @ 3.00 GHz.

Let's say I have models M1 through Mk. How do I save these separately and then load them as needed? I am picturing storing them in one file and then loading one or more of them from that file as needed. I hope that makes sense.

# GETTING STARTED---
#Clear current objects and workspace
rm(list=ls())
#Set Working directory
setwd()
#LOAD RDATA and History
load("FY14_RF_Model_Dan.RData")
loadhistory("FY14_RF_Model_Dan.Rhistory")
ls()
#Once I'm done I SAVE
save.image("FY14_RF_Model_Dan.RData")
savehistory("FY14_RF_Model_Dan.RHistory")

Dan Lopez
LLNL, HRIM - Workforce Analytics Metrics
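No reply appears in this part of the thread; the following is a sketch of two standard base-R options, using small lm() fits on the built-in mtcars data as stand-ins for the random-forest models (an assumption for illustration). saveRDS()/readRDS() store one object per file and reload it under any name, so only the model you need occupies RAM. save() can put several models in one .RData file, but load() reads the whole file, so loading into a separate environment only keeps them out of the global workspace; it does not save memory. The folder under tempdir() is a placeholder.

```r
# Stand-in models (hypothetical; the post's M1..Mk would be random forests)
M1 <- lm(mpg ~ wt, data = mtcars)
M2 <- lm(mpg ~ wt + hp, data = mtcars)

model_dir <- file.path(tempdir(), "models")   # placeholder; use your own folder
dir.create(model_dir, showWarnings = FALSE)

# Option 1: one .rds file per model, reloaded individually under any name
saveRDS(M1, file.path(model_dir, "M1.rds"))
saveRDS(M2, file.path(model_dir, "M2.rds"))
rm(M1, M2)                                    # free the memory
m2 <- readRDS(file.path(model_dir, "M2.rds"))

# Option 2: several models in one .RData file, loaded into their own environment
M3 <- lm(mpg ~ hp, data = mtcars)
save(M3, m2, file = file.path(model_dir, "all_models.RData"))
e <- new.env()
load(file.path(model_dir, "all_models.RData"), envir = e)  # reads the whole file
ls(e)                                         # names of the objects in the file
m3 <- get("M3", envir = e)
```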
[R] Elastic-R Webinar Invite
Hello R-Help Mailing List:

Are you interested in running collaborative R analytics in the cloud? Join The Knoxville R User Group and The Orange County R User Group for a free webinar on the Elastic-R software platform.

Webinar Format:
- Introduction to Elastic-R
- Live demonstration of the Elastic-R platform on Amazon EC2
- Question and Answer period

Registration and Information: https://www3.gotomeeting.com/register/318141670
[R] unable to install package xts
I am using Ubuntu 12.04 and am unable to install xts. Here is the information:

usr/bin/ld: cannot find -lgfortran
collect2: error: ld returned 1 exit status
make: *** [xts.so] Error 1
ERROR: compilation failed for package ‘xts’
* removing ‘/home/jasom/R/x86_64-pc-linux-gnu-library/3.0/xts’
Warning in install.packages :
  installation of package ‘xts’ had non-zero exit status

The downloaded source packages are in
  ‘/tmp/RtmpVH1i1S/downloaded_packages’

sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] tools_3.0.2

Thanks in advance.
CY
[R] Getting residual term out of lmer summary table
Hello,

I'm working with mixed-effects models using lmer() and have some problems getting all variance components of the model's random effects. I can get the variance of the random effect out of the summary and use it for further calculations, but not the variance component of the residual term. Could somebody help me with that problem? Thanks a lot! Below is an example.

Aline

## EXAMPLE
require(lme4)
## Simulate data for the example
set.seed(6)
x1 <- runif(n=100, min=10, max=100)  ## a continuous variable
x2 <- runif(n=100, min=10, max=100)  ## a continuous variable
treat <- rep(letters[1:4], times=25)  ## a fixed factor with 4 levels
treat.effect <- 20*rep(1:4, times=25)
group.label <- rep(LETTERS[1:5], each=20)  ## the random effect
group.effect <- 10*rep(1:5, each=20)  ## there are 5 groups
## Response variable:
y <- 2*x1 + (-5)*x2 + treat.effect + group.effect + rnorm(100)
## Data frame
d.ex <- data.frame(y, x1, x2, Group=group.label, treat)
## Apply model
mod1 <- lmer(y~x1+x2+treat+x1:treat+(1|Group), data=d.ex)
output <- summary(mod1); output  # OK, there are the variance components of the random effect Group and the residual term
## Now I'd like to get the variance components of the random effect Group and of the residual term Residual
## in order to do further calculations with these numbers
output$varcor[1]  ## reveals the variance of the random effect Group
output$varcor[2]  ## does not reveal the residual term! What other command do I need to use?
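A minimal sketch, assuming lme4 1.0 or later (which the use of output$varcor suggests). There, summary(mod1)$varcor is the same object VarCorr(mod1) returns: a list holding one covariance matrix per grouping factor. With a single grouping factor the list has one element, so output$varcor[2] is past its end. The residual standard deviation is not a list element; it is stored in the attribute "sc", and sigma(mod1) returns the same value. The data are the simulated data from the question; the icc line is only an illustrative follow-up calculation.

```r
library(lme4)

## Same simulated data and model as in the question
set.seed(6)
x1 <- runif(n = 100, min = 10, max = 100)
x2 <- runif(n = 100, min = 10, max = 100)
treat <- rep(letters[1:4], times = 25)
treat.effect <- 20 * rep(1:4, times = 25)
group.label <- rep(LETTERS[1:5], each = 20)
group.effect <- 10 * rep(1:5, each = 20)
y <- 2*x1 + (-5)*x2 + treat.effect + group.effect + rnorm(100)
d.ex <- data.frame(y, x1, x2, Group = group.label, treat)
mod1 <- lmer(y ~ x1 + x2 + treat + x1:treat + (1 | Group), data = d.ex)

## One covariance matrix per grouping factor; residual SD in attribute "sc"
vc <- VarCorr(mod1)
var.group    <- vc$Group[1, 1]      # variance of the random intercept for Group
var.residual <- attr(vc, "sc")^2    # residual variance (same as sigma(mod1)^2)

## Illustrative follow-up: share of the total variance due to Group
icc <- var.group / (var.group + var.residual)
```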
Re: [R] Apply function to every 20 rows between pairs of columns in a matrix
Hi,

It's not very clear.

set.seed(25)
dat1 <- as.data.frame(matrix(sample(c("A","T","G","C"),46482*56,replace=TRUE),ncol=56,nrow=46482),stringsAsFactors=FALSE)
# one factor level per 20-row bin (1, 2, ..., 2325), kept in numeric order;
# as.character() on the bin labels would make split() order the bins "1", "10", "100", ...
lst1 <- split(dat1,gl(ceiling(nrow(dat1)/20),20,nrow(dat1)))
# note: the last bin holds only 2 rows (46482 = 20*2324 + 2), so /20 understates its matches
res <- lapply(lst1,function(x) sapply(x[,1:8],function(y) sapply(x[,9:56], function(z) sum(y==z)/20)))
length(res)
#[1] 2325 ### check here
dim(res[[1]])
#[1] 48 8

A.K.

Hi all,

I have a set of genetic SNP data that looks like

Founder1 Founder2 Founder3 Founder4 Founder5 Founder6 Founder7 Founder8 Sample1 Sample2 Sample3 Sample...
A A A T T T T T A T A T
A A A T T T T T A T A T
A A A T T T T T A T A T
A A A T T T T T A T A T
A A A T T T T T A T A T
A A A T T T T T A T A T
A A A T T T T T A T A T
A A A T T T T T A T A T
A A A T T T T T A T A T
A A A T T T T T A T A T
A A A T T T T T A T A T
A A A T T T T T A T A T

The size of the matrix is 56 columns by 46482 rows. I need to first bin the matrix by every 20 rows, then compare each of the first 8 columns (founders) to each of columns 9-56, and divide the total number of matching letters/alleles by the total number of rows (20). Ultimately I need 48 matrices of 8 columns by 2342 rows, which are essentially similarity matrices.

I have tried to extract each pair separately with something like

length(cbind(odd[,9],odd[,1])[cbind(odd[,9],cbind(odd[,9],odd[,1])[,1])[,1]==T cbind(odd[,9],odd[,1])[,2]==T,])/nrow(cbind(odd[,9],odd[,1]))

but this is nowhere near efficient, and I do not know of a faster way of applying the function to every 20 rows and across multiple pairs. In the example given above, if the rows were all identical like shown across 20 rows, then the first row of the matrix for Sample1 would be 1 1 1 0 0 0 0 0
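As a possibly faster alternative to the nested sapply() calls above (a swapped-in technique, not A.K.'s method), each founder column can be compared with all sample columns at once, and base R's rowsum() can then add up the matches within each bin. This sketch divides by the actual number of rows in each bin, so a short final bin is not divided by 20. It runs on a small hypothetical matrix; for the real data one would set geno <- as.matrix(dat1), n.founder <- 8 and n.sample <- 48.

```r
## Small hypothetical example: 4 founders + 6 samples, 45 rows, bins of 20 rows
set.seed(25)
n.row <- 45; n.founder <- 4; n.sample <- 6; bin <- 20
geno <- matrix(sample(c("A", "T", "G", "C"), n.row * (n.founder + n.sample), replace = TRUE),
               nrow = n.row)
founders <- seq_len(n.founder)
samples  <- n.founder + seq_len(n.sample)

## Bin index for every row (1,...,1, 2,...,2, 3,...); the last bin may be shorter
block <- (seq_len(n.row) - 1) %/% bin + 1
block.size <- tabulate(block)

## One (bins x samples) matrix per founder
per.founder <- lapply(founders, function(f) {
  hits <- geno[, samples, drop = FALSE] == geno[, f]  # founder column recycled down each sample column
  rowsum(hits * 1, block) / block.size                # proportion of matching rows in each bin
})

## Rearrange into one (bins x founders) matrix per sample
per.sample <- lapply(seq_len(n.sample), function(j)
  sapply(per.founder, function(M) M[, j]))
```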
Re: [R] Apply function to every 20 rows between pairs of columns in a matrix
Hi,

Maybe this is what you wanted:

res2 <- lapply(row.names(res[[1]]),function(x) do.call(rbind,lapply(res,function(y) y[match(x, row.names(y)),])))
length(res2)
#[1] 48
dim(res2[[1]])
#[1] 2325 8

A.K.

On Monday, November 11, 2013 10:20 PM, Yu-yu Ren renyan...@gmail.com wrote:
Thank you so much for that script, it works great. One additional request: how can I go about binding each of the 2325 matrices for each sample, resulting in 48 matrices of 8 columns by 2325 rows?

On Mon, Nov 11, 2013 at 10:02 PM, arun smartpink...@yahoo.com wrote:
Hi, I already sent a reply to R-help. I am not sure about the 2342.
A.K.

On Monday, November 11, 2013 10:00 PM, Yu-yu Ren renyan...@gmail.com wrote:
Thank you, I have uploaded several example files, with intermediate outputs of what I have done and the logic flow.

On Mon, Nov 11, 2013 at 9:37 PM, smartpink...@yahoo.com wrote:
Hi, comparing the first 8 columns separately with columns 9-56 is not clear. Also, please provide a reproducible example (using ?dput) for others to work on.
A.K.
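A sketch of a different way to do the same rearrangement with base R's array() and aperm() (not the method used above): the whole list res is stacked into one three-dimensional array, and each sample's (bins x founders) matrix becomes a single slice. The small res below is hypothetical; it only mimics the structure of the real one (sample row names, founder column names). The bins stay in the order of the list, so it is worth checking that names(res) are in numeric order first.

```r
## Hypothetical stand-in for 'res': 5 bins, each a 6 x 4 (samples x founders) matrix
set.seed(1)
res <- lapply(1:5, function(b)
  matrix(runif(24), nrow = 6,
         dimnames = list(paste0("V", 5:10), paste0("V", 1:4))))

## Stack the list into a samples x founders x bins array ...
arr <- array(unlist(res), dim = c(dim(res[[1]]), length(res)),
             dimnames = c(dimnames(res[[1]]), list(NULL)))
## ... and permute it to bins x founders x samples
by.sample <- aperm(arr, c(3, 2, 1))

## by.sample[, , "V7"] is the (bins x founders) matrix for sample V7;
## a list of such matrices, one per sample:
res2 <- lapply(dimnames(arr)[[1]], function(s) by.sample[, , s])
```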
Re: [R] unable to install package xts
Wang Chongyang wchongyang at gmail.com writes:
I am using Ubuntu 12.04 and unable to install xts. Here are the info:
usr/bin/ld: cannot find -lgfortran

Do 'sudo apt-get install r-base-dev' to install a set of requirements for building packages, which includes, among other things, the Fortran library you are missing here.

Dirk
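A rough check that can be run from R before retrying install.packages("xts"). It rests on an assumption: if the gfortran compiler is on the PATH, the Fortran toolchain (including the library the linker could not find) is most likely installed. Sys.which() only looks for the executable, so this is a heuristic, not a guarantee.

```r
## Heuristic: is a gfortran executable on the PATH?
gfortran <- Sys.which("gfortran")
if (nzchar(gfortran)) {
  message("gfortran found at ", gfortran)
} else {
  message("gfortran not found; see the r-base-dev suggestion for Ubuntu")
}
## install.packages("xts")
```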
[R] Test for exogeneity
Hi,

I am building a bivariate SVAR model:

$$y_{1t} = c_1 + \Gamma_1(1,1)\,y_{1,t-1} + \Gamma_1(1,2)\,y_{2,t-1} + \Gamma_2(1,1)\,y_{1,t-2} + \Gamma_2(1,2)\,y_{2,t-2} + \varepsilon_{1t}$$
$$b\,y_{1t} + y_{2t} = c_2 + \Gamma_1(2,1)\,y_{1,t-1} + \Gamma_1(2,2)\,y_{2,t-1} + \Gamma_2(2,1)\,y_{1,t-2} + \Gamma_2(2,2)\,y_{2,t-2} + \varepsilon_{2t}$$

Now y1 is relatively exogenous, in that y1 affects y2 contemporaneously but not the other way around. Given a bivariate dataset, is there any statistical test (in an R package or elsewhere) that helps to justify or test the exogeneity of y1 in this context? Is there any reference available?

Thanks,
Miao
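For reference, the two equations can be stacked into matrix form, reading the last lag coefficient of the second equation as $\Gamma_2(2,2)$, as the index pattern suggests. The restriction that y2 has no contemporaneous effect on y1 is then the zero in the upper-right corner of the lower-triangular matrix multiplying $y_t$, i.e. a recursive ordering with y1 first:

$$
\begin{pmatrix} 1 & 0 \\ b & 1 \end{pmatrix}
\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix}
=
\begin{pmatrix} c_1 \\ c_2 \end{pmatrix}
+ \Gamma_1 \begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix}
+ \Gamma_2 \begin{pmatrix} y_{1,t-2} \\ y_{2,t-2} \end{pmatrix}
+ \begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix},
\qquad
\Gamma_k = \begin{pmatrix} \Gamma_k(1,1) & \Gamma_k(1,2) \\ \Gamma_k(2,1) & \Gamma_k(2,2) \end{pmatrix}
$$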
[R] sourcing from 2 different computers R code
Hi,

I have a piece of code sitting in a Dropbox directory and have installed R 3.0.2 on two machines: a MacBook Pro and a Sony Vaio PC. When I use source("/Users/R") to call the script from the Mac there are no problems, but when I use source("C:\Users\...R") to call the script from the Sony Vaio I get the following:

Error: '\U' used without hex digits in character string starting 'C:\U

What am I doing wrong?

Thanks in advance,
Luca
Re: [R] sourcing from 2 different computers R code
Hello,

What is the result when you use source("C:/Users/...R")?

Regards,
Pascal

--
Pascal Oettli
Project Scientist
JAMSTEC
Yokohama, Japan
Re: [R] unable to install package xts
Hello,

You probably should install a Fortran compiler.

Regards,
Pascal
Re: [R] sourcing from 2 different computers R code
This is not one but two FAQs:
http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-do-file-names-work-in-Windows_003f
http://cran.r-project.org/bin/windows/base/rw-FAQ.html#R-can_0027t-find-my-file

See the posting guide and the footer of this message.

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
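A minimal sketch of the fixes those FAQ entries describe. In an R string a backslash starts an escape sequence ("\U" expects hex digits), so a Windows path needs forward slashes or doubled backslashes. Because the script sits in Dropbox on both machines, one script can pick the folder by operating system and build the path with file.path(). The user name, Dropbox folders and script.R below are hypothetical placeholders, not the real paths.

```r
## Two equivalent ways to write a (hypothetical) Windows path in R
p1 <- "C:/Users/luca/Dropbox/script.R"        # forward slashes
p2 <- "C:\\Users\\luca\\Dropbox\\script.R"    # escaped backslashes
nchar("C:\\U")                                # 4 characters: C, :, \, U

## One line that works on both machines: choose the Dropbox root by OS,
## then join the pieces with file.path(), which always uses "/"
dropbox <- if (.Platform$OS.type == "windows") "C:/Users/luca/Dropbox" else "~/Dropbox"
script <- file.path(dropbox, "script.R")
## source(script)
```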