Re: [R-es] My R script runs very slowly
Hello Miguel Ángel,

I think Carlos Ortega has given me a solution to my problem with R; I am going to try it. I did not know about that limit on email size, and I will keep it in mind next time. Thank you very much in any case.

Regards,
MªLuz Morales
Dpto. Ciencias y Tecnología de la comunicación
Universidad Europea de Madrid

On 28 May 2015 at 22:29, miguel.angel.rodriguez.mui...@sergas.es wrote:

Hello Mª Luz.

Your first message did not reach the list precisely because of the size of the attached files. You have an email from the administrator about it. Because you replied to that message yourself, we have all been able to read it, but we do not have access to the files Set-A.zip and Outcomes.csv (I seem to recall they were about 9 MB between the two).

You could consider uploading them somewhere (Dropbox or similar) and sharing the URL. If you have any problems, send me an email and I will try to help you.

Regards,
Miguel Rodríguez
Consellería de Sanidade
Xunta de Galicia
http://dxsp.sergas.es

From: R-help-es [r-help-es-boun...@r-project.org] on behalf of MªLuz Morales [mlzm...@gmail.com]
Sent: Thursday, 28 May 2015 16:14
To: Carlos Ortega
CC: R-help-es@r-project.org
Subject: Re: [R-es] My R script runs very slowly

Hello, thank you for replying so quickly. I attached the files seta and outcomes.csv to the email; it is not clear to me how to make them available to you in another way.

On 28 May 2015 at 15:53, Carlos Ortega c...@qualityexcellence.es wrote:

Hello,

If you do not mind sharing your data set (you can leave it in a Dropbox and share the link), or including a saved copy of the variables seta and outcomes (function save.image()), we can give you a solution that is much faster than the one you propose.

In your code you use a loop to fill a list with the different aggregates, and this can be done much faster (in seconds) with several packages: data.table, dplyr and sqldf.

Regards,
Carlos Ortega
www.qualityexcellence.es

On 28 May 2015 at 15:34, javier.ruben.marcu...@gmail.com wrote:

Dear María Luz Morales,

You can try data.table and replace the for loop with some vectorized alternative, although this has improved in modern R, and the option of compiling the code should also be evaluated.

Javier Rubén Marcuzzi
Técnico en Industrias Lácteas
Veterinario

From: MªLuz Morales
Sent: Thursday, 28 May 2015 10:21 a.m.
To: R-help-es@r-project.org

In my previous email I forgot to mention that I work with RStudio.

On 28 May 2015 at 15:18, MªLuz Morales mlzm...@gmail.com wrote:

Hello, I am new to this list and also to R. I have written an R script that loads two csv files, one of them with almost 2 million rows. The program loads those files into data frames, and then simply selects certain data, performs some operations (mean, minimum, maximum) and presents the results in a table that will have 4000 rows. Running this program took almost 3 hours. Can you tell me whether R is slow at this kind of operation, or whether my code is not optimized and I am not doing it the right way? The code of my program is the following:

    #+++
    ## Set-A.csv and Outcomes.csv must be in the current directory
    # Convert csv to data frame
    seta <- read.csv('Set-A.csv');
    outcomes <- read.csv('Outcomes-A.csv');
    ids <- as.character(unique(outcomes$RecordID));
    ## Number of distinct RecordIDs
    Length_ids <- length(ids); # number of distinct RecordIDs
    ListaABP <- list('RecordID'=-1, 'SAPS.I'=-1, 'SOFA'=-1, 'Survival'=-1,
                     'In.hospital_death'=-1,
                     'NISysABP_Min'=-1, 'NISysABP_Max'=-1, 'NISysABP_Mean'=-1,
                     'NIDiasABP_Min'=-1, 'NIDiasABP_Max'=-1, 'NIDiasABP_Mean'=-1,
                     'NIMAP_Min'=-1, 'NIMAP_Max'=-1, 'NIMAP_Mean'=-1);
    for (i in 1:Length_ids){#NumRecordID){ # For each patient...
      ListaABP$RecordID[i] <- outcomes$RecordID[i];
      ListaABP$SAPS.I[i] <- outcomes$SAPS.I[i];
      ListaABP$SOFA[i] <- outcomes$SOFA[i];
      ListaABP$Survival[i] <- outcomes$Survival[i];
      ListaABP$In.hospital_death[i] <- outcomes$In.hospital_death[i];
      # Parameter == 'NISysBP'
      #seta_NISysABP <- seta[seta$RecordID == ids[i] & seta$Parameter == 'NISysABP' , c('RecordID','Value')] ;
      seta_NISysABP <- seta[seta$RecordID == ids[i] & seta$Parameter == 'NISysABP' , 'Value'] ; # I think this would no longer be a data frame, so the next line may give an error
      ListaABP$NISysABP_Min[i] <- min(seta_NISysABP);
      ListaABP$NISysABP_Max[i] <- max(seta_NISysABP);
      ListaABP$NISysABP_Mean[i] <- mean(seta_NISysABP);
      # Parameter == 'NIDiasABP'
      #seta_NIDiasABP <- seta[seta$RecordID
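Editor's sketch of the grouped approach that the replies point to. The thread names data.table, dplyr and sqldf; the version below swaps in base R (aggregate, reshape, merge) so that it runs without extra packages. The data are synthetic stand-ins, and only the column names RecordID, Parameter, Value, SAPS.I and SOFA are taken from the post. Everything else is an assumption for illustration, not the poster's or Carlos Ortega's actual solution.

```r
# Synthetic stand-ins for the poster's files (values are made up).
set.seed(1)
outcomes <- data.frame(RecordID = 1:3, SAPS.I = c(6, 16, 21), SOFA = c(1, 8, 11))
seta <- data.frame(
  RecordID  = rep(1:3, each = 4),
  Parameter = rep(c("NISysABP", "NIDiasABP"), times = 6),
  Value     = round(runif(12, 50, 150))
)

# One grouped pass instead of one subset per patient inside a loop.
stats <- aggregate(
  Value ~ RecordID + Parameter, data = seta,
  FUN = function(v) c(Min = min(v), Max = max(v), Mean = mean(v))
)
# aggregate() returns a matrix column; flatten it into
# Value.Min, Value.Max and Value.Mean.
stats <- do.call(data.frame, stats)

# One row per patient, one block of columns per parameter.
wide <- reshape(stats, idvar = "RecordID", timevar = "Parameter",
                direction = "wide")

# Attach the per-patient outcome columns.
result <- merge(outcomes, wide, by = "RecordID")
print(result)
```

The point of the design is that the 2-million-row table is scanned once per grouping rather than once per patient; the packages named in the thread follow the same idea with faster implementations.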
[R] Error in CSV file
Hello All,

This is an easy fix but I am not able to find the root cause of the error. I am trying to load a csv file but it is throwing an error. I have done a lot of research on Google and in some tutorials but can't find a solution, hence please advise.

Syntax is:

    aaa <- read.csv(file = "VehicleData.csv", Header = TRUE)

Error:

    Error in read.table(file = file, header = header, sep = sep, quote = quote, :
      unused argument (Header = TRUE)

Snapshot of the file:

    Weight  Hours  PROCESS  Month  Weekday  Day
    6828    13     INBOUND  Mar    Fri      13
    2504    16     INBOUND  Mar    Fri      27
    20      16     INBOUND  Mar    Fri      27
    10262   16     INBOUND  Mar    Fri      27
    2500    17     INBOUND  Mar    Fri      13

Kindly help.

--
View this message in context: http://r.789695.n4.nabble.com/Error-in-CSV-file-tp4707879.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] analysis of variance test
Dear Nezahat

In future it would be helpful if you
1 - gave us the data so we can reproduce what you are doing
2 - told us what the error was in case we cannot replicate it
3 - did not post in HTML as it messes up everything in your post

What did you think x1 <- numeric was going to do? Try

    x1 <- numeric
    str(x1)

On 28/05/2015 22:16, Nezahat HUnter wrote:
> Let's say I have 12 observations of 5 variables and my first variable is
> categorical (with 4 different levels). I am trying to find out whether there
> is a statistically significant difference between these categorical levels
> for each variable, but my function is not working! Please note that my data
> x are in data.frame format. Any suggestion would be helpful. Many thanks.
>
>     function(x) {
>       x1 <- numeric
>       x2 <- numeric
>       for(i in 2:length(x)) {
>         x1[i] <- summary(aov(x[, i] ~ factor(x[, 1])))
>         x2[i] <- x1[i]$Pr[1]   #Pr is the probability values
>         if(x2[i] < 0.06) x2[i] <- 1 else x2[i] <- 0
>       }
>       x2
>     }
>
> [[alternative HTML version deleted]]

--
Michael
http://www.dewey.myzen.co.uk/home.html
Re: [R] analysis of variance test
Hi Nezahat,

First, you are storing the code of the function numeric in x1 and x2. You probably want to use:

    x1 <- numeric()
    x2 <- numeric()

Second, you are then storing the output of your aov summary (a list) in x1, which requires a bit of analysis to get the information you want (i.e. the p value). The following will work for your example, but is not a general solution.

    nh_fun <- function(x) {
      pvals <- numeric()
      for(i in 2:length(x))
        pvals[i-1] <- unlist(summary(aov(x[,i] ~ factor(x[,1])))[[1]][5])[1] <= 0.05
      return(pvals)
    }
    nh_fun(x)

As you probably want to get the conventional <= 0.05, I have changed the criterion. If you want to understand why the mess of extractors appears after the summary call, use the str function successively on the return value from summary.

Jim

On Fri, May 29, 2015 at 7:16 AM, Nezahat HUnter nezahathun...@yahoo.co.uk wrote:
> Let's say I have 12 observations of 5 variables and my first variable is
> categorical (with 4 different levels). I am trying to find out whether there
> is a statistically significant difference between these categorical levels
> for each variable, but my function is not working! Please note that my data
> x are in data.frame format. Any suggestion would be helpful. Many thanks.
>
>     function(x) {
>       x1 <- numeric
>       x2 <- numeric
>       for(i in 2:length(x)) {
>         x1[i] <- summary(aov(x[, i] ~ factor(x[, 1])))
>         x2[i] <- x1[i]$Pr[1]   #Pr is the probability values
>         if(x2[i] < 0.06) x2[i] <- 1 else x2[i] <- 0
>       }
>       x2
>     }
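Editor's sketch: one way to see what the extractors in the reply are doing is to pull the p value out by column name instead of by position. The data frame below is synthetic (12 rows, a 4-level grouping column and 4 numeric columns, as described in the question), and the function name aov_pvalues is made up for this illustration.

```r
# Synthetic data shaped like the question: first column categorical
# with 4 levels, remaining columns numeric.
set.seed(42)
x <- data.frame(
  group = rep(c("a", "b", "c", "d"), each = 3),
  v1 = rnorm(12),
  v2 = rnorm(12),
  v3 = rnorm(12) + rep(c(0, 5, 10, 15), each = 3),  # deliberately group-dependent
  v4 = rnorm(12)
)

# One one-way ANOVA per numeric column; returns a named vector of p values.
aov_pvalues <- function(x) {
  g <- factor(x[[1]])
  vapply(
    x[-1],
    function(v) summary(aov(v ~ g))[[1]][["Pr(>F)"]][1],
    numeric(1)
  )
}

p <- aov_pvalues(x)
print(p)
print(p <= 0.05)   # TRUE where the group effect is significant at 0.05
```

Keeping the p values and thresholding afterwards avoids baking the cutoff into the function, so the same result can be reused with a different criterion.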
[R] How to make new predictions from a GAM with a spline forced through the origin
Hi,

I followed an example to fit a GAM with a spline forced through a point, e.g. (0,0). This works fine using one of Simon's examples; however, when it comes to making a prediction from a new set of x values I'm a bit stumped. In the example below a smooth term is constructed and the basis and penalties at x=0 are removed, then the gam is fitted to a spline basis matrix X using spline penalties.

Can someone suggest a way that I can make predictions at new x values based on the gam b below?

Here is Simon Wood's example:

    library(mgcv)
    set.seed(0)
    n <- 100
    x <- runif(n)*4-1; x <- sort(x);
    f <- exp(4*x)/(1+exp(4*x)); y <- f+rnorm(100)*0.1; plot(x,y)
    dat <- data.frame(x=x,y=y)
    ## Create a spline basis and penalty, making sure there is a knot
    ## at the constraint point, (0 here, but could be anywhere)
    knots <- data.frame(x=seq(-1,3,length=9)) ## create knots
    ## set up smoother...
    sm <- smoothCon(s(x,k=9,bs="cr"),dat,knots=knots)[[1]]
    ## 3rd parameter is value of spline at knot location 0,
    ## set it to 0 by dropping...
    X <- sm$X[,-3]        ## spline basis
    S <- sm$S[[1]][-3,-3] ## spline penalty
    off <- y*0 + .6       ## offset term to force curve through (0, .6)
    ## fit spline constrained through (0, .6)...
    b <- gam(y ~ X - 1 + offset(off),paraPen=list(X=list(S)))
    lines(x,predict(b))
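Editor's sketch of one possible way to predict at new x values for the model above. It assumes that mgcv's PredictMat() can rebuild the unconstrained basis for new data from the smooth object sm, that the same third column can then be dropped, and that the 0.6 offset is added back by hand. The helper name predict_constrained is hypothetical, and this is not presented as the answer given on the list.

```r
library(mgcv)

# Setup repeated from the post (plotting calls left out).
set.seed(0)
n <- 100
x <- sort(runif(n) * 4 - 1)
y <- exp(4 * x) / (1 + exp(4 * x)) + rnorm(n) * 0.1
dat <- data.frame(x = x, y = y)
knots <- data.frame(x = seq(-1, 3, length = 9))
sm <- smoothCon(s(x, k = 9, bs = "cr"), dat, knots = knots)[[1]]
X <- sm$X[, -3]
S <- sm$S[[1]][-3, -3]
off <- y * 0 + 0.6
b <- gam(y ~ X - 1 + offset(off), paraPen = list(X = list(S)))

# Hypothetical helper: rebuild the basis at new x, drop the same column
# that was dropped before fitting, and add the fixed offset back.
predict_constrained <- function(newx) {
  Xp <- PredictMat(sm, data.frame(x = newx))[, -3, drop = FALSE]
  drop(Xp %*% coef(b)) + 0.6
}

newx <- seq(-1, 3, by = 0.25)
pred <- predict_constrained(newx)
pred_at_zero <- predict_constrained(0)
print(pred_at_zero)   # the curve should pass through the constraint point
```

predict(b, newdata = ...) is awkward here because the model formula refers to the matrix X and the vector off rather than to x, which is why the sketch multiplies the new basis by coef(b) directly.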
[R] Help on R Functionality Histogram
Hello Experts,

I have a couple of questions on the analysis I am creating.

1) How does R adapt to changes? The case I have here is that the Excel file I started with had to be modified, because the data I had was on an hourly basis ranging from 0 to 23 hours. After the changes, 0 was modified to 24 in hours. Do I now need to load this file again in R using the read.csv syntax, or is there another way to do so, i.e. a kind of reload option?

2) I am creating a histogram. I need the 24 hours on the x axis to be displayed separately as 0, 1, 2, and so on. However, it only shows up to 20, which makes it look awkward. I also need to resize the labels and, if possible, place them inside the bars. I used the code below; the axis fonts have changed, but the labels give an error with this code.

Code:

    hist(aaa$Hours, main = "Hourly Weight", xlab = "Time", breaks = 25,
         col = "yellow", ylim = c(0,9000), labels = TRUE,
         cex.axis = 0.6, cex.label = 0.6)

Kindly advise on both questions. Thanks.

Histogram.png http://r.789695.n4.nabble.com/file/n4707886/Histogram.png

--
View this message in context: http://r.789695.n4.nabble.com/Help-on-R-Functionality-Histogram-tp4707886.html
Re: [R] Error in CSV file
Shivi82 shivibha...@ymail.com writes:

> Hello All,
>
> This is an easy fix but I am not able to find the root cause of the error.
> I am trying to upload a csv file but it is throwing an error. Have done a
> lot of research on google and some tutorial but cant find a solution hence
> please advice:-
>
> Syntax is :-
>     aaa <- read.csv(file = "VehicleData.csv", Header = TRUE)
>
> Error:-
>     Error in read.table(file = file, header = header, sep = sep, quote = quote, :
>       unused argument (Header = TRUE)

Use header = TRUE instead of Header = TRUE. R is case sensitive.

Cheers,
Rainer

> Snapshot of the file:-
>     Weight  Hours  PROCESS  Month  Weekday  Day
>     6828    13     INBOUND  Mar    Fri      13
>     2504    16     INBOUND  Mar    Fri      27
>     20      16     INBOUND  Mar    Fri      27
>     10262   16     INBOUND  Mar    Fri      27
>     2500    17     INBOUND  Mar    Fri      13
>
> Kindly help.
>
> View this message in context: http://r.789695.n4.nabble.com/Error-in-CSV-file-tp4707879.html

--
Rainer M. Krug
email: Raineratkrugsdotde
PGP: 0x0F52F982
Re: [R] Error in CSV file
Hi Shivi,

R is case sensitive, and the error message says that the argument Header is unused (because it is not recognized). Try header (lower case h) and it should work.

HTH,
Ivan

--
Ivan Calandra, ATER
University of Reims Champagne-Ardenne
GEGENAA - EA 3795
CREA - 2 esplanade Roland Garros
51100 Reims, France
+33(0)3 26 77 36 89
ivan.calan...@univ-reims.fr
https://www.researchgate.net/profile/Ivan_Calandra

Le 29/05/15 10:41, Shivi82 a écrit :
> Hello All,
>
> This is an easy fix but I am not able to find the root cause of the error.
> I am trying to upload a csv file but it is throwing an error. Have done a
> lot of research on google and some tutorial but cant find a solution hence
> please advice:-
>
> Syntax is :-
>     aaa <- read.csv(file = "VehicleData.csv", Header = TRUE)
>
> Error:-
>     Error in read.table(file = file, header = header, sep = sep, quote = quote, :
>       unused argument (Header = TRUE)
>
> Snapshot of the file:-
>     Weight  Hours  PROCESS  Month  Weekday  Day
>     6828    13     INBOUND  Mar    Fri      13
>     2504    16     INBOUND  Mar    Fri      27
>     20      16     INBOUND  Mar    Fri      27
>     10262   16     INBOUND  Mar    Fri      27
>     2500    17     INBOUND  Mar    Fri      13
>
> Kindly help.
>
> View this message in context: http://r.789695.n4.nabble.com/Error-in-CSV-file-tp4707879.html
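Editor's sketch reproducing the point made in the replies without needing VehicleData.csv. The inline text is an assumed comma-separated rendering of the first rows of the posted snapshot; read.csv(text = ...) is used only so that the example is self-contained.

```r
# Assumed comma-separated version of the first rows of the snapshot.
csv_text <- "Weight,Hours,PROCESS,Month,Weekday,Day
6828,13,INBOUND,Mar,Fri,13
2504,16,INBOUND,Mar,Fri,27"

# Misspelled argument name: 'Header' is passed on to read.table(),
# which has no such argument, so the call fails.
bad <- tryCatch(
  read.csv(text = csv_text, Header = TRUE),
  error = function(e) conditionMessage(e)
)
print(bad)

# Correct, lower-case argument name.
aaa <- read.csv(text = csv_text, header = TRUE)
str(aaa)
```

Since header = TRUE is already the default for read.csv(), the argument can also simply be left out.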
Re: [R] Error in CSV file
This ate my head for like 2 hours. God, thanks for the help.

--
View this message in context: http://r.789695.n4.nabble.com/Error-in-CSV-file-tp4707879p4707882.html
Re: [R] Help on R Functionality Histogram
On Fri, May 29, 2015 at 7:53 AM, Shivi82 shivibha...@ymail.com wrote:
> Hello Experts,
> I have couple of questions on the analysis I am creating.
> 1) How does R adopt to changes. The case I have here is that the excel I
> have started initially had to be modified because the data I had was on
> hourly basis ranging from 0 to 23 hours. After Changes 0 was modified to 24
> in hours. Now do I need to recall this excel again in R using read.csv
> syntax or is there another way to do so i.e. a kind of reload option

Using read.csv() is the reload option. R has no automatic interface to external files.

> 2) I am creating a histogram. I need on x axis 24 hours to be displayed
> separately as 0,1,2, and thereon. However it only shows till 20 which makes
> the look awkward. Also all l need to resize the labels and if possible
> inside the bars. It used the below code, axis fonts have changed but labels
> give an error with this code
>
> Code:-
>     hist(aaa$Hours, main = "Hourly Weight", xlab = "Time", breaks = 25,
>          col = "yellow", ylim = c(0,9000), labels = TRUE,
>          cex.axis = 0.6, cex.label = 0.6)

The most understandable approach is to break it down into chunks:
Create the histogram.
Add a custom axis.
Add custom labels.

    # using fake data
    aaa <- data.frame(Hours = sample(1:24, 1, replace=TRUE))
    aaa.hist <- hist(aaa$Hours, main = "Hourly Weight", xlab = "Time",
                     breaks = seq(0, 24), col = "yellow", ylim = c(0,9000),
                     cex.axis = 0.6, xaxt = "n")
    axis(1, (0:23)+.5, 1:24, cex.axis=.6)
    text((0:23)+.5, aaa.hist$counts-150, aaa.hist$counts, cex=.6)

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org
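Editor's sketch: a variant of the three steps above for hours coded 0 to 23, which is how the original data are described in the question. The sample size n_obs and the seed are assumptions chosen so that the counts fit the ylim used in the thread, and pdf(NULL) is there only so the code also runs without a display.

```r
pdf(NULL)          # null graphics device; remove to draw on screen
set.seed(123)
n_obs <- 100000    # assumed size, not from the thread
aaa <- data.frame(Hours = sample(0:23, n_obs, replace = TRUE))

# Step 1: histogram with one bin per hour, centred on the integer values.
aaa.hist <- hist(aaa$Hours, main = "Hourly Weight", xlab = "Time",
                 breaks = seq(-0.5, 23.5, by = 1), col = "yellow",
                 ylim = c(0, 9000), cex.axis = 0.6, xaxt = "n")

# Step 2: custom axis with a tick label under every bar.
axis(1, at = aaa.hist$mids, labels = 0:23, cex.axis = 0.6)

# Step 3: counts written just below the top of each bar.
text(aaa.hist$mids, aaa.hist$counts - 150, aaa.hist$counts, cex = 0.6)

invisible(dev.off())
```

Using the mids component returned by hist() for both the axis and the labels keeps the tick positions tied to the bins, so changing the breaks does not require recomputing offsets by hand.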
Re: [R] Help on R Functionality Histogram
Thanks Sarah. This is magical. Thanks for explaining at such length.

--
View this message in context: http://r.789695.n4.nabble.com/Help-on-R-Functionality-Histogram-tp4707886p4707891.html
[R] An Odd Request
Hello R-Users

I apologize in advance if my post is inappropriate. I read the entire posting guide and found nothing to say so, but you never know.

I am seeking a knowledgeable R user who might be interested (for whatever reason) in helping out on what I hope would be considered a worthy project. I am a research scientist, albeit one with little programming ability. I recently started a website which allows patients of different sorts to suggest research studies. Everything is completely free and anonymous. When several members express interest in a particular idea I attempt to build it so they can actually run through the study. Clearly there are limits, but we currently have 4 communities (chronic fatigue syndrome, fibromyalgia, multiple sclerosis and pernicious anaemia) and there are several active studies in which people are submitting data every day. It's quite exciting and I think it has great potential to help people, particularly with disorders that have defied explanation.

I'm currently using Google spreadsheets/forms to create symptom trackers and interactive dashboards of the results, which (most of the time) show group results by default but which can show individual results if an ID is entered. Unfortunately Google spreadsheets is a little limited and I now require the use of more complicated stats such as linear mixed models. I know that I need to move to R. I understand the basics of running statistical tests with packages such as LMER, but I have no clue how to go about integrating such analyses into a website. I could certainly learn how, would love to, and ultimately will, but if someone was interested in joining me in this endeavour much more could be accomplished.

If you're interested in knowing more let me know.

Josh
Re: [R] An Odd Request
If you are primarily interested in making your R analyses into a website you should look into the 'Shiny' package. It makes generating web pages very easy. Here is a link to the Shiny Gallery providing some examples (http://shiny.rstudio.com/gallery/).

Regards,
Charles

On Fri, May 29, 2015 at 7:48 AM, Josh Grant myencepha...@gmail.com wrote:
> Hello R-Users
>
> I apologize in advance if my post is inappropriate. I read the entire
> posting guide and found nothing to say so, but you never know.
>
> I am seeking a knowledgable R-user that might be interested (for whatever
> reason) in helping out on what I hope would be considered a worthy project.
> I am a research scientist, albeit one with little programming ability. I
> recently started a website which allows patients of different sorts to
> suggest research studies. Everything is completely free and anonymous. When
> several members express interest in a particular idea I attempt to build it
> so they can actually run through the study. Clearly there are limits but we
> currently have 4 communities, chronic fatigue syndrome, fibromyalgia,
> multiple sclerosis and pernicious anaemia and there are several active
> studies in which people are submitting data every day. It's quite exciting
> and I think it has great potential to help people, particularly with
> disorders that have defied explanation.
>
> I'm currently using google spreadsheets/forms to create symptom trackers
> and interactive dashboards of the results which (most of the time) show
> group results by default but which can show individual results if an ID is
> entered. Unfortunately google spreadsheets is a little limited and I now
> require the use of more complicated stats such as linear mixed models. I
> know that I need to move to R, I understand the basics of running
> statistical tests with packages such as LMER, but I have no clue how to go
> about integrating such analyses into a website. I could certainly learn
> how, would love to, and ultimately will, but if someone was interested in
> joining me in this endeavour much more could be accomplished.
>
> If you're interested in knowing more let me know.
>
> Josh
[R] Problems with nls
Can someone help me with a question on the Bass model, please?

As I read some articles on this topic, I understand that

1. the Bass formula is
       N(t) = pm + (q-p) N(t-1) - (q/m) (N(t-1))^2
2. which is a difference equation with the solution
       N(t) = m (1 - exp(-(p+q)t)) / (1 + (q/p) exp(-(p+q)t))
3. so, using a linear regression would give us some initial estimates for the parameters m, p, q
4. we can then put the initial estimates into an NLS to get better estimates

Am I right?

Now the question is, why do I see people use cumulative data and try to fit it to a pdf as

    M * ( ((P+Q)^2 / P) * exp(-(P+Q) * T79) ) / (1+(Q/P)*exp(-(P+Q)*T79))^2

Why not use the cumulative data and fit N(t) directly?
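Editor's sketch relating the two expressions in the question. The second expression is the time derivative of the closed-form N(t), so it describes adoptions per unit time while N(t) describes cumulative adoptions. The code checks this numerically with a central difference. The parameter values are illustrative assumptions, and the function names bass_cum and bass_rate are made up for this example.

```r
# Cumulative adoptions N(t), as written in point 2 of the post.
bass_cum <- function(t, m, p, q) {
  m * (1 - exp(-(p + q) * t)) / (1 + (q / p) * exp(-(p + q) * t))
}

# The pdf-style expression quoted at the end of the post.
bass_rate <- function(t, m, p, q) {
  m * ((p + q)^2 / p) * exp(-(p + q) * t) /
    (1 + (q / p) * exp(-(p + q) * t))^2
}

m <- 1000; p <- 0.03; q <- 0.38   # illustrative values only
t <- seq(0.5, 15, by = 0.5)
h <- 1e-5

# Central-difference slope of the cumulative curve.
numeric_slope <- (bass_cum(t + h, m, p, q) - bass_cum(t - h, m, p, q)) / (2 * h)

# Largest gap between the numerical slope and the closed-form rate.
max_gap <- max(abs(numeric_slope - bass_rate(t, m, p, q)))
print(max_gap)
```

In practice this means the response passed to nls() has to match the curve: cumulative counts go with bass_cum, and per-period counts (differences of the cumulative series) go with bass_rate.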
Re: [R] Problem with comparing multiple data sets
Hi everyone.

I tried the (modeest) package on my initial test data and it worked. However, it doesn't work on the entire data set. I saved one of the portions that gives the error (not for all of the values, but for some of them). For example, lines 36, 37 and 39 correctly show the mode value, but 38 and 40 are not correct. This error is repeated for many of the values.

    [36,] 2
    [37,] 2
    [38,] Numeric,3
    [39,] 1
    [40,] Numeric,3

#This is what I did:

    df <- read.csv(file="Part1-modif.csv", head=TRUE, sep=",")
    Out <- apply(df[,2:length(df)],1, mfv)
    t(t(Out))

#This is the data set

    structure(list(terms = structure(c(
      2L, 4L, 4L, 4L, 3L, 1L, 5L, 5L, 6L, 6L,
      6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
      6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
      6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
      6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L),
      .Label = c("#authentication,access control", "#privacy,personal data",
      "#security,malicious,security", "data controller",
      "id management,security", "password,recovery"), class = "factor"),
      class.1 = c(
      2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L,
      2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L,
      2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L,
      2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L,
      1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L),
      class.2 = c(
      2L, 2L, 2L, 0L, 2L, 2L, 2L, 1L, 1L, 2L,
      1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L,
      2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L,
      1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L,
      2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L),
      class.3 = c(
      2L, 0L, 2L, 2L, 1L, 1L, 0L, 0L, 0L, 2L,
      2L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L,
      0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
      0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
      0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)),
      .Names = c("terms", "class.1", "class.2", "class.3"),
      class = "data.frame", row.names = c(NA, -50L))

Also, when I try to add the terms to the result it gives me an error:

    mode.names <- data.frame(df[,1],Out)
    Error in data.frame(df[, 1], Out) :
      arguments imply differing number of rows: 50, 3

On Thu, May 28, 2015 at 9:24 AM, Mohammad Alimohammadi mxalimoha...@ualr.edu wrote:

Thank you David for your help !

On Wed, May 27, 2015 at 7:31 PM, David L Carlson dcarl...@tamu.edu wrote:

    cat(paste0("[", 1:length(Out), "] #dac ", Out), sep="\n")

David

From: Mohammad Alimohammadi [mailto:mxalimoha...@ualr.edu]
Sent: Wednesday, May 27, 2015 2:29 PM
To: David L Carlson; r-help@r-project.org
Subject: Re: [R] Problem with comparing multiple data sets

Thanks David, it worked! One more thing. I hope it's not complicated. Is it also possible to display the terms for each row next to it? For example:

    [1] #dac2
    [2] #dac0
    [3] #dac1
    ...

On Wed, May 27, 2015 at 2:18 PM, David L Carlson dcarl...@tamu.edu wrote:

Save the result of the apply() function:

    Out <- apply(df[ ,2:length(df)], 1, mfv)

Then there are several options.

Approximately what you asked for:

    data.frame(Out)
    t(t(Out))

More typing, but exactly what you asked for:

    cat(paste0("[", 1:length(Out), "] ", Out), sep="\n")

David L. Carlson
Department of Anthropology
Texas A&M University

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Mohammad Alimohammadi
Sent: Wednesday, May 27, 2015 1:47 PM
To: John Kane; r-help@r-project.org
Subject: Re: [R] Problem with comparing multiple data sets

Ok, so I read about the (modeest) package that gives the results that I am looking for (most repeated value). I modified the data frame a little and moved the text to the first column. This is the data frame with all 3 possible classes for each term.

=====

    structure(list(terms = structure(c(
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L,
      4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L),
      .Label = c("#dac", "#mac,#security", "accountability,anonymous",
      "data security,encryption,security "), class = "factor"),
      class.1 = c(
      0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
      0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
      0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
      0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L,
      1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L),
      class.2 = c(
      2L, 2L, 2L, 2L, 0L, 0L, 2L, 0L, 0L, 2L,
      2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 0L,
      0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
      0L, 0L, 0L, 2L, 0L, 2L, 2L, 2L, 1L, 1L,
      2L, 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L),
      class.3 = c(
      0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
      0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
      0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
      0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L,
      1L, 1L, 0L, 0L, 0L, 0L, 2L, 1L, 2L)),
      .Names = c("terms", "class.1", "class.2", "class.3"),
      class = "data.frame", row.names = c(NA, -49L))

=====

#Then I applied the function below:

==

    library(modeest)
    df <- read.csv(file="short.csv",
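Editor's sketch of what the "Numeric,3" entries suggest. In the posted data, rows 38 and 40 hold the values 2, 1 and 0, so every value is equally frequent; a function that returns all tied modes then gives three values for those rows, and apply() falls back to a list. The sketch below swaps modeest's mfv for a small base R helper (row_mode, a made-up name) that always returns a single value by keeping the smallest tied value. The three-row data frame is a made-up miniature of the posted one, and the tie rule is an assumption that may not be what the poster wants.

```r
# Single-valued mode: on ties, which.max() keeps the first entry of the
# sorted table, i.e. the smallest tied value.
row_mode <- function(v) {
  counts <- table(v)
  as.integer(names(counts)[which.max(counts)])
}

# Miniature data frame with the same layout as the post (values made up).
df <- data.frame(
  terms   = c("#dac", "#mac,#security", "data controller"),
  class.1 = c(2L, 1L, 2L),
  class.2 = c(2L, 1L, 1L),
  class.3 = c(0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# One mode per row; vapply() guarantees exactly one integer per row.
modes <- vapply(
  seq_len(nrow(df)),
  function(i) row_mode(unlist(df[i, -1])),
  integer(1)
)

# Because there is one value per row, the terms can be bound alongside.
mode.names <- data.frame(terms = df$terms, mode = modes)
print(mode.names)
```

The third row (2, 1, 0) is the tied case: it comes out as 0 under this rule instead of as a three-element result, which is what lets data.frame() line the modes up with the 3 terms.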
Re: [R] best way to handle database connections from within a package
I would simply separate the database connect and disconnect functions from the query functions.

Mark

R. Mark Sharp, Ph.D.
msh...@txbiomed.org

On May 28, 2015, at 12:18 PM, Luca Cerone luca.cer...@gmail.com wrote:

Dear all,

I am writing a package that is a collection of queries to be run against a PostgreSQL database, so that users do not have to worry about the structure of the database. In my package I import dbDriver, dbUnloadDriver, dbConnect and dbDisconnect from the package DBI, and dbGetQuery from the package RPostgreSQL.

All the functions in my package have the same structure:

    getFancyData <- function(from, to) {
      on.exit(dbDisconnect(con), add = TRUE)
      on.exit(dbUnloadDriver(drv), add = TRUE)
      drv <- dbDriver("PostgreSQL")
      con <- dbConnect(drv, user = pkguser, host = pkghost,
                       password = pkgpassword, port = pkgport)
      query <- sprintf("select * from fancyTable where dt between '%s' and '%s'", from, to)
      res <- dbGetQuery(con, query)
      return(res)
    }

The various access details are read from an encrypted profile that the user has to create when she installs the package.

Such functions work perfectly fine, but I have to repeat the loading and unloading of the driver, and the connecting and disconnecting, many times. I am wondering if there is a better way to do this job, such as loading the driver and opening the connection only once when the package is loaded. However, I have to make sure that the connection to the database is closed if R crashes or if the code calling the function contains an error.

How would you implement this? Also, how would you write a functional that would at least let me avoid replicating the boilerplate code that loads and unloads the drivers? I am thinking of something along the lines of:

    querybuild <- function(query, ...) {
      # drv and con would be created here, as in getFancyData() above
      on.exit(dbDisconnect(con), add = TRUE)
      on.exit(dbUnloadDriver(drv), add = TRUE)
      query <- sprintf(query, ...)
      res <- dbGetQuery(con, query)
      return(res)
    }

and then define

    getFancyData <- function(from, to)
      querybuild("select * from fancyTable where dt between '%s' and '%s'", from, to)

Do you see a better way? Thanks a lot in advance for your help and advice on this!

Cheers,
Luca

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
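Mark's suggestion of separating connection handling from the queries can be sketched as a small wrapper that opens a resource, runs an action on it, and always cleans up. This is only an illustration under stated assumptions: `with_resource`, `fake_open` and `fake_close` are hypothetical names, and a fake connection stands in for the database so the sketch runs without DBI.

```r
# Generic wrapper: open a resource, run `action` on it, always clean up.
with_resource <- function(open, close, action) {
  res <- open()
  on.exit(close(res), add = TRUE)  # registered only after open() succeeded
  action(res)
}

# Fake "connection" (illustration only) that records what happened.
events <- new.env()
events$log <- character()
fake_open <- function() {
  events$log <- c(events$log, "open")
  list(id = 1)
}
fake_close <- function(con) {
  events$log <- c(events$log, "close")
  invisible(TRUE)
}

# Normal use: the action's value is returned and the resource is closed.
ok <- with_resource(fake_open, fake_close, function(con) con$id + 1)

# Failing action: the error propagates, but the resource is still closed.
failed <- try(
  with_resource(fake_open, fake_close, function(con) stop("query failed")),
  silent = TRUE
)

# With DBI the same shape would look roughly like this (not run here):
# with_resource(function() dbConnect(drv, ...), dbDisconnect,
#               function(con) dbGetQuery(con, query))
```

Registering the cleanup only after the resource exists avoids calling the disconnect function on an object that was never created.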
[R] alternatives to KS test applicable to K-samples
Good morning, All

I have a stat question not specifically related to the programming language. To compare distributional consistency / discrepancy between two samples, we usually use the Kolmogorov-Smirnov test, which is implemented in R with ks.test() or in SAS with proc npar1way edf. I am wondering if there is any alternative to the KS test that could be generalized to K samples.

Thanks and have a nice weekend.

wensui
Re: [R] Why I am not able to load library(R.matlab)? Other packages are fine.
Hi Henrik,

I don't quite get what I should do here. I am not familiar with R.methodsS3. Can you tell me exactly which command I need to run?

Thanks,
Mike

On Thu, May 28, 2015 at 3:30 PM, Henrik Bengtsson henrik.bengts...@ucsf.edu wrote:

For some unknown reason, you've managed to install R.matlab without the dependency R.methodsS3 (cf. http://cran.r-project.org/web/packages/R.matlab/), or it happened due to some other glitch somewhere. Try to reinstall R.matlab. If that doesn't help, explicitly install R.methodsS3 and retry. If you get the same error with the other dependencies (R.oo and R.utils), do the same.

/Henrik

On Thu, May 28, 2015 at 11:47 AM, C W tmrs...@gmail.com wrote:

Dear R list,

I am trying to use the R.matlab library. I did the following, but it does not work.

    library(R.matlab)
    Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
      there is no package called ‘R.methodsS3’
    Error: package or namespace load failed for ‘R.matlab’

This is my session info.

    sessionInfo()
    R version 3.2.0 (2015-04-16)
    Platform: x86_64-apple-darwin13.4.0 (64-bit)
    Running under: OS X 10.10.3 (Yosemite)

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

My R is up to date, R 3.2.0. Why is this happening? Is it because I installed the new R version instead of updating it? Maybe things are in a different directory?

Thanks so much,
Mike
Re: [R] alternatives to KS test applicable to K-samples
Wensui:

There are the multi-response permutation procedures (MRPP) that readily test the omnibus hypothesis of no distributional differences among multiple samples for univariate or multivariate responses. There are also empirical coverage tests that test a similar hypothesis among multiple samples, but only for univariate responses. Both are included in the USGS Blossom package for R linked here: https://www.fort.usgs.gov/products/23735 (not yet distributed via CRAN). The MRPP may also be available in other R packages on CRAN (vegan?).

Brian

Brian S. Cade, PhD
U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO 80526-8818
email: ca...@usgs.gov brian_c...@usgs.gov
tel: 970 226-9326

On Fri, May 29, 2015 at 10:31 AM, Wensui Liu liuwen...@gmail.com wrote:

Good morning, All

I have a stat question not specifically related to the programming language. To compare distributional consistency / discrepancy between two samples, we usually use the Kolmogorov-Smirnov test, which is implemented in R with ks.test() or in SAS with proc npar1way edf. I am wondering if there is any alternative to the KS test that could be generalized to K samples.

Thanks and have a nice weekend.

wensui
Re: [R] Converting unique strings to unique numbers
Hi Kate,

I found that matching the character vector to itself is a very effective way to do this:

    x <- c("a", "bunch", "of", "strings", "whose", "exact", "content",
           "is", "of", "little", "interest")
    ids <- match(x, x)
    ids
    # [1] 1 2 3 4 5 6 7 8 3 10 11

By using this trick, many manipulations on character vectors can be replaced by manipulations on integer vectors, which are sometimes far more efficient.

Cheers,
H.

On 05/29/2015 09:58 AM, Kate Ignatius wrote:

I have a pedigree file as so:

    X0001 BYX859 0      0      2 1 BYX859
    X0001 BYX894 0      0      1 1 BYX894
    X0001 BYX862 BYX894 BYX859 2 2 BYX862
    X0001 BYX863 BYX894 BYX859 2 2 BYX863
    X0001 BYX864 BYX894 BYX859 2 2 BYX864
    X0001 BYX865 BYX894 BYX859 2 2 BYX865

And I was hoping to change all unique string values to numbers. That is:

    BYX859 = 1
    BYX894 = 2
    BYX862 = 3
    BYX863 = 4
    BYX864 = 5
    BYX865 = 6

But only in columns 2 - 4. Essentially I would like the data to look like this:

    X0001 1 0 0 2 1 BYX859
    X0001 2 0 0 1 1 BYX894
    X0001 3 2 1 2 2 BYX862
    X0001 4 2 1 2 2 BYX863
    X0001 5 2 1 2 2 BYX864
    X0001 6 2 1 2 2 BYX865

Is this possible with factors?

Thanks!
K.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319
Re: [R] Converting unique strings to unique numbers
I found this helpful. However, the second to fourth columns come out all zero. Was this the intention? That is:

    X0001 0 0 0 2 1 BYX859
    X0001 0 0 0 1 1 BYX894
    X0001 0 0 0 2 2 BYX862
    X0001 0 0 0 2 2 BYX863
    X0001 0 0 0 2 2 BYX864
    X0001 0 0 0 2 2 BYX865

On Fri, May 29, 2015 at 1:31 PM, William Dunlap wdun...@tibco.com wrote:

match() will do what you want. E.g., run your data through the following function.

    f <- function(data) {
      uniqStrings <- unique(c(data[, 2], data[, 3], data[, 4]))
      uniqStrings <- setdiff(uniqStrings, "0")
      for (j in 2:4) {
        data[[j]] <- match(data[[j]], uniqStrings, nomatch = 0L)
      }
      data
    }

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, May 29, 2015 at 9:58 AM, Kate Ignatius kate.ignat...@gmail.com wrote:

I have a pedigree file as so:

    X0001 BYX859 0      0      2 1 BYX859
    X0001 BYX894 0      0      1 1 BYX894
    X0001 BYX862 BYX894 BYX859 2 2 BYX862
    X0001 BYX863 BYX894 BYX859 2 2 BYX863
    X0001 BYX864 BYX894 BYX859 2 2 BYX864
    X0001 BYX865 BYX894 BYX859 2 2 BYX865

And I was hoping to change all unique string values to numbers. That is:

    BYX859 = 1
    BYX894 = 2
    BYX862 = 3
    BYX863 = 4
    BYX864 = 5
    BYX865 = 6

But only in columns 2 - 4. Essentially I would like the data to look like this:

    X0001 1 0 0 2 1 BYX859
    X0001 2 0 0 1 1 BYX894
    X0001 3 2 1 2 2 BYX862
    X0001 4 2 1 2 2 BYX863
    X0001 5 2 1 2 2 BYX864
    X0001 6 2 1 2 2 BYX865

Is this possible with factors?

Thanks!
K.
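One plausible cause of the all-zero columns, offered as an assumption since the thread does not show how the file was read: if columns 2 to 4 were read as factors (the read.table() default in R 3.x), c() on them combined the integer codes rather than the labels, so match() found nothing. A minimal sketch that converts to character first; `recode_ids` is a hypothetical helper name and the data are the six rows posted above.

```r
ped <- read.table(text = "
X0001 BYX859 0      0      2 1 BYX859
X0001 BYX894 0      0      1 1 BYX894
X0001 BYX862 BYX894 BYX859 2 2 BYX862
X0001 BYX863 BYX894 BYX859 2 2 BYX863
X0001 BYX864 BYX894 BYX859 2 2 BYX864
X0001 BYX865 BYX894 BYX859 2 2 BYX865
", stringsAsFactors = FALSE)

# Hypothetical helper: number the IDs in order of first appearance, keep "0" as 0.
recode_ids <- function(data, cols = 2:4) {
  chars <- lapply(data[cols], as.character)  # guards against factor columns
  ids <- setdiff(unique(unlist(chars, use.names = FALSE)), "0")
  data[cols] <- lapply(chars, match, table = ids, nomatch = 0L)
  data
}

recoded <- recode_ids(ped)
print(recoded)
```

Because the lookup table is built from the character labels, the same function gives the same answer whether the columns arrive as character or as factor.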
Re: [R] alternatives to KS test applicable to K-samples
Very nice, Brian. Sincerely appreciate your assistance!

On Friday, May 29, 2015, Cade, Brian ca...@usgs.gov wrote:

Wensui:

There are the multi-response permutation procedures (MRPP) that readily test the omnibus hypothesis of no distributional differences among multiple samples for univariate or multivariate responses. There are also empirical coverage tests that test a similar hypothesis among multiple samples, but only for univariate responses. Both are included in the USGS Blossom package for R linked here: https://www.fort.usgs.gov/products/23735 (not yet distributed via CRAN). The MRPP may also be available in other R packages on CRAN (vegan?).

Brian

Brian S. Cade, PhD
U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO 80526-8818
email: ca...@usgs.gov brian_c...@usgs.gov
tel: 970 226-9326

On Fri, May 29, 2015 at 10:31 AM, Wensui Liu liuwen...@gmail.com wrote:

Good morning, All

I have a stat question not specifically related to the programming language. To compare distributional consistency / discrepancy between two samples, we usually use the Kolmogorov-Smirnov test, which is implemented in R with ks.test() or in SAS with proc npar1way edf. I am wondering if there is any alternative to the KS test that could be generalized to K samples.

Thanks and have a nice weekend.

wensui

--
WenSui Liu
Credit Risk Manager, 53 Bancorp
wensui@53.com
513-295-4370
Re: [R] Converting unique strings to unique numbers
Of course, but I would not recommend it. A factor is a vector of integers with an attribute containing the labels that those integers correspond to. You seem to be asking for a factor that has lost the definitions part. But hey,

    newvector <- as.integer(factor(oldvector))

should get you what you asked for, one column at a time.

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

On May 29, 2015 9:58:22 AM PDT, Kate Ignatius kate.ignat...@gmail.com wrote:

I have a pedigree file as so:

    X0001 BYX859 0      0      2 1 BYX859
    X0001 BYX894 0      0      1 1 BYX894
    X0001 BYX862 BYX894 BYX859 2 2 BYX862
    X0001 BYX863 BYX894 BYX859 2 2 BYX863
    X0001 BYX864 BYX894 BYX859 2 2 BYX864
    X0001 BYX865 BYX894 BYX859 2 2 BYX865

And I was hoping to change all unique string values to numbers. That is:

    BYX859 = 1
    BYX894 = 2
    BYX862 = 3
    BYX863 = 4
    BYX864 = 5
    BYX865 = 6

But only in columns 2 - 4. Essentially I would like the data to look like this:

    X0001 1 0 0 2 1 BYX859
    X0001 2 0 0 1 1 BYX894
    X0001 3 2 1 2 2 BYX862
    X0001 4 2 1 2 2 BYX863
    X0001 5 2 1 2 2 BYX864
    X0001 6 2 1 2 2 BYX865

Is this possible with factors?

Thanks!
K.
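A small caveat on the factor route, shown as a sketch with a made-up four-element vector: factor() sorts its levels by default, so the integer codes follow sorted order rather than order of first appearance. Passing levels = unique(x) gives the numbering Kate described.

```r
ids <- c("BYX859", "BYX894", "BYX862", "BYX859")

# Default: levels are sorted, so BYX862 gets code 2 and BYX894 gets code 3.
by_sorted_level <- as.integer(factor(ids))

# Levels in order of first appearance: BYX859 = 1, BYX894 = 2, BYX862 = 3.
by_appearance <- as.integer(factor(ids, levels = unique(ids)))

print(by_sorted_level)
print(by_appearance)
```

The second form gives the same numbering as match(ids, unique(ids)).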
Re: [R-es] My R script is very slow
Hi MªLuz,

I don't know whether it is the fastest of all the options that exist, but it is very, very fast and the fastest one I have used. It is also quite practical for complex operations on tables. There are a few things I have not managed to port from data.frames and loops to data.table, but honestly I think that is my own shortcoming and that it can most likely be done.

Regards,
eric.

On 5/29/15, MªLuz Morales mlzm...@gmail.com wrote:

Hello, I want to share my problem with you, along with the solution that was proposed to me. My program loads Outcomes.csv and Set-A.csv (downloaded from http://garrickadenbuie.com/blog/2013/04/11/visualize-physionet-data-with-r/, section "Getting Started -- the code and the data set"), about 50 MB between the two. My code was:

    # Read the csv files into data frames
    seta <- read.csv('Set-A.csv')
    outcomes <- read.csv('Outcomes-A.csv')

    ids <- as.character(unique(outcomes$RecordID))  # distinct RecordIDs
    Length_ids <- length(ids)                       # number of distinct RecordIDs

    ListaABP <- list('RecordID' = -1, 'SAPS.I' = -1, 'SOFA' = -1, 'Survival' = -1,
                     'In.hospital_death' = -1,
                     'NISysABP_Min' = -1, 'NISysABP_Max' = -1, 'NISysABP_Mean' = -1,
                     'NIDiasABP_Min' = -1, 'NIDiasABP_Max' = -1, 'NIDiasABP_Mean' = -1,
                     'NIMAP_Min' = -1, 'NIMAP_Max' = -1, 'NIMAP_Mean' = -1)

    for (i in 1:Length_ids) {  # for each patient...
      ListaABP$RecordID[i] <- outcomes$RecordID[i]
      ListaABP$SAPS.I[i] <- outcomes$SAPS.I[i]
      ListaABP$SOFA[i] <- outcomes$SOFA[i]
      ListaABP$Survival[i] <- outcomes$Survival[i]
      ListaABP$In.hospital_death[i] <- outcomes$In.hospital_death[i]

      # Parameter == 'NISysABP'
      #seta_NISysABP <- seta[seta$RecordID == ids[i] & seta$Parameter == 'NISysABP', c('RecordID','Value')]
      # I think this is no longer a data frame, so the next line may fail
      seta_NISysABP <- seta[seta$RecordID == ids[i] & seta$Parameter == 'NISysABP', 'Value']
      ListaABP$NISysABP_Min[i] <- min(seta_NISysABP)
      ListaABP$NISysABP_Max[i] <- max(seta_NISysABP)
      ListaABP$NISysABP_Mean[i] <- mean(seta_NISysABP)

      # Parameter == 'NIDiasABP'
      #seta_NIDiasABP <- seta[seta$RecordID == ids[i] & seta$Parameter == 'NIDiasABP', c('Time','Value')]
      # In that case the min would be computed as min(seta_NIDiasABP$Value)
      seta_NIDiasABP <- seta[seta$RecordID == ids[i] & seta$Parameter == 'NIDiasABP', 'Value']
      ListaABP$NIDiasABP_Min[i] <- min(seta_NIDiasABP)
      ListaABP$NIDiasABP_Max[i] <- max(seta_NIDiasABP)
      ListaABP$NIDiasABP_Mean[i] <- mean(seta_NIDiasABP)

      # Parameter == 'NIMAP'
      #seta_NIMAP <- seta[seta$RecordID == ids[i] & seta$Parameter == 'NIMAP', c('Time','Value')]
      seta_NIMAP <- seta[seta$RecordID == ids[i] & seta$Parameter == 'NIMAP', 'Value']
      ListaABP$NIMAP_Min[i] <- min(seta_NIMAP)
      ListaABP$NIMAP_Max[i] <- max(seta_NIMAP)
      ListaABP$NIMAP_Mean[i] <- mean(seta_NIMAP)
    }  # for i

    Tabla <- data.frame(ListaABP)

This code took 3 hours to run. The solution proposed to me is to use data.table instead of data.frame; it now takes about 1 second to run, and it is this:

    library(data.table)

    datSet <- fread("Set-A.csv")
    resOut <- datSet[, .(ValMax = max(Value), ValMin = min(Value), ValAvg = mean(Value)),
                     by = c("RecordID", "Parameter")]
    resOut$RecordID <- as.factor(resOut$RecordID)
    setkey(resOut, RecordID)
    head(datSet)

    datOutcome <- fread("Outcomes-A.csv")
    datOutcome$RecordID <- as.factor(datOutcome$RecordID)
    setkey(datOutcome, RecordID)
    head(datOutcome)

    #resEnd <- merge(resOut, datOutcome, by = "RecordID", all = TRUE, allow.cartesian = FALSE)
    resEnd <- resOut[datOutcome]
    head(resEnd)
    setkey(resEnd, Parameter)

    # Example: extract one or several parameters.
    myRes <- resEnd[c("NISysABP", "NIDiasABP", "NIMAP")]
    head(myRes)

I have a question: is data.table the most efficient option for processing large amounts of data? Is it easy to work with if you want to do complex calculations in addition to reorganizing tables?

Thanks
Regards

___ R-help-es mailing list R-help-es@r-project.org https://stat.ethz.ch/mailman/listinfo/r-help-es
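The speed-up comes from replacing the per-patient loop, which re-scans the whole table on every pass, with a single grouped aggregation. The same idea can be sketched in base R on a toy table; the six rows below are invented for illustration and are not taken from Set-A.csv, and this is not the data.table solution from the thread.

```r
# Toy long-format data: one row per measurement (made-up values).
seta <- data.frame(
  RecordID  = c(1, 1, 1, 2, 2, 2),
  Parameter = c("NISysABP", "NISysABP", "NIMAP", "NISysABP", "NIMAP", "NIMAP"),
  Value     = c(110, 130, 80, 100, 70, 90)
)

# One pass over the data: min, max and mean per (RecordID, Parameter) group.
stats <- aggregate(
  Value ~ RecordID + Parameter, data = seta,
  FUN = function(v) c(Min = min(v), Max = max(v), Mean = mean(v))
)

# aggregate() stores the three statistics in a matrix column;
# this flattens it into Value.Min, Value.Max and Value.Mean.
stats <- do.call(data.frame, stats)
print(stats)
```

On data of the size described in the thread, data.table would typically still be much faster than aggregate(); the sketch only shows the shape of the computation.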
Re: [R] Why I am not able to load library(R.matlab)? Other packages are fine.
C W tmrsg11 at gmail.com writes:

Hi Henrik, I don't quite get what I should do here. I am not familiar with R.methodsS3. Can you tell me exactly which command I need to run? Thanks, Mike

    install.packages("R.methodsS3")
    install.packages("R.matlab")
    library(R.matlab)

[snip snip snip]
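Before reinstalling, it can help to see which of the dependencies named in this thread are actually missing. A minimal sketch; `missing_packages` is a hypothetical helper name, and the install line is left commented out so that nothing is installed without intent.

```r
# Hypothetical helper: return the subset of `pkgs` that cannot be loaded.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

to_install <- missing_packages(c("R.methodsS3", "R.oo", "R.utils", "R.matlab"))
print(to_install)

# Install only what is missing, then load R.matlab:
# if (length(to_install) > 0) install.packages(to_install)
# library(R.matlab)
```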
Re: [R] alternatives to KS test applicable to K-samples
On May 29, 2015, at 9:31 AM, Wensui Liu wrote:

Good morning, All

I have a stat question not specifically related to the programming language. To compare distributional consistency / discrepancy between two samples, we usually use the Kolmogorov-Smirnov test, which is implemented in R with ks.test() or in SAS with proc npar1way edf. I am wondering if there is any alternative to the KS test that could be generalized to K samples.

The 'coin' package (Hothorn, Hornik, van de Wiel, and Zeileis) presents a variety of permutation and rank-based tests that would probably be more powerful than any multi-group variant of the KS test. The multi-group variant of the Wilcoxon rank sum test presented in the examples for the help page ?wilcox_test is the Nemenyi-Damico-Wolfe-Dunn test.

--
David Winsemius
Alameda, CA, USA
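To make the permutation idea concrete, here is a rough base-R sketch of one possible K-sample check: take the largest two-sample KS distance over all pairs of groups and compare it with its permutation distribution. This is an illustration only, not the MRPP procedure or any test from the coin or Blossom packages; `max_pairwise_ks` and `perm_ks_test` are hypothetical names and the data are simulated.

```r
# Largest two-sample KS statistic over all pairs of groups.
max_pairwise_ks <- function(x, g) {
  g <- factor(g)
  pairs <- combn(levels(g), 2, simplify = FALSE)
  max(vapply(pairs, function(p) {
    unname(ks.test(x[g == p[1]], x[g == p[2]], exact = FALSE)$statistic)
  }, numeric(1)))
}

# Permutation p-value: shuffle the group labels and recompute the statistic.
perm_ks_test <- function(x, g, n_perm = 999) {
  observed <- max_pairwise_ks(x, g)
  permuted <- replicate(n_perm, max_pairwise_ks(x, sample(g)))
  list(statistic = observed,
       p.value = (1 + sum(permuted >= observed)) / (n_perm + 1))
}

# Simulated example: groups a and b share a distribution, group c is shifted.
set.seed(1)
x <- c(rnorm(30), rnorm(30), rnorm(30, mean = 2))
g <- rep(c("a", "b", "c"), each = 30)
res <- perm_ks_test(x, g, n_perm = 199)
print(res)
```

Because the reference distribution comes from shuffling labels, no separate multiplicity correction for the pairwise comparisons is needed, though a dedicated package would be the safer choice for real analyses.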
[R] Result differences in 32-bit vs. 64-bit point.in.polygon?
Is anyone aware of point.in.polygon giving different results for 32-bit vs. 64-bit R? Our OS is 64-bit Windows 7 Enterprise. I'm working with someone else's extensive R program and the final results are close but do not match exactly. We're thinking it might be something with the point.in.polygon function (one of many possibilities, including leaps).

Thanks much,

Shelly Lensing
Biostatistics / University of Arkansas for Medical Sciences
4301 W. Markham St. #781 / Little Rock, AR 72205
V: 501.686.8203 / F: 501-526-6729 / COPH 3236
Re: [R] about transforming a data.frame
I'm still not really clear on what you need (format, etc), but this may help you get started:

    with(df, table(CT, row_names))
       row_names
    CT  A1:A2:A3 B10:B11:B12 B4:B5:B6 B7:B8:B9 D10:D11:D12 D4:D5:D6 E10:E11:E12
      2        0           0        0        1           2        1           1
      4        1           1        0        0           0        0           0
      5        0           0        1        0           0        0           0

    with(df, table(CT, col_names))
       col_names
    CT  B1:B2:B3 D1:D2:D3 F10:F11:F12 G7:G8:G9 H1:H2:H3 H4:H5:H6
      2        1        0           1        1        1        1
      4        1        1           0        0        0        0
      5        1        0           0        0        0        0

On Fri, May 29, 2015 at 4:58 PM, Bogdan Tanasa tan...@gmail.com wrote:

Hi Sarah,

thank you for your help. I have simplified the example by reading the elements into a data frame, e.g.:

    df <- data.frame(
      row_names = c("B4:B5:B6", "B7:B8:B9", "D4:D5:D6", "D10:D11:D12",
                    "D10:D11:D12", "E10:E11:E12", "A1:A2:A3", "B10:B11:B12"),
      col_names = c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6", "F10:F11:F12",
                    "H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3"),
      CT = c(5, 2, 2, 2, 2, 2, 4, 4)
    )

I have used count() from the plyr package:

    count_row_names <- count(df$row_names)
    count_col_names <- count(df$col_names)

However, I would need to relate the UNIQUE ELEMENTS in the columns row_names or col_names to the numbers they are associated with in the CT column, e.g. B1:B2:B3 is associated with 5, 2, 4 (in the CT column), and D10:D11:D12 is associated with 2 (in the CT column).

thank you very much,
bogdan

On Fri, May 29, 2015 at 1:32 PM, Sarah Goslee sarah.gos...@gmail.com wrote:

Hi,

Please use dput() to provide your data, as it can get somewhat mangled by copying and pasting, especially if you post in HTML (as you are asked not to do in the posting guide).

What is a unique element? Is B4:B5:B6 an element, or are B4 and B5 each elements? That is, what is the result you expect to obtain for the sample data you provided? What code have you tried?

I would think table() might be involved, and possibly strsplit(), but will refrain from putting more time into this until you provide a reproducible dataset with dput() and some clearer idea of your intent.

Sarah

On Fri, May 29, 2015 at 4:19 PM, Bogdan Tanasa tan...@gmail.com wrote:

Dear all,

I would appreciate a suggestion on the following. I am working with a data.frame (below):

       EXP CT   row_names   col_names
    1 test -5    B4:B5:B6    B1:B2:B3
    2 test -2    B7:B8:B9    B1:B2:B3
    3 test -2    D4:D5:D6    H4:H5:H6
    4 test -2 D10:D11:D12 F10:F11:F12
    5 test -2 D10:D11:D12    H1:H2:H3
    6 test -2 E10:E11:E12    G7:G8:G9
    7 test -4    A1:A2:A3    D1:D2:D3
    8 test -4 B10:B11:B12    B1:B2:B3

What would be the easiest way to take the UNIQUE elements in ROW_NAMES or the UNIQUE elements in COL_NAMES and print how many times these UNIQUE ELEMENTS are associated with the numbers -5, -2, or -4 (these numbers are in the column named CT)?

thanks,
bogdan
Re: [R] Problem with comparing multiple data sets
Hi Mohammad, I have no idea what is happening but for some reason your new data (renamed df1 since df is a reserved word in R) is outputting a list whereas dff1 (your original test data) is giving a vector as you wanted. It may be obvious but I don't see why df1 is giving us a list. As far as I can tell the two data sets are structually the same. The two data sets are below the program. ## = library(modeest) # Original test data str(dff2) head(dff2) # sample of new data str(d1) head(df1) Out.dff2 - apply(dff2[ ,2:length(dff2)], 1, mfv) str(Out.dff2) Out.df1 - apply(df1[ , 2:length(df1)], 1, mfv) str(Out.df1) ## = ## New data set df1 - structure(list(terms = structure(c(2L, 4L, 4L, 4L, 3L, 1L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = c(#authentication,access control, #privacy,personal data, #security,malicious,security, data controller, id management,security, password,recovery), class = factor), class.1 = c(2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L), class.2 = c(2L, 2L, 2L, 0L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L), class.3 = c(2L, 0L, 2L, 2L, 1L, 1L, 0L, 0L, 0L, 2L, 2L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c(terms, class.1, class.2, class.3), class = data.frame, row.names = c(NA, -50L)) ## Original test data set dff2 - structure(list(terms = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c(#dac, #mac,#security, accountability,anonymous, data security,encryption,security ), class = factor), class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L), class.2 = c(2L, 2L, 2L, 2L, 0L, 0L, 2L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L), class.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 2L, 1L, 2L)), .Names = c(terms, class.1, class.2, class.3), class = data.frame, row.names = c(NA, -49L)) ##= John Kane Kingston ON Canada -Original Message- From: mxalimoha...@ualr.edu Sent: Fri, 29 May 2015 11:40:41 -0500 To: dcarl...@tamu.edu, drjimle...@gmail.com, jrkrid...@inbox.com, r-help@r-project.org Subject: Re: [R] Problem with comparing multiple data sets Hi everyone. I tried the (modeest) package on my initial test data and it worked. However, it doesn't work on the entire data set. I saved one of the protions that gives error. (Not for all of the values but for some of them). For example: lines 36 and 37 and 39 correctly show the mode value but 38 and 40 are not correct. Such error is repeated for many of the values. 
[36,] 2 [37,] 2 [38,] Numeric,3 [39,] 1 [40,] Numeric,3 #This is what I did: df- read.csv(file=Part1-modif.csv, head=TRUE, sep=,) Out- apply(df[,2:length(df)],1, mfv) t(t(Out)) #This is the data set structure(list(terms = structure(c(2L, 4L, 4L, 4L, 3L, 1L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = c(#authentication,access control, #privacy,personal data, #security,malicious,security, data controller, id management,security, password,recovery), class = factor), class.1 = c(2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L), class.2 = c(2L, 2L, 2L, 0L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L), class.3 = c(2L, 0L, 2L, 2L, 1L, 1L, 0L, 0L, 0L, 2L, 2L, 0L, 0L, 0L, 0L, 1L,
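A likely explanation for the "Numeric,3" entries, offered as a hedged reading of the posted output: a most-frequent-value function returns every tied mode, and rows 38 and 40 hold three different values (2, 1, 0), so all three tie. apply() can only simplify to a vector when every row returns one value; otherwise it returns a list. The sketch below uses a base-R stand-in named `all_modes` (hypothetical, written here so the example runs without the modeest package) on rows 36 to 40 of the posted data.

```r
# Hypothetical stand-in for a "most frequent value" function:
# returns every value that attains the maximum count.
all_modes <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[counts == max(counts)])
}

# Rows 36-40 of the posted data (class.1, class.2, class.3).
votes <- data.frame(class.1 = c(2, 2, 2, 1, 2),
                    class.2 = c(2, 2, 1, 1, 1),
                    class.3 = c(0, 0, 0, 0, 0))

# Rows 3 and 5 here (38 and 40 in the post) have three tied modes,
# so apply() cannot simplify and returns a list.
ragged <- apply(votes, 1, all_modes)
str(ragged)

# One possible tie-break: keep the first (smallest) mode so every row yields one value.
first_mode <- apply(votes, 1, function(v) all_modes(v)[1])
print(first_mode)
```

Whether the smallest mode is the right tie-break depends on what the class labels mean; returning NA for tied rows is an equally simple alternative.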
Re: [R] about transforming a data.frame
thanks a lot Sarah, very much appreciate it!

On Fri, May 29, 2015 at 3:18 PM, Sarah Goslee sarah.gos...@gmail.com wrote:

LMGTFY: http://stackoverflow.com/questions/11433432/importing-multiple-csv-files-into-r

On Fri, May 29, 2015 at 5:58 PM, Bogdan Tanasa tan...@gmail.com wrote:

Dear Sarah,

thank you very much, it is very helpful. May I ask one more question: is there a quick and easy tutorial on loading multiple files (from a folder) in R and processing one file at a time?

thanks very much again,
bogdan

On Fri, May 29, 2015 at 2:55 PM, Sarah Goslee sarah.gos...@gmail.com wrote:

I'm still not really clear on what you need (format, etc), but this may help you get started:

    with(df, table(CT, row_names))
       row_names
    CT  A1:A2:A3 B10:B11:B12 B4:B5:B6 B7:B8:B9 D10:D11:D12 D4:D5:D6 E10:E11:E12
      2        0           0        0        1           2        1           1
      4        1           1        0        0           0        0           0
      5        0           0        1        0           0        0           0

    with(df, table(CT, col_names))
       col_names
    CT  B1:B2:B3 D1:D2:D3 F10:F11:F12 G7:G8:G9 H1:H2:H3 H4:H5:H6
      2        1        0           1        1        1        1
      4        1        1           0        0        0        0
      5        1        0           0        0        0        0

On Fri, May 29, 2015 at 4:58 PM, Bogdan Tanasa tan...@gmail.com wrote:

Hi Sarah,

thank you for your help. I have simplified the example by reading the elements into a data frame, e.g.:

    df <- data.frame(
      row_names = c("B4:B5:B6", "B7:B8:B9", "D4:D5:D6", "D10:D11:D12",
                    "D10:D11:D12", "E10:E11:E12", "A1:A2:A3", "B10:B11:B12"),
      col_names = c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6", "F10:F11:F12",
                    "H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3"),
      CT = c(5, 2, 2, 2, 2, 2, 4, 4)
    )

I have used count() from the plyr package:

    count_row_names <- count(df$row_names)
    count_col_names <- count(df$col_names)

However, I would need to relate the UNIQUE ELEMENTS in the columns row_names or col_names to the numbers they are associated with in the CT column, e.g. B1:B2:B3 is associated with 5, 2, 4 (in the CT column), and D10:D11:D12 is associated with 2 (in the CT column).

thank you very much,
bogdan

On Fri, May 29, 2015 at 1:32 PM, Sarah Goslee sarah.gos...@gmail.com wrote:

Hi,

Please use dput() to provide your data, as it can get somewhat mangled by copying and pasting, especially if you post in HTML (as you are asked not to do in the posting guide).

What is a unique element? Is B4:B5:B6 an element, or are B4 and B5 each elements? That is, what is the result you expect to obtain for the sample data you provided? What code have you tried?

I would think table() might be involved, and possibly strsplit(), but will refrain from putting more time into this until you provide a reproducible dataset with dput() and some clearer idea of your intent.

Sarah

On Fri, May 29, 2015 at 4:19 PM, Bogdan Tanasa tan...@gmail.com wrote:

Dear all,

I would appreciate a suggestion on the following. I am working with a data.frame (below):

       EXP CT   row_names   col_names
    1 test -5    B4:B5:B6    B1:B2:B3
    2 test -2    B7:B8:B9    B1:B2:B3
    3 test -2    D4:D5:D6    H4:H5:H6
    4 test -2 D10:D11:D12 F10:F11:F12
    5 test -2 D10:D11:D12    H1:H2:H3
    6 test -2 E10:E11:E12    G7:G8:G9
    7 test -4    A1:A2:A3    D1:D2:D3
    8 test -4 B10:B11:B12    B1:B2:B3

What would be the easiest way to take the UNIQUE elements in ROW_NAMES or the UNIQUE elements in COL_NAMES and print how many times these UNIQUE ELEMENTS are associated with the numbers -5, -2, or -4 (these numbers are in the column named CT)?

thanks,
bogdan
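For the question about loading several files from a folder and processing them one at a time, a common base-R pattern is list.files() followed by a loop or vapply() over the paths. The sketch below writes two throwaway CSV files to a temporary folder first so that it is self-contained; the folder, file names and contents are made up for illustration.

```r
# Create a scratch folder with two small CSV files (illustration only).
folder <- tempfile("csv_demo")
dir.create(folder)
write.csv(data.frame(x = 1:3), file.path(folder, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 4:5), file.path(folder, "b.csv"), row.names = FALSE)

# Find every CSV file in the folder, with full paths.
files <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)

# Process one file at a time; here the "processing" is just counting rows.
row_counts <- vapply(files, function(f) nrow(read.csv(f)), integer(1))
names(row_counts) <- basename(files)
print(row_counts)
```

To keep the data rather than a summary, lapply(files, read.csv) returns a list with one data frame per file.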
Re: [R] about transforming a data.frame
Hi Bogdan,

If you mean "How can I verify that B1:B2:B3 is paired with all of the values 2, 4 and 5":

    apply(table(df$col_names, df$CT), 1, all)

and if you mean "How can I verify that B1:B2:B3 is paired with at least one of the values 2, 4 and 5":

    apply(table(df$col_names, df$CT), 1, any)

Jim

Hi Jim,

yes, thank you, that is the desired output. One more question please: after using the data frame

    df <- data.frame(
      row_names = c("B4:B5:B6", "B7:B8:B9", "D4:D5:D6", "D10:D11:D12",
                    "D10:D11:D12", "E10:E11:E12", "A1:A2:A3", "B10:B11:B12"),
      col_names = c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6", "F10:F11:F12",
                    "H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3"),
      CT = c(5, 2, 2, 2, 2, 2, 4, 4)
    )

and

    table(df$row_names, df$CT)
    table(df$col_names, df$CT)

how could I quickly verify that B1:B2:B3 (for example) hits the CT values of 2, 4 and 5 at least one time? An example is in table(df$col_names, df$CT).

thank you very much,

-- bogdan
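Jim's two one-liners can be run on the posted data as follows. The data frame is named `df1` here rather than `df`, and the counts are turned into TRUE/FALSE explicitly with `tab > 0` before all() or any() is applied; both are small choices made for this sketch, not part of the original reply.

```r
df1 <- data.frame(
  col_names = c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6", "F10:F11:F12",
                "H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3"),
  CT = c(5, 2, 2, 2, 2, 2, 4, 4)
)

# Rows are the unique col_names, columns are the CT values 2, 4 and 5.
tab <- table(df1$col_names, df1$CT)

# TRUE where an element is paired with every CT value.
hits_all <- apply(tab > 0, 1, all)

# TRUE where an element is paired with at least one CT value.
hits_any <- apply(tab > 0, 1, any)

print(hits_all)
print(hits_any)
```

Only B1:B2:B3 is paired with all three CT values in this data; every element is paired with at least one.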
Re: [R] about transforming a data.frame
Dear Sarah,

thank you very much, it is very helpful. May I please ask one more question: is there a quick and easy tutorial on loading multiple files (from a folder) in R and processing one file at a time?

thanks very much again,
bogdan

On Fri, May 29, 2015 at 2:55 PM, Sarah Goslee sarah.gos...@gmail.com wrote:

> I'm still not really clear on what you need (format, etc), but this may help you get started:
>
> with(df, table(CT, row_names))
>    row_names
> CT  A1:A2:A3 B10:B11:B12 B4:B5:B6 B7:B8:B9 D10:D11:D12 D4:D5:D6 E10:E11:E12
>   2        0           0        0        1           2        1           1
>   4        1           1        0        0           0        0           0
>   5        0           0        1        0           0        0           0
>
> with(df, table(CT, col_names))
>    col_names
> CT  B1:B2:B3 D1:D2:D3 F10:F11:F12 G7:G8:G9 H1:H2:H3 H4:H5:H6
>   2        1        0           1        1        1        1
>   4        1        1           0        0        0        0
>   5        1        0           0        0        0        0
>
> On Fri, May 29, 2015 at 4:58 PM, Bogdan Tanasa tan...@gmail.com wrote:
>
>> Hi Sarah,
>>
>> thank you for your help. I have simplified the example by reading the elements into a data frame, e.g.:
>>
>> df <- data.frame(
>>   row_names = c("B4:B5:B6", "B7:B8:B9", "D4:D5:D6", "D10:D11:D12", "D10:D11:D12", "E10:E11:E12", "A1:A2:A3", "B10:B11:B12"),
>>   col_names = c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6", "F10:F11:F12", "H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3"),
>>   CT = c(5, 2, 2, 2, 2, 2, 4, 4))
>>
>> I have used count() from the plyr package:
>>
>> count_row_names <- count(df$row_names)
>> count_col_names <- count(df$col_names)
>>
>> However, I would need to correlate these UNIQUE ELEMENTS in the columns row_names or col_names with the numbers they associate with in the CT column, e.g. B1:B2:B3 associates with 5, 2, 4 (in the CT column), or D10:D11:D12 associates with 2 (in the CT column).
>>
>> thank you very much,
>> bogdan
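Bogdan's stated goal, listing which CT values each unique element is associated with, can also be sketched with split(). This is an illustration on the thread's sample data, not something proposed by anyone in the thread; ct_by_col is a name made up for the example.

```r
# col_names and CT columns from the thread's sample data.
col_names <- c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6", "F10:F11:F12",
               "H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3")
CT <- c(5, 2, 2, 2, 2, 2, 4, 4)

# Group the CT values by element, then keep the distinct values in sorted order.
ct_by_col <- lapply(split(CT, col_names), function(v) sort(unique(v)))

ct_by_col[["B1:B2:B3"]]     # the CT values seen together with B1:B2:B3
ct_by_col[["F10:F11:F12"]]  # the CT values seen together with F10:F11:F12
```

The same call with row_names in place of col_names gives the association for the other column.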
Re: [R] Why I am not able to load library(R.matlab)? Other packages are fine.
Hi Ben,

Thanks for the fun clip. I love it. Have a wonderful day!

-M

On Fri, May 29, 2015 at 5:10 PM, Ben Bolker bbol...@gmail.com wrote:

> I think Henrik's point (which I merely clarified) was that something funky (we'll probably never know what, and it's not worth figuring out unless it happens again/to other people) had gone wrong and that the easiest thing to do was just to reinstall.
>
> References:
> * https://www.youtube.com/watch?v=t2F1rFmyQmY
> * http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.208.9970&rep=rep1&type=pdf
>
> On 15-05-29 05:11 PM, C W wrote:
>
>> Wow, thanks Ben. That worked very well. I guess I didn't have R.methodsS3? But that doesn't make sense, because I was using R.matlab a few weeks ago. I believe I was on R 3.1. Maybe it's in the R 3.1 folder? I am using a Mac, btw.
>>
>> Cheers,
>> -M
>>
>> On Fri, May 29, 2015 at 1:55 PM, Ben Bolker bbol...@gmail.com wrote:
>>
>>> C W tmrsg11 at gmail.com writes:
>>>
>>>> Hi Henrik,
>>>> I don't quite get what I should do here. I am not familiar with R.methodsS3. Can you tell me exactly which command I need to run?
>>>> Thanks, Mike
>>>
>>> install.packages("R.methodsS3")
>>> install.packages("R.matlab")
>>> library(R.matlab)
>>>
>>> [snip snip snip]
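Before reinstalling, it can help to see which packages the current R library is actually missing, for instance after an R upgrade that switched library folders. The following is a small sketch using only base R; missing_packages() and install_missing() are hypothetical helper names introduced here, not functions from R.matlab or R.methodsS3.

```r
# Return the names of the packages that cannot be loaded in this R session.
missing_packages <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

# Hypothetical helper: install only what is missing.
install_missing <- function(pkgs) {
  need <- missing_packages(pkgs)
  if (length(need) > 0) install.packages(need)
  invisible(need)
}

# Not run here because it may download packages:
# install_missing(c("R.methodsS3", "R.matlab"))

missing_packages(c("stats", "utils"))  # packages shipped with R are always found
```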
Re: [R] Result differences in 32-bit vs. 64-bit point.in.polygon?
On 29/05/2015 2:36 PM, Lensing, Shelly Y wrote:

> Is anyone aware of point.in.polygon giving different results for 32-bit vs. 64-bit R? Our OS is 64-bit Windows 7 Enterprise. I'm working with someone else's extensive R program and the final results are close but not exactly matching. We're thinking it might be something with the point.in.polygon function (one of many possibilities, including leaps).

Often 32 bit R does calculations slightly more accurately than 64 bit R does. This is because the 64 bit compiler is more likely to do calculations in 64 bit precision when the 32 bit compiler does them in 80 bit precision.

Of course, individual calculations being more accurate doesn't mean the final answer is, but small numeric differences in floating point calculations are to be expected.

Duncan Murdoch
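When two runs are expected to agree only up to floating point noise, comparing with a tolerance is usually more informative than exact equality. The sketch below shows the general idea with all.equal(); it is not a diagnosis of point.in.polygon, and the values are arbitrary.

```r
a <- 0.1 + 0.2
b <- 0.3

exact  <- a == b                    # exact comparison of two doubles
approx <- isTRUE(all.equal(a, b))   # comparison within all.equal()'s default tolerance

exact
approx

# The same idea written out by hand with an explicit tolerance.
abs(a - b) < sqrt(.Machine$double.eps)
```

Applied to the results of a 32-bit and a 64-bit run, all.equal() also reports the mean relative difference when the two objects are not close enough.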
Re: [R] about transforming a data.frame
Hi Jim,

thanks again. Now I see: the answer to my previous question seems to be yes, as the all() function works on logical vectors ...

best wishes,
-- bogdan

On Fri, May 29, 2015 at 4:29 PM, Bogdan Tanasa tan...@gmail.com wrote:

> Thanks a lot Jim. If I may ask one more little question please: suppose I ask "How can I verify that B1:B2:B3 is paired with ALL of the values 2, 4 and 5", regardless of the pairing value (in our case, for the code below, the pairing value for B1:B2:B3 is 1, but it can be 2, 3, 4, etc. BUT NOT zero). How could I test for that? Or is this the way that apply works with the all argument? Good documentation for the apply function would help too.
>
> thanks, and happy weekend!
> -- bogdan
>
> On Fri, May 29, 2015 at 4:21 PM, Jim Lemon drjimle...@gmail.com wrote:
>
>> Hi Bogdan,
>>
>> If you mean "How can I verify that B1:B2:B3 is paired with all of the values 2, 4 and 5?":
>>
>> apply(table(df$col_names, df$CT), 1, all)
>>
>> and if you mean "How can I verify that B1:B2:B3 is paired with at least one of the values 2, 4 and 5?":
>>
>> apply(table(df$col_names, df$CT), 1, any)
>>
>> Jim
Re: [R-es] My R script is very slow
Hi Mª Luz,

What kind of complex calculations do you mean? With data.table you can define operations (as complex as you like) on a set of rows, grouping them by columns, and more. Its syntax is very compact, but once you have used it a little you find ways to do things without many intermediate steps. You can also write it less compactly and perhaps more readably.

As for the efficiency of data.table compared with other alternatives, here is a comparison:
http://stackoverflow.com/questions/4322219/whats-the-fastest-way-to-merge-join-data-frames-in-r

Since dplyr appeared, though, the question is whether data.table or dplyr is the more convenient choice. Here is another thread that compares them on several attributes:
http://stackoverflow.com/questions/21435339/data-table-vs-dplyr-can-one-do-something-well-the-other-cant-or-does-poorly/27718317#27718317

What volume of data do you want to process? And... do you want something faster than less than a second?

Regards,
Carlos Ortega
www.qualityexcellence.es

On 29 May 2015 at 15:50, MªLuz Morales mlzm...@gmail.com wrote:

> Hi, I want to share my problem and the solution that was suggested to me. My program loads Outcomes.csv and Set-A.csv (downloaded from http://garrickadenbuie.com/blog/2013/04/11/visualize-physionet-data-with-r/ , section "Getting Started -- the code and the data set"), about 50 MB between the two. My code was:
>
> # Read the CSV files into data frames
> seta <- read.csv('Set-A.csv');
> outcomes <- read.csv('Outcomes-A.csv');
> ids <- as.character(unique(outcomes$RecordID));
> ## Number of distinct RecordIDs
> Length_ids <- length(ids);
> ListaABP <- list('RecordID'=-1, 'SAPS.I'=-1, 'SOFA'=-1, 'Survival'=-1, 'In.hospital_death'=-1,
>                  'NISysABP_Min'=-1, 'NISysABP_Max'=-1, 'NISysABP_Mean'=-1,
>                  'NIDiasABP_Min'=-1, 'NIDiasABP_Max'=-1, 'NIDiasABP_Mean'=-1,
>                  'NIMAP_Min'=-1, 'NIMAP_Max'=-1, 'NIMAP_Mean'=-1);
> for (i in 1:Length_ids){ # For each patient...
>   ListaABP$RecordID[i] <- outcomes$RecordID[i];
>   ListaABP$SAPS.I[i] <- outcomes$SAPS.I[i];
>   ListaABP$SOFA[i] <- outcomes$SOFA[i];
>   ListaABP$Survival[i] <- outcomes$Survival[i];
>   ListaABP$In.hospital_death[i] <- outcomes$In.hospital_death[i];
>   # Parameter == 'NISysBP'
>   #seta_NISysABP <- seta[seta$RecordID == ids[i] & seta$Parameter == 'NISysABP', c('RecordID','Value')];
>   seta_NISysABP <- seta[seta$RecordID == ids[i] & seta$Parameter == 'NISysABP', 'Value'];
>   # I think this is no longer a data frame, so the next line may throw an error
>   ListaABP$NISysABP_Min[i] <- min(seta_NISysABP);
>   ListaABP$NISysABP_Max[i] <- max(seta_NISysABP);
>   ListaABP$NISysABP_Mean[i] <- mean(seta_NISysABP);
>   # Parameter == 'NIDiasABP'
>   #seta_NIDiasABP <- seta[seta$RecordID == ids[i] & seta$Parameter == 'NIDiasABP', c('Time','Value')];
>   # In that case the min would be computed as ...min(seta_NIDiasABP$Value);
>   seta_NIDiasABP <- seta[seta$RecordID == ids[i] & seta$Parameter == 'NIDiasABP', 'Value'];
>   ListaABP$NIDiasABP_Min[i] <- min(seta_NIDiasABP);
>   ListaABP$NIDiasABP_Max[i] <- max(seta_NIDiasABP);
>   ListaABP$NIDiasABP_Mean[i] <- mean(seta_NIDiasABP);
>   # Parameter == 'NIMAP'
>   #seta_NIMAP <- seta[seta$RecordID == ids[i] & seta$Parameter == 'NIMAP', c('Time','Value')];
>   seta_NIMAP <- seta[seta$RecordID == ids[i] & seta$Parameter == 'NIMAP', 'Value'];
>   ListaABP$NIMAP_Min[i] <- min(seta_NIMAP);
>   ListaABP$NIMAP_Max[i] <- max(seta_NIMAP);
>   ListaABP$NIMAP_Mean[i] <- mean(seta_NIMAP);
> } # for i
> Tabla <- data.frame(ListaABP);
>
> This code took 3 hours to run.
>
> The solution suggested to me is to use data.table instead of data.frame. It now takes about 1 second to run, and it is this:
>
> library(data.table)
> datSet <- fread("Set-A.csv")
> resOut <- datSet[, .(ValMax=max(Value), ValMin=min(Value), ValAvg=mean(Value)), by=c("RecordID","Parameter")]
> resOut$RecordID <- as.factor(resOut$RecordID)
> setkey(resOut, RecordID)
> head(datSet)
> datOutcome <- fread("Outcomes-A.csv")
> datOutcome$RecordID <- as.factor(datOutcome$RecordID)
> setkey(datOutcome, RecordID)
> head(datOutcome)
> #resEnd <- merge(resOut, datOutcome, by="RecordID", all=TRUE, allow.cartesian=FALSE)
> resEnd <- resOut[datOutcome]
> head(resEnd)
> setkey(resEnd, Parameter)
> # Example of extracting one or several parameters.
> myRes <- resEnd[c("NISysABP","NIDiasABP","NIMAP")]
> head(myRes)
>
> I have a question: is data.table the most efficient option for processing large amounts of data? Is it easy to work with if you want to perform complex calculations in addition to reorganizing tables?
>
> Thanks
> Regards
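The speed-up comes from replacing one full-table subset per patient and parameter with a single grouped aggregation. As a rough illustration of that idea without data.table, here is a base R sketch using aggregate(). The toy data frame stands in for Set-A.csv and its values are made up; this is not the solution Carlos proposed, and it will not match data.table's speed on large files.

```r
# Toy long-format data standing in for Set-A.csv (values are invented).
seta <- data.frame(
  RecordID  = c(1, 1, 1, 2, 2, 2),
  Parameter = c("NISysABP", "NISysABP", "NIMAP", "NISysABP", "NIMAP", "NIMAP"),
  Value     = c(120, 110, 80, 130, 90, 70)
)

# One pass over the data: min, max and mean of Value per RecordID and Parameter.
summ <- aggregate(Value ~ RecordID + Parameter, data = seta,
                  FUN = function(v) c(Min = min(v), Max = max(v), Mean = mean(v)))

# aggregate() returns the three statistics as one matrix column; flatten it
# into ordinary columns named Value.Min, Value.Max and Value.Mean.
summ <- do.call(data.frame, summ)
summ
```

The result has one row per RecordID and Parameter combination, which can then be merged with the outcomes table by RecordID.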
Re: [R] about transforming a data.frame
Hi Jim,

yes, thank you, that is the desired output. One more question please: after using the data frame

df <- data.frame(
  row_names = c("B4:B5:B6", "B7:B8:B9", "D4:D5:D6", "D10:D11:D12", "D10:D11:D12", "E10:E11:E12", "A1:A2:A3", "B10:B11:B12"),
  col_names = c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6", "F10:F11:F12", "H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3"),
  CT = c(5, 2, 2, 2, 2, 2, 4, 4))

and

table(df$row_names, df$CT)
table(df$col_names, df$CT)

how could I quickly verify that B1:B2:B3 (for example) hits the CT values of 2, 4, 5 at least one time? An example is in table(df$col_names, df$CT).

thank you very much,
-- bogdan

On Fri, May 29, 2015 at 2:40 PM, Jim Lemon drjimle...@gmail.com wrote:

> Hi Bogdan,
>
> Sarah has already suggested this, but doesn't:
>
> table(df$row_names, df$CT)
> table(df$col_names, df$CT)
>
> give you what you want?
>
> Jim
>
> On Sat, May 30, 2015 at 7:11 AM, John Kane jrkrid...@inbox.com wrote:
>
>> Bogdan, the request was for data in dput() format. Type ?dput for more information.
>>
>> Do dput(myfile), copy the output and paste it into the email.
>>
>> You should get something like:
>>
>> structure(list(c1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L), .Label = c("(0.509,0.614]", "(0.614,0.718]", "(0.718,0.822]", "(0.822,0.926]", "(0.926,1.03]", "(1.03,1.13]", "(1.13,1.24]", "(1.24,1.34]", "(1.34,1.45]", "(1.45,1.55]"), class = "factor"), s1 = c(0.51, 0.52, 0.58, 0.58, 0.59, 0.6, 0.63, 0.65, 0.68, 0.74, 0.74, 0.75, 0.77, 0.77, 0.77, 0.78, 0.79, 0.84, 0.84, 0.85, 0.87, 0.93, 0.93, 0.95, 0.99, 1.04, 1.09, 1.11, 1.13, 1.14, 1.14, 1.14, 1.17, 1.18, 1.19, 1.22, 1.22, 1.23, 1.28, 1.29, 1.3, 1.32, 1.37, 1.38, 1.38, 1.4, 1.43, 1.47, 1.52, 1.55)), .Names = c("c1", "s1"), row.names = c(NA, -50L), class = "data.frame")
>>
>> Data in dput() format is the preferred way to get data in R-help since it provides a perfect copy of what you have on your machine. Any other way of providing data risks the recipients reading it into R differently than it is on your machine.
>>
>> John Kane
>> Kingston ON Canada
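To make John's point concrete, here is a small sketch of the dput() round trip: the poster prints the structure, and a reader recreates the object by evaluating the pasted text. The data frame small is a made-up two-row example, and capture.output() is used only to simulate the copy and paste step.

```r
small <- data.frame(col_names = c("B1:B2:B3", "D1:D2:D3"),
                    CT = c(5, 4),
                    stringsAsFactors = FALSE)

dput(small)                          # prints a structure(...) call to paste into an email

txt <- capture.output(dput(small))   # the same text, captured as a character vector
restored <- eval(parse(text = txt))  # what a reader does after pasting it into R

all.equal(small, restored)           # column types and values survive the round trip
```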
Re: [R] about transforming a data.frame
LMGTFY: http://stackoverflow.com/questions/11433432/importing-multiple-csv-files-into-r

On Fri, May 29, 2015 at 5:58 PM, Bogdan Tanasa tan...@gmail.com wrote:

> Dear Sarah,
>
> thank you very much, it is very helpful. please may I ask one more question about a quick and easy tutorial about the loading multiple files (from a folder) in R, and processing one file at a time ?
>
> thanks very much again,
> bogdan
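For readers who want the gist without following the link, the usual pattern is list.files() to collect the paths and an apply-style loop to handle one file at a time. The sketch below creates two throwaway CSV files in a temporary folder so that it runs as written; the folder name, file names and the "processing" step (summing a column x) are all assumptions made for the example.

```r
# Create a throwaway folder with two small CSV files so the sketch is self-contained.
dir_path <- file.path(tempdir(), "csv_demo")
dir.create(dir_path, showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(dir_path, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 4:6), file.path(dir_path, "b.csv"), row.names = FALSE)

# Collect the full paths of every CSV file in the folder.
files <- list.files(dir_path, pattern = "\\.csv$", full.names = TRUE)

# Process one file at a time; here "processing" is just summing column x.
totals <- vapply(files, function(f) sum(read.csv(f)$x), numeric(1))
names(totals) <- basename(files)
totals
```

Swapping vapply() for lapply() with read.csv alone returns a list of data frames, which can be combined with do.call(rbind, ...) when the files share the same columns.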
[R] vectorized code
Hi,

I was working on an online example where a virus is spread through a graph. The example is sufficient for a small graph, i.e. a small number of edges and nodes. But I tried it on a very large graph, i.e. 1 nodes and 2 edges, and the function below is too slow for it. My question is: how can the function below be converted to vectorized code so that it is optimized for large graphs?

spreadVirus <- function(G, Vinitial, Activation_probability){
  # Precompute all outgoing graph adjacencies
  G$AdjList = get.adjlist(G, mode="out")
  # Initialize various graph attributes
  V(G)$color = "blue"
  E(G)$color = "black"
  V(G)[Vinitial]$color <- "yellow"
  # List to store the incremental graphs (for plotting later)
  Glist <- list(G)
  count <- 1
  # Spread the infection
  active <- Vinitial
  while(length(active) > 0){
    new_infected <- NULL
    E(G)$color = "black"
    for(v in active){
      # spread through the daily contacts of vertex v
      daily_contacts <- G$AdjList[[v]]
      E(G)[v %->% daily_contacts]$color <- "red"
      for(v1 in daily_contacts){
        # note: new_color is not defined anywhere in the code as posted
        if(V(G)[v1]$color == "blue" && new_color == "red") {
          V(G)[v1]$color <- "red"
          new_infected <- c(new_infected, v1)
        }
      }
    }
    # the next active set (needed for updating)
    active <- new_infected
    # Add graph to list (optional, depending on whether I want to plot)
    count <- count + 1
    Glist[[count]] <- G
  }
  return(Glist)
}

[[alternative HTML version deleted]]
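One possible direction, sketched here in base R rather than igraph: keep the graph as two edge vectors and the infection state as a logical vector, so that each round handles all outgoing edges of the active set in one vectorized step. Everything in this block is an assumption made for illustration (the function name spread_virus, the edge-list representation, one independent trial per edge with probability p), and it drops the colouring and the per-round graph snapshots of the posted code.

```r
# from/to: integer vectors describing directed edges from[i] -> to[i]
# n_nodes: number of vertices, initial: initially infected vertices
# p: probability that an infection crosses a single edge
spread_virus <- function(from, to, n_nodes, initial, p) {
  infected <- logical(n_nodes)
  infected[initial] <- TRUE
  active <- initial
  rounds <- 0L
  while (length(active) > 0) {
    # All edges that leave the active set and point at a healthy vertex.
    hit <- from %in% active & !infected[to]
    # One Bernoulli trial per such edge, drawn in a single call.
    success <- hit & (runif(length(from)) < p)
    new_infected <- unique(to[success])
    infected[new_infected] <- TRUE
    active <- new_infected
    rounds <- rounds + 1L
  }
  list(infected = infected, rounds = rounds)
}

# Tiny chain 1 -> 2 -> 3 -> 4 as a usage example.
set.seed(1)
res <- spread_virus(from = c(1, 2, 3), to = c(2, 3, 4),
                    n_nodes = 4, initial = 1, p = 0.5)
res$infected
```

With an igraph object, the two edge vectors could be taken from its edge list; storing a copy of the graph in every round, as the posted code does, is itself costly on large graphs and is worth making optional.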
Re: [R] Help on R Functionality Histogram
Don't use Nabble when posting to the R-Help forum. Responses inline.

On May 29, 2015, at 7:54 AM, Shivi82 shivibha...@ymail.com wrote:

> Hello Experts,
> I have a couple of questions on the analysis I am creating.
>
> 1) How does R adapt to changes? The case I have here is that the Excel file I started with had to be modified, because the data I had was on an hourly basis ranging from 0 to 23 hours. After the changes, 0 was modified to 24 in hours. Now do I need to read this Excel file again in R using the read.csv syntax, or is there another way to do so, i.e. a kind of reload option?

No. Reload the data by rerunning your script.

> 2) I am creating a histogram. I need the 24 hours on the x axis to be displayed separately as 0, 1, 2, and so on. However, it only shows up to 20, which makes it look awkward. I also need to resize the labels and, if possible, place them inside the bars. I used the code below; the axis fonts have changed, but the labels give an error with this code.
>
> hist(aaa$Hours, main="Hourly Weight", xlab = "Time", breaks = 25, col = "yellow", ylim = c(0,9000), labels=TRUE, cex.axis=0.6, cex.label=0.6)

The very understandable warning message you must have got with that call tells you that there is no such argument cex.label. hist() calls plot.histogram(), which internally calls text() to write the labels. text() has an argument cex, but even if you supply it to hist(), it is not passed to text() via the function body of plot.histogram(). You could modify plot.histogram, but the more immediate solution is to set labels = FALSE and explicitly use text() to write your labels. Try something like

x <- hist(aaa$Hours, main="Hourly Weight", xlab = "Time", breaks = 25, col = "yellow", ylim = c(0,9000), labels=FALSE, cex.axis=0.6)
text(x$mids, x$counts * 1.05, labels = x$counts, cex=0.5)

B.

> Kindly advise on both the questions. Thanks.
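The reply covers the bar labels; the other half of the question, one tick per hour on the x axis, can be sketched by giving hist() one bin per hour and drawing the axis by hand. The data below is simulated and stands in for aaa$Hours, the bin edges at half hours are a choice made for this example, and pdf(NULL) is used only so that the sketch runs without opening a plot window.

```r
set.seed(1)
hours <- sample(0:23, 500, replace = TRUE)  # made-up data standing in for aaa$Hours

pdf(NULL)  # null device: nothing is written to disk

# One bin per hour, centred on the integer values 0..23.
h <- hist(hours, breaks = seq(-0.5, 23.5, by = 1), plot = FALSE)

# Draw without the default axes, leaving head room for the labels above the bars.
plot(h, main = "Hourly Weight", xlab = "Time", col = "yellow",
     ylim = c(0, max(h$counts) * 1.15), axes = FALSE)
axis(1, at = 0:23, cex.axis = 0.6)  # a tick for every hour
axis(2, cex.axis = 0.6)
text(h$mids, h$counts, labels = h$counts, pos = 3, cex = 0.5)  # counts above each bar

invisible(dev.off())
```

To place the counts inside the bars instead, one option is to drop pos = 3 and plot the text at a fraction of the bar height, for example h$counts / 2.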
Re: [R] about transforming a data.frame
Hi Bogdan,

Sarah has already suggested this, but doesn't:

table(df$row_names, df$CT)
table(df$col_names, df$CT)

give you what you want?

Jim
Re: [R] TWS and R
Has anyone found a solution to this? I am having the same issue.

thanks!

On Thursday, November 15, 2012 at 10:35:48 PM UTC-8, abcd1234 wrote:

> Hi all,
>
> The TWS on my system is unable to connect to my R session. Here is the error that I'm getting:
>
> tws <- twsConnect()
> Error in socketConnection(host = host, port = port, open = "ab", blocking = blocking) :
>   cannot open the connection
> In addition: Warning message:
> In socketConnection(host = host, port = port, open = "ab", blocking = blocking) :
>   localhost:7496 cannot be opened
>
> Here is the session info for the R session:
>
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_IN.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_IN.UTF-8        LC_COLLATE=en_IN.UTF-8
>  [5] LC_MONETARY=en_IN.UTF-8    LC_MESSAGES=en_IN.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_IN.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets
> [6] methods   base
>
> other attached packages:
> [1] IBrokers_0.9-10 xts_0.8-6       zoo_1.7-8
>
> loaded via a namespace (and not attached):
> [1] grid_2.15.1    lattice_0.20-0 tools_2.15.1
>
> I have checked the "Enable ActiveX and Socket Clients" option but it hasn't helped. Since I'm running on an Ubuntu machine, I even tried changing the parameter blocking in the command twsConnect() to
> 1. blocking = FALSE
> 2. the version mentioned here: http://code.google.com/p/ibrokers/source/detail?r=84&path=/trunk/R/twsConnect.R
> but nothing has helped. I have also added 127.0.0.1 to the Trusted IP option.
>
> Please let me know what I should do. Thanks.
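The error comes from base R's socketConnection(), so one way to narrow the problem down is to check, independently of IBrokers, whether anything is listening on the port at all. The helper below is a hypothetical diagnostic written for this note (port_is_open is not part of IBrokers); the default port 7496 is taken from the error message above, and the check says nothing about TWS's own API settings.

```r
# TRUE if a TCP connection to host:port can be opened within `timeout` seconds.
port_is_open <- function(host = "localhost", port = 7496, timeout = 2) {
  con <- tryCatch(
    suppressWarnings(
      socketConnection(host = host, port = port, open = "ab",
                       blocking = TRUE, timeout = timeout)
    ),
    error = function(e) NULL  # connection refused or timed out
  )
  if (is.null(con)) return(FALSE)
  close(con)
  TRUE
}

port_is_open()  # FALSE suggests that nothing is accepting connections on that port
```

If this returns FALSE while TWS is running, the likely suspects are the API settings inside TWS or a different port number, rather than the R session.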
Re: [R] Converting unique strings to unique numbers
Hi Bill, On 05/29/2015 01:48 PM, William Dunlap wrote: I'm not sure why which particular ID gets assigned to each string would matter but maybe I'm missing something. What really matters is that each string receives a unique ID. match(x, x) does that. I think each row of the OP's dataset represented an individual (column 2) followed by its mother and father (columns 3 and 4). I assume that the marker 0 means that a parent is not in the dataset. If you match against the strings in column 2 only, in their original order, then the resulting numbers give the row number of an individual, Note that the code I gave happens to do exactly that (assuming that column 2 contains no duplicates, but your code is also relying on that assumption in order to have the ids match the row numbers). We're discussing the merit of match(x, x) versus match(x, unique(x)). All I'm trying to say is that the unique(x) step (which doubles the cost of the whole operation, because it also uses hashing, like match() does) is generally not needed. It doesn't seem to be needed in Kate's use case. H. making it straightforward to look up information regarding the ancestors of an individual. Hence the choice of numeric ID's may be important. Bill Dunlap TIBCO Software wdunlap tibco.com http://tibco.com On Fri, May 29, 2015 at 1:29 PM, Hervé Pagès hpa...@fredhutch.org mailto:hpa...@fredhutch.org wrote: Hi Sarah, On 05/29/2015 12:04 PM, Sarah Goslee wrote: On Fri, May 29, 2015 at 2:16 PM, Hervé Pagès hpa...@fredhutch.org mailto:hpa...@fredhutch.org wrote: Hi Kate, I found that matching the character vector to itself is a very effective way to do this: x - c(a, bunch, of, strings, whose, exact, content, is, of, little, interest) ids - match(x, x) ids # [1] 1 2 3 4 5 6 7 8 3 10 11 By using this trick, many manipulations on character vectors can be replaced by manipulations on integer vectors, which are sometimes way more efficient. Hm. 
I hadn't thought of that approach - I use the as.numeric(factor(...)) approach. So I was curious, and compared the two: set.seed(43) x - sample(letters, 1, replace=TRUE) system.time({ for(i in seq_len(2)) { ids1 - match(x, x) }}) # user system elapsed # 9.657 0.000 9.657 system.time({ for(i in seq_len(2)) { ids2 - as.numeric(factor(x, levels=letters)) }}) # user system elapsed # 6.160.006.16 Using factor() is faster. That's an unfair comparison, because you already know what the levels are so you can supply them to your call to factor(). Most of the time you don't know what the levels are so either you just do factor(x) and let the factor() constructor compute the levels for you, or you compute them yourself upfront with something like factor(x, levels=unique(x)). library(microbenchmark) microbenchmark( {ids1 - match(x, x)}, {ids2 - as.integer(factor(x, levels=letters))}, {ids3 - as.integer(factor(x))}, {ids4 - as.integer(factor(x, levels=unique(x)))} ) Unit: microseconds expr min lq { ids1 - match(x, x) } 245.979 262.2390 { ids2 - as.integer(factor(x, levels = letters)) } 214.115 219.2320 { ids3 - as.integer(factor(x)) } 380.782 388.7295 { ids4 - as.integer(factor(x, levels = unique(x))) } 332.250 342.6630 mean median uq max neval 267.3210 264.4845 268.348 293.894 100 226.9913 220.9870 226.147 314.875 100 402.2242 394.7165 412.075 481.410 100 349.7405 345.3090 353.162 383.002 100 More importantly, using factor() lets you set the order of the indices in an expected fashion, where match() assigns them in the order of occurrence. head(data.frame(x, ids1, ids2)) x ids1 ids2 1 m1 13 2 x2 24 3 b32 4 s4 19 5 i59 6 o6 15 In a problem like Kate's where there are several columns for which the same ordering of indices is desired, that becomes really important. I'm not sure why which particular ID gets assigned to each string would matter but maybe I'm missing something. What really matters is that each string receives a unique ID. match(x, x) does that. 
In Kate's problem, where the strings are in more than one column, and you want the ID to be unique across the columns, you need to do
Re: [R] about transforming a data.frame
Thanks a lot, Jim. If I may ask one more little question: how can I verify that "B1:B2:B3" is paired with ALL of the values 2, 4 and 5, regardless of the pairing value? (In our case, for the code below, the pairing value for "B1:B2:B3" is 1, but it can be 2, 3, 4, etc., BUT NOT zero.) How could I test for that? Or is this the way that apply() works for all arguments? Good documentation for the apply() function would help too.

Thanks, and happy weekend!

-- bogdan

On Fri, May 29, 2015 at 4:21 PM, Jim Lemon drjimle...@gmail.com wrote:

> Hi Bogdan,
> If you mean "How can I verify that B1:B2:B3 is paired with all of the values 2, 4 and 5":
>
>   apply(table(df$col_names, df$CT), 1, all)
>
> and if you mean "How can I verify that B1:B2:B3 is paired with at least one of the values 2, 4 and 5":
>
>   apply(table(df$col_names, df$CT), 1, any)
>
> Jim
>
>> Hi Jim,
>> yes, thank you, that is the desired output. One more question please: after using the data frame
>>
>>   df <- data.frame(
>>     row_names = c("B4:B5:B6", "B7:B8:B9", "D4:D5:D6", "D10:D11:D12", "D10:D11:D12", "E10:E11:E12", "A1:A2:A3", "B10:B11:B12"),
>>     col_names = c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6", "F10:F11:F12", "H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3"),
>>     CT = c(5, 2, 2, 2, 2, 2, 4, 4))
>>
>> and
>>
>>   table(df$row_names, df$CT)
>>   table(df$col_names, df$CT)
>>
>> how could I quickly verify that "B1:B2:B3" (for example) hits the CT values of 2, 4, 5 at least one time? An example is in table(df$col_names, df$CT).
>>
>> thank you very much,
>> -- bogdan
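One way to make the "any count except zero" requirement explicit is to turn the contingency table into a presence/absence table before calling apply(). The following is only a sketch built on the df from the quoted message; the names tab, present, paired_with_all, paired_with_any and wanted are introduced here for illustration and are not from the thread.

```r
df <- data.frame(
  col_names = c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6", "F10:F11:F12",
                "H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3"),
  CT = c(5, 2, 2, 2, 2, 2, 4, 4),
  stringsAsFactors = FALSE
)

# Counts of each (col_names, CT) pair
tab <- table(df$col_names, df$CT)

# TRUE wherever a pair occurs at least once, whatever the count
present <- tab > 0

# For each col_names value: is it paired with every CT value / with some CT value?
paired_with_all <- apply(present, 1, all)
paired_with_any <- apply(present, 1, any)

# Check one element against a specific set of CT values (illustrative set)
wanted <- c("2", "4", "5")
b1_hits_wanted <- all(wanted %in% colnames(tab)[present["B1:B2:B3", ]])
```

Note that paired_with_all tests against every CT value that occurs in the data, whereas the last line tests against a set you choose yourself.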
Re: [R] Problem with comparing multiple data sets
Hi Mohammad,
It looks like you are still having problems with this. Given your latest data set, as below, here is something that might do what you want. From David's message, I'm not sure whether you are operating on a single data frame or a list.

  # this is the data set as taken from your message below
  madf <- structure(list(terms = structure(c(2L, 4L, 4L, 4L, 3L, 1L, 5L, 5L,
    6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
    6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
    6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L),
    .Label = c("#authentication,access control", "#privacy,personal data",
    "#security,malicious,security", "data controller", "id management,security",
    "password,recovery"), class = "factor"),
    class.1 = c(2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L,
    2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L,
    2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L,
    2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L,
    1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L),
    class.2 = c(2L, 2L, 2L, 0L, 2L, 2L, 2L, 1L, 1L, 2L,
    1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L,
    2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L,
    1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L,
    2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L),
    class.3 = c(2L, 0L, 2L, 2L, 1L, 1L, 0L, 0L, 0L, 2L,
    2L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)),
    .Names = c("terms", "class.1", "class.2", "class.3"),
    class = "data.frame", row.names = c(NA, -50L))

  # define a function that extracts the value from one field
  # selected by a value in another field
  extract_by_value <- function(x, field1, value1, field2) {
    return(x[x[, field1] == value1, field2])
  }

  # define another function that equates all of the values
  sub_value <- function(x, field1, value1, field2, value2) {
    x[x[, field1] == value1, field2] <- value2
    return(x)
  }

  # this now steps through every value in key_field
  # and operates on every field listed in change_fields
  conformity <- function(x, key_field, change_fields) {
    keys <- unique(x[, key_field])
    for (key in keys) {
      for (change_field in change_fields) {
        # get the most frequent value in change_field
        # for the desired value in key_field
        most_freq <- as.numeric(names(which.max(table(
          extract_by_value(x, key_field, key, change_field)))))
        # now set all the values to the most frequent
        x <- sub_value(x, key_field, key, change_field, most_freq)
      }
    }
    return(x)
  }

  conformity(madf, "terms", c("class.1", "class.2", "class.3"))

Obviously you will want to save the return value of conformity into your original data frame or create a new one.

Jim

> Hi everyone.
> I tried the (modeest) package on my initial test data and it worked. However, it doesn't work on the entire data set. I saved one of the portions that gives an error (not for all of the values, but for some of them). For example, lines 36, 37 and 39 correctly show the mode value, but 38 and 40 are not correct. Such errors are repeated for many of the values.
>
>   [36,] 2
>   [37,] 2
>   [38,] Numeric,3
>   [39,] 1
>   [40,] Numeric,3
>
> #This is what I did:
>   df <- read.csv(file="Part1-modif.csv", head=TRUE, sep=",")
>   Out <- apply(df[,2:length(df)], 1, mfv)
>   t(t(Out))
>
> #This is the data set: the same structure(...) call as the one assigned to madf above.
>
> Also, when I try to include the terms in the result it gives me an error:
>
>   mode.names <- data.frame(df[,1], Out)
>   Error in data.frame(df[, 1], Out) : arguments imply differing number of rows: 50, 3
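The "Numeric,3" entries in the quoted output are what apply() prints when some rows return more than one value, which is likely what happens when mfv() finds a tie between the three class columns. As a rough sketch of a row-wise mode that always returns exactly one value per row, something like the following could be used; single_mode and the toy classes data frame are hypothetical names introduced here, and resolving ties to the smallest value is an arbitrary choice made for illustration.

```r
# Hypothetical helper: one most frequent value per call.
# table() sorts the values, so ties are resolved to the smallest value.
single_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[which.max(counts)])
}

# Toy data standing in for the class.1 / class.2 / class.3 columns
classes <- data.frame(class.1 = c(2, 2, 1),
                      class.2 = c(2, 1, 2),
                      class.3 = c(2, 0, 1))

# One mode per row, so the result is a plain numeric vector of length nrow(classes)
row_modes <- apply(classes, 1, single_mode)
```

Because the result has one element per row, it can be combined with the terms column in data.frame() without the "differing number of rows" error.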
Re: [R] Toronto CRAN mirror 403 error?
On May 29, 2015, at 7:12 PM, Mark Drummond wrote:

> I've been getting a 403 when I try pulling from the Toronto CRAN mirror today.
> http://cran.utstat.utoronto.ca/

Right. It's been out for the last 2.7 days:
http://cran.r-project.org/mirmon_report.html#ca

> Is there a contact list for mirror managers?

Why do you care? Why not use another mirror? The http://lib.stat.cmu.edu/R/CRAN/ mirror should be fairly close if you are on that side of the continent.

-- David.

> -- Cheers, Mark
> Mark Drummond m...@markdrummond.ca
> When I get sad, I stop being sad and be Awesome instead. TRUE STORY.

David Winsemius
Alameda, CA, USA
[R] Toronto CRAN mirror 403 error?
I've been getting a 403 when I try pulling from the Toronto CRAN mirror today.
http://cran.utstat.utoronto.ca/

Is there a contact list for mirror managers?

-- Cheers, Mark
Mark Drummond m...@markdrummond.ca
When I get sad, I stop being sad and be Awesome instead. TRUE STORY.
Re: [R] Toronto CRAN mirror 403 error?
This is why there are mirrors. You don't have to wait for them or tell them to do their jobs.

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

On May 29, 2015 7:12:56 PM PDT, Mark Drummond m...@markdrummond.ca wrote:

> I've been getting a 403 when I try pulling from the Toronto CRAN mirror today.
> http://cran.utstat.utoronto.ca/
>
> Is there a contact list for mirror managers?
Re: [R] Toronto CRAN mirror 403 error?
On Fri, May 29, 2015 at 10:12 PM, Mark Drummond m...@markdrummond.ca wrote:

> I've been getting a 403 when I try pulling from the Toronto CRAN mirror today.
> http://cran.utstat.utoronto.ca/
>
> Is there a contact list for mirror managers?

See the CRAN_mirrors.csv file in R.home("doc") of your R distribution.
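As a small sketch of how that file can be inspected from within R: the code below reads the copy shipped with the local installation and pulls out the Canadian entries. The column names used (Name, Country, URL, Maintainer) are the ones I believe the shipped file contains; treat them as an assumption and check names(mirrors) on your own installation.

```r
# Mirror list shipped with the local R installation (no network access needed)
mirrors_file <- file.path(R.home("doc"), "CRAN_mirrors.csv")
mirrors <- read.csv(mirrors_file, stringsAsFactors = FALSE)

# Assumed column names; verify with names(mirrors)
canada <- mirrors[mirrors$Country == "Canada", c("Name", "URL", "Maintainer")]
```

utils::getCRANmirrors(local.only = TRUE) should return the same table without building the path by hand, and chooseCRANmirror() lets you switch to a different mirror interactively.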
Re: [R] Help on R Functionality Histogram
Thank you, Sarah. This was very impressive and really helped me out.
[R] Converting unique strings to unique numbers
I have a pedigree file like so:

  X0001 BYX859 0      0      2 1 BYX859
  X0001 BYX894 0      0      1 1 BYX894
  X0001 BYX862 BYX894 BYX859 2 2 BYX862
  X0001 BYX863 BYX894 BYX859 2 2 BYX863
  X0001 BYX864 BYX894 BYX859 2 2 BYX864
  X0001 BYX865 BYX894 BYX859 2 2 BYX865

And I was hoping to change all unique string values to numbers. That is:

  BYX859 = 1
  BYX894 = 2
  BYX862 = 3
  BYX863 = 4
  BYX864 = 5
  BYX865 = 6

But only in columns 2 - 4. Essentially I would like the data to look like this:

  X0001 1 0 0 2 1 BYX859
  X0001 2 0 0 1 1 BYX894
  X0001 3 2 1 2 2 BYX862
  X0001 4 2 1 2 2 BYX863
  X0001 5 2 1 2 2 BYX864
  X0001 6 2 1 2 2 BYX865

Is this possible with factors?

Thanks!

K.
Re: [R] Converting unique strings to unique numbers
Here is an example to get you started:

  mycol <- c('b','a','d','d','b','c')
  as.numeric(factor(mycol))

-Don

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

On 5/29/15, 9:58 AM, Kate Ignatius kate.ignat...@gmail.com wrote:

> I have a pedigree file like so:
>
>   X0001 BYX859 0      0      2 1 BYX859
>   X0001 BYX894 0      0      1 1 BYX894
>   X0001 BYX862 BYX894 BYX859 2 2 BYX862
>   X0001 BYX863 BYX894 BYX859 2 2 BYX863
>   X0001 BYX864 BYX894 BYX859 2 2 BYX864
>   X0001 BYX865 BYX894 BYX859 2 2 BYX865
>
> And I was hoping to change all unique string values to numbers. That is:
>
>   BYX859 = 1
>   BYX894 = 2
>   BYX862 = 3
>   BYX863 = 4
>   BYX864 = 5
>   BYX865 = 6
>
> But only in columns 2 - 4. Essentially I would like the data to look like this:
>
>   X0001 1 0 0 2 1 BYX859
>   X0001 2 0 0 1 1 BYX894
>   X0001 3 2 1 2 2 BYX862
>   X0001 4 2 1 2 2 BYX863
>   X0001 5 2 1 2 2 BYX864
>   X0001 6 2 1 2 2 BYX865
>
> Is this possible with factors? Thanks! K.
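It may help to see what numbering this actually produces, next to the match()-based variants suggested elsewhere in the thread. The sketch below uses the same mycol vector; the variable names by_factor, by_first_seen and by_position are made up here for illustration.

```r
mycol <- c('b', 'a', 'd', 'd', 'b', 'c')

# factor() sorts the levels, so IDs follow alphabetical order: a=1, b=2, c=3, d=4
by_factor <- as.numeric(factor(mycol))

# match() against unique() numbers the strings in order of first appearance
by_first_seen <- match(mycol, unique(mycol))

# match(x, x) returns the position of each string's first occurrence:
# still one distinct ID per string, but not necessarily consecutive
by_position <- match(mycol, mycol)
```

Here by_factor is 2 1 4 4 2 3, by_first_seen is 1 2 3 3 1 4 and by_position is 1 2 3 3 1 6. All three give each distinct string its own ID; which numbering is appropriate depends on whether the order of the IDs matters for the data at hand.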
Re: [R] Converting unique strings to unique numbers
On Fri, May 29, 2015 at 2:16 PM, Hervé Pagès hpa...@fredhutch.org wrote:

> Hi Kate,
>
> I found that matching the character vector to itself is a very effective way to do this:
>
>   x <- c("a", "bunch", "of", "strings", "whose", "exact", "content", "is", "of", "little", "interest")
>   ids <- match(x, x)
>   ids
>   # [1]  1  2  3  4  5  6  7  8  3 10 11
>
> By using this trick, many manipulations on character vectors can be replaced by manipulations on integer vectors, which are sometimes way more efficient.

Hm. I hadn't thought of that approach - I use the as.numeric(factor(...)) approach.

So I was curious, and compared the two:

  set.seed(43)
  x <- sample(letters, 1, replace=TRUE)

  system.time({ for(i in seq_len(2)) { ids1 <- match(x, x) }})
  #   user  system elapsed
  #  9.657   0.000   9.657

  system.time({ for(i in seq_len(2)) { ids2 <- as.numeric(factor(x, levels=letters)) }})
  #   user  system elapsed
  #   6.16    0.00    6.16

Using factor() is faster. More importantly, using factor() lets you set the order of the indices in an expected fashion, where match() assigns them in the order of occurrence.

  head(data.frame(x, ids1, ids2))
    x ids1 ids2
  1 m    1   13
  2 x    2   24
  3 b    3    2
  4 s    4   19
  5 i    5    9
  6 o    6   15

In a problem like Kate's where there are several columns for which the same ordering of indices is desired, that becomes really important.

If you take Bill Dunlap's modification of the match() approach, it resolves both problems: matching against the pooled unique values is both faster than the factor() version and gives the same result:

On Fri, May 29, 2015 at 1:31 PM, William Dunlap wdun...@tibco.com wrote:

> match() will do what you want. E.g., run your data through the following function.
>
>   f <- function (data) {
>     uniqStrings <- unique(c(data[, 2], data[, 3], data[, 4]))
>     uniqStrings <- setdiff(uniqStrings, "0")
>     for (j in 2:4) {
>       data[[j]] <- match(data[[j]], uniqStrings, nomatch = 0L)
>     }
>     data
>   }

  ##
  y <- data.frame(id = 1:5000,
                  v1 = sample(letters, 5000, replace=TRUE),
                  v2 = sample(letters, 5000, replace=TRUE),
                  v3 = sample(letters, 5000, replace=TRUE),
                  stringsAsFactors=FALSE)

  system.time({ for(i in seq_len(2)) { ids3 <- f(data.frame(y)) }})
  #   user  system elapsed
  # 22.515   0.000  22.518

  ff <- function(data) {
    uniqStrings <- unique(c(data[, 2], data[, 3], data[, 4]))
    uniqStrings <- setdiff(uniqStrings, "0")
    for (j in 2:4) {
      data[[j]] <- as.numeric(factor(data[[j]], levels=uniqStrings))
    }
    data
  }

  system.time({ for(i in seq_len(2)) { ids4 <- ff(data.frame(y)) }})
  #   user  system elapsed
  # 26.083   0.002  26.090

  head(ids3)
    id v1 v2 v3
  1  1  1  2  8
  2  2  2 19 22
  3  3  3 21 16
  4  4  4 10 17
  5  5  1  8 18
  6  6  1 12 26

  head(ids4)
    id v1 v2 v3
  1  1  1  2  8
  2  2  2 19 22
  3  3  3 21 16
  4  4  4 10 17
  5  5  1  8 18
  6  6  1 12 26

Kate, if you're getting all zeros, check str(yourdataframe) - it's likely that when you imported your data into R the strings were already converted to factors, which is not what you want (ask me how I know this!).

Sarah

On 05/29/2015 09:58 AM, Kate Ignatius wrote:

> I have a pedigree file like so:
>
>   X0001 BYX859 0      0      2 1 BYX859
>   X0001 BYX894 0      0      1 1 BYX894
>   X0001 BYX862 BYX894 BYX859 2 2 BYX862
>   X0001 BYX863 BYX894 BYX859 2 2 BYX863
>   X0001 BYX864 BYX894 BYX859 2 2 BYX864
>   X0001 BYX865 BYX894 BYX859 2 2 BYX865
>
> And I was hoping to change all unique string values to numbers. That is:
>
>   BYX859 = 1
>   BYX894 = 2
>   BYX862 = 3
>   BYX863 = 4
>   BYX864 = 5
>   BYX865 = 6
>
> But only in columns 2 - 4. Essentially I would like the data to look like this:
>
>   X0001 1 0 0 2 1 BYX859
>   X0001 2 0 0 1 1 BYX894
>   X0001 3 2 1 2 2 BYX862
>   X0001 4 2 1 2 2 BYX863
>   X0001 5 2 1 2 2 BYX864
>   X0001 6 2 1 2 2 BYX865
>
> Is this possible with factors? Thanks! K.

--
Sarah Goslee
http://www.functionaldiversity.org
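To tie the discussion back to the original question, here is a sketch that runs the pooled-match() idea on Kate's six pedigree rows. The function body follows Bill Dunlap's f from the message above; the name recode_ids, the read.table() call and the default V1..V7 column names are assumptions made for this illustration, and stringsAsFactors = FALSE reflects the warning about factors.

```r
pedigree <- read.table(text = "
X0001 BYX859 0 0 2 1 BYX859
X0001 BYX894 0 0 1 1 BYX894
X0001 BYX862 BYX894 BYX859 2 2 BYX862
X0001 BYX863 BYX894 BYX859 2 2 BYX863
X0001 BYX864 BYX894 BYX859 2 2 BYX864
X0001 BYX865 BYX894 BYX859 2 2 BYX865
", header = FALSE, stringsAsFactors = FALSE)

# Recode columns 2:4 with one shared set of IDs; "0" (missing parent) stays 0
recode_ids <- function(data, cols = 2:4) {
  pooled <- unique(unlist(data[cols], use.names = FALSE))
  pooled <- setdiff(pooled, "0")
  for (j in cols) {
    data[[j]] <- match(data[[j]], pooled, nomatch = 0L)
  }
  data
}

recoded <- recode_ids(pedigree)
```

Because column 2 is pooled first and holds each individual once, the IDs in recoded$V2 run from 1 to 6 in row order, and the parent columns refer to those same numbers.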
Re: [R] Converting unique strings to unique numbers
I'm not sure why which particular ID gets assigned to each string would matter but maybe I'm missing something. What really matters is that each string receives a unique ID. match(x, x) does that. I think each row of the OP's dataset represented an individual (column 2) followed by its mother and father (columns 3 and 4). I assume that the marker 0 means that a parent is not in the dataset. If you match against the strings in column 2 only, in their original order, then the resulting numbers give the row number of an individual, making it straightforward to look up information regarding the ancestors of an individual. Hence the choice of numeric ID's may be important. Bill Dunlap TIBCO Software wdunlap tibco.com On Fri, May 29, 2015 at 1:29 PM, Hervé Pagès hpa...@fredhutch.org wrote: Hi Sarah, On 05/29/2015 12:04 PM, Sarah Goslee wrote: On Fri, May 29, 2015 at 2:16 PM, Hervé Pagès hpa...@fredhutch.org wrote: Hi Kate, I found that matching the character vector to itself is a very effective way to do this: x - c(a, bunch, of, strings, whose, exact, content, is, of, little, interest) ids - match(x, x) ids # [1] 1 2 3 4 5 6 7 8 3 10 11 By using this trick, many manipulations on character vectors can be replaced by manipulations on integer vectors, which are sometimes way more efficient. Hm. I hadn't thought of that approach - I use the as.numeric(factor(...)) approach. So I was curious, and compared the two: set.seed(43) x - sample(letters, 1, replace=TRUE) system.time({ for(i in seq_len(2)) { ids1 - match(x, x) }}) # user system elapsed # 9.657 0.000 9.657 system.time({ for(i in seq_len(2)) { ids2 - as.numeric(factor(x, levels=letters)) }}) # user system elapsed # 6.160.006.16 Using factor() is faster. That's an unfair comparison, because you already know what the levels are so you can supply them to your call to factor(). 
Most of the time you don't know what the levels are so either you just do factor(x) and let the factor() constructor compute the levels for you, or you compute them yourself upfront with something like factor(x, levels=unique(x)). library(microbenchmark) microbenchmark( {ids1 - match(x, x)}, {ids2 - as.integer(factor(x, levels=letters))}, {ids3 - as.integer(factor(x))}, {ids4 - as.integer(factor(x, levels=unique(x)))} ) Unit: microseconds expr min lq { ids1 - match(x, x) } 245.979 262.2390 { ids2 - as.integer(factor(x, levels = letters)) } 214.115 219.2320 { ids3 - as.integer(factor(x)) } 380.782 388.7295 { ids4 - as.integer(factor(x, levels = unique(x))) } 332.250 342.6630 mean median uq max neval 267.3210 264.4845 268.348 293.894 100 226.9913 220.9870 226.147 314.875 100 402.2242 394.7165 412.075 481.410 100 349.7405 345.3090 353.162 383.002 100 More importantly, using factor() lets you set the order of the indices in an expected fashion, where match() assigns them in the order of occurrence. head(data.frame(x, ids1, ids2)) x ids1 ids2 1 m1 13 2 x2 24 3 b32 4 s4 19 5 i59 6 o6 15 In a problem like Kate's where there are several columns for which the same ordering of indices is desired, that becomes really important. I'm not sure why which particular ID gets assigned to each string would matter but maybe I'm missing something. What really matters is that each string receives a unique ID. match(x, x) does that. 
In Kate's problem, where the strings are in more than one column, and you want the ID to be unique across the columns, you need to do match(x, x) where 'x' contains the strings from all the columns that you want to replace: m - matrix(c( X0001, BYX859,0,0, 2, 1, BYX859, X0001, BYX894,0,0, 1, 1, BYX894, X0001, BYX862, BYX894, BYX859, 2, 2, BYX862, X0001, BYX863, BYX894, BYX859, 2, 2, BYX863, X0001, BYX864, BYX894, BYX859, 2, 2, BYX864, X0001, BYX865, BYX894, BYX859, 2, 2, BYX865 ), ncol=7, byrow=TRUE) x - m[ , 2:4] id - match(x, x, nomatch=0, incomparables=0) m[ , 2:4] - id No factor needed. No loop needed. ;-) Cheers, H. If you take Bill Dunlap's modification of the match() approach, it resolves both problems: matching against the pooled unique values is both faster than the factor() version and gives the same result: On Fri, May 29, 2015 at 1:31 PM, William Dunlap wdun...@tibco.com wrote: match() will do what you want. E.g., run your data through the following function. f - function (data) { uniqStrings - unique(c(data[, 2], data[, 3], data[, 4])) uniqStrings - setdiff(uniqStrings, 0) for (j in 2:4) { data[[j]] -
Re: [R] about transforming a data.frame
Hi Sarah,

thank you for your help. I have simplified the example by reading the elements into a data frame, e.g.:

  df <- data.frame(
    row_names = c("B4:B5:B6", "B7:B8:B9", "D4:D5:D6", "D10:D11:D12", "D10:D11:D12", "E10:E11:E12", "A1:A2:A3", "B10:B11:B12"),
    col_names = c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6", "F10:F11:F12", "H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3"),
    CT = c(5, 2, 2, 2, 2, 2, 4, 4))

I have used count() from the plyr package:

  count_row_names <- count(df$row_names)
  count_col_names <- count(df$col_names)

However, I would need to correlate these UNIQUE ELEMENTS in the columns row_names or col_names with the numbers they are associated with in the CT column, e.g. "B1:B2:B3" is associated with 5, 2, 4 (in the CT column), and "D10:D11:D12" is associated with 2 (in the CT column).

thank you very much,

bogdan

On Fri, May 29, 2015 at 1:32 PM, Sarah Goslee sarah.gos...@gmail.com wrote:

> Hi,
>
> Please use dput() to provide your data, as it can get somewhat mangled by copy and pasting, especially if you post in HTML (as you are asked not to do in the posting guide).
>
> What is a unique element? Is B4:B5:B6 an element, or are B4 and B5 each elements? That is, what is the result you expect to obtain for the sample data you provided?
>
> What code have you tried? I would think table() might be involved, and possibly strsplit(), but will refrain from putting more time into this until you provide a reproducible dataset with dput() and some clearer idea of your intent.
>
> Sarah
>
> On Fri, May 29, 2015 at 4:19 PM, Bogdan Tanasa tan...@gmail.com wrote:
>
>> Dear all,
>>
>> I would appreciate a suggestion on the following: I am working with a data.frame (below):
>>
>>      EXP CT   row_names   col_names
>>   1 test -5    B4:B5:B6    B1:B2:B3
>>   2 test -2    B7:B8:B9    B1:B2:B3
>>   3 test -2    D4:D5:D6    H4:H5:H6
>>   4 test -2 D10:D11:D12 F10:F11:F12
>>   5 test -2 D10:D11:D12    H1:H2:H3
>>   6 test -2 E10:E11:E12    G7:G8:G9
>>   7 test -4    A1:A2:A3    D1:D2:D3
>>   8 test -4 B10:B11:B12    B1:B2:B3
>>
>> What would be the easiest way to consider the UNIQUE elements in ROW_NAMES or the UNIQUE elements in COL_NAMES, and print how many times these UNIQUE ELEMENTS are associated with the numbers -5, -2, or -4 (these numbers are in the column named CT)?
>>
>> thanks,
>> bogdan
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org
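One possible base-R reading of "which CT values does each unique element go with, and how often" is sketched below, using the df from the message above. The names ct_by_col, ct_by_row and counts are introduced here for illustration; this is not taken from any reply in the thread.

```r
df <- data.frame(
  row_names = c("B4:B5:B6", "B7:B8:B9", "D4:D5:D6", "D10:D11:D12",
                "D10:D11:D12", "E10:E11:E12", "A1:A2:A3", "B10:B11:B12"),
  col_names = c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6", "F10:F11:F12",
                "H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3"),
  CT = c(5, 2, 2, 2, 2, 2, 4, 4),
  stringsAsFactors = FALSE
)

# Distinct CT values seen with each unique element
ct_by_col <- lapply(split(df$CT, df$col_names), unique)
ct_by_row <- lapply(split(df$CT, df$row_names), unique)

# How many times each (element, CT) pair occurs
counts <- table(df$col_names, df$CT)
```

ct_by_col[["B1:B2:B3"]] then holds 5, 2 and 4, and counts gives the number of occurrences of each pairing in one table.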
[R] about transforming a data.frame
Dear all,

I would appreciate a suggestion on the following: I am working with a data.frame (below):

     EXP CT   row_names   col_names
  1 test -5    B4:B5:B6    B1:B2:B3
  2 test -2    B7:B8:B9    B1:B2:B3
  3 test -2    D4:D5:D6    H4:H5:H6
  4 test -2 D10:D11:D12 F10:F11:F12
  5 test -2 D10:D11:D12    H1:H2:H3
  6 test -2 E10:E11:E12    G7:G8:G9
  7 test -4    A1:A2:A3    D1:D2:D3
  8 test -4 B10:B11:B12    B1:B2:B3

What would be the easiest way to consider the UNIQUE elements in ROW_NAMES or the UNIQUE elements in COL_NAMES, and print how many times these UNIQUE ELEMENTS are associated with the numbers -5, -2, or -4 (these numbers are in the column named CT)?

thanks,

bogdan
Re: [R] Converting unique strings to unique numbers
Hi Sarah, On 05/29/2015 12:04 PM, Sarah Goslee wrote: On Fri, May 29, 2015 at 2:16 PM, Hervé Pagès hpa...@fredhutch.org wrote: Hi Kate, I found that matching the character vector to itself is a very effective way to do this: x - c(a, bunch, of, strings, whose, exact, content, is, of, little, interest) ids - match(x, x) ids # [1] 1 2 3 4 5 6 7 8 3 10 11 By using this trick, many manipulations on character vectors can be replaced by manipulations on integer vectors, which are sometimes way more efficient. Hm. I hadn't thought of that approach - I use the as.numeric(factor(...)) approach. So I was curious, and compared the two: set.seed(43) x - sample(letters, 1, replace=TRUE) system.time({ for(i in seq_len(2)) { ids1 - match(x, x) }}) # user system elapsed # 9.657 0.000 9.657 system.time({ for(i in seq_len(2)) { ids2 - as.numeric(factor(x, levels=letters)) }}) # user system elapsed # 6.160.006.16 Using factor() is faster. That's an unfair comparison, because you already know what the levels are so you can supply them to your call to factor(). Most of the time you don't know what the levels are so either you just do factor(x) and let the factor() constructor compute the levels for you, or you compute them yourself upfront with something like factor(x, levels=unique(x)). 

  library(microbenchmark)
  microbenchmark(
    {ids1 <- match(x, x)},
    {ids2 <- as.integer(factor(x, levels=letters))},
    {ids3 <- as.integer(factor(x))},
    {ids4 <- as.integer(factor(x, levels=unique(x)))}
  )
  Unit: microseconds
                                                    expr     min       lq     mean   median      uq     max neval
                                 { ids1 <- match(x, x) } 245.979 262.2390 267.3210 264.4845 268.348 293.894   100
     { ids2 <- as.integer(factor(x, levels = letters)) } 214.115 219.2320 226.9913 220.9870 226.147 314.875   100
                       { ids3 <- as.integer(factor(x)) } 380.782 388.7295 402.2242 394.7165 412.075 481.410   100
   { ids4 <- as.integer(factor(x, levels = unique(x))) } 332.250 342.6630 349.7405 345.3090 353.162 383.002   100

> More importantly, using factor() lets you set the order of the indices in an expected fashion, where match() assigns them in the order of occurrence.
>
>   head(data.frame(x, ids1, ids2))
>     x ids1 ids2
>   1 m    1   13
>   2 x    2   24
>   3 b    3    2
>   4 s    4   19
>   5 i    5    9
>   6 o    6   15
>
> In a problem like Kate's where there are several columns for which the same ordering of indices is desired, that becomes really important.

I'm not sure why which particular ID gets assigned to each string would matter but maybe I'm missing something. What really matters is that each string receives a unique ID. match(x, x) does that.

In Kate's problem, where the strings are in more than one column, and you want the ID to be unique across the columns, you need to do match(x, x) where 'x' contains the strings from all the columns that you want to replace:

  m <- matrix(c(
    "X0001", "BYX859", "0",      "0",      "2", "1", "BYX859",
    "X0001", "BYX894", "0",      "0",      "1", "1", "BYX894",
    "X0001", "BYX862", "BYX894", "BYX859", "2", "2", "BYX862",
    "X0001", "BYX863", "BYX894", "BYX859", "2", "2", "BYX863",
    "X0001", "BYX864", "BYX894", "BYX859", "2", "2", "BYX864",
    "X0001", "BYX865", "BYX894", "BYX859", "2", "2", "BYX865"
  ), ncol=7, byrow=TRUE)

  x <- m[ , 2:4]
  id <- match(x, x, nomatch=0, incomparables="0")
  m[ , 2:4] <- id

No factor needed. No loop needed. ;-)

Cheers,
H.

If you take Bill Dunlap's modification of the match() approach, it resolves both problems: matching against the pooled unique values is both faster than the factor() version and gives the same result: On Fri, May 29, 2015 at 1:31 PM, William Dunlap wdun...@tibco.com wrote: match() will do what you want. E.g., run your data through the following function. f - function (data) { uniqStrings - unique(c(data[, 2], data[, 3], data[, 4])) uniqStrings - setdiff(uniqStrings, 0) for (j in 2:4) { data[[j]] - match(data[[j]], uniqStrings, nomatch = 0L) } data } ## y - data.frame(id = 1:5000, v1 = sample(letters, 5000, replace=TRUE), v2 = sample(letters, 5000, replace=TRUE), v3 = sample(letters, 5000, replace=TRUE), stringsAsFactors=FALSE) system.time({ for(i in seq_len(2)) { ids3 - f(data.frame(y)) }}) # user system elapsed # 22.515 0.000 22.518 ff - function(data) { uniqStrings - unique(c(data[, 2], data[, 3], data[, 4])) uniqStrings - setdiff(uniqStrings, 0) for (j in 2:4) { data[[j]] - as.numeric(factor(data[[j]], levels=uniqStrings)) } data } system.time({ for(i in seq_len(2)) { ids4 - ff(data.frame(y)) }}) #user system elapsed # 26.083 0.002 26.090 head(ids3) id v1 v2 v3 1 1 1 2 8 2 2 2 19 22 3 3 3 21 16 4 4 4 10 17 5 5 1 8 18 6 6 1 12 26 head(ids4) id v1 v2 v3 1 1 1 2 8 2 2 2 19
Re: [R] about transforming a data.frame
Hi,

Please use dput() to provide your data, as it can get somewhat mangled by copy and pasting, especially if you post in HTML (as you are asked not to do in the posting guide).

What is a unique element? Is B4:B5:B6 an element, or are B4 and B5 each elements? That is, what is the result you expect to obtain for the sample data you provided?

What code have you tried?

I would think table() might be involved, and possibly strsplit(), but will refrain from putting more time into this until you provide a reproducible dataset with dput() and some clearer idea of your intent.

Sarah

On Fri, May 29, 2015 at 4:19 PM, Bogdan Tanasa tan...@gmail.com wrote:

Dear all,

I would appreciate a suggestion on the following: I am working with a data.frame (below):

   EXP CT   row_names   col_names
1 test -5    B4:B5:B6    B1:B2:B3
2 test -2    B7:B8:B9    B1:B2:B3
3 test -2    D4:D5:D6    H4:H5:H6
4 test -2 D10:D11:D12 F10:F11:F12
5 test -2 D10:D11:D12    H1:H2:H3
6 test -2 E10:E11:E12    G7:G8:G9
7 test -4    A1:A2:A3    D1:D2:D3
8 test -4 B10:B11:B12    B1:B2:B3

What would be the easiest way to consider the UNIQUE elements in ROW_NAMES or the UNIQUE elements in COL_NAMES, and print how many times these UNIQUE ELEMENTS associate with the numbers -5, -2, or -4 (these numbers are in the column named CT)?

thanks,
bogdan

--
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] Automatically updating a plot from a regularly updated data file
Hi all,

I have a question about using R in a way that may not be correct, but I thought I would ask anyway. I have an instrument that outputs a text file with comma-separated data. A new line is added to the file each time the instrument takes a new reading. Is there any way to configure R such that a script to generate a plot from said text file is re-run each time the file is modified (i.e. a new line is added)? So basically, update an exported plot each time a new line of data is collected. Is this type of thing possible in R? If not, can anyone recommend some Windows (or Linux if need be) tools that could help me accomplish this, preferably still utilizing R's plotting capabilities? I know that there are other tools that can do all of this, but nothing makes figures as nicely as R.

I suppose more generally this is a question about ways to automate processes with R to take advantage of R's functionality.

Thanks in advance.

Sam
Re: [R] Converting unique strings to unique numbers
match() will do what you want. E.g., run your data through the following function.

f <- function (data) {
  uniqStrings <- unique(c(data[, 2], data[, 3], data[, 4]))
  uniqStrings <- setdiff(uniqStrings, "0")
  for (j in 2:4) {
    data[[j]] <- match(data[[j]], uniqStrings, nomatch = 0L)
  }
  data
}

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, May 29, 2015 at 9:58 AM, Kate Ignatius kate.ignat...@gmail.com wrote:

I have a pedigree file as so:

X0001 BYX859 0      0      2 1 BYX859
X0001 BYX894 0      0      1 1 BYX894
X0001 BYX862 BYX894 BYX859 2 2 BYX862
X0001 BYX863 BYX894 BYX859 2 2 BYX863
X0001 BYX864 BYX894 BYX859 2 2 BYX864
X0001 BYX865 BYX894 BYX859 2 2 BYX865

And I was hoping to change all unique string values to numbers. That is:

BYX859 = 1
BYX894 = 2
BYX862 = 3
BYX863 = 4
BYX864 = 5
BYX865 = 6

But only in columns 2 - 4. Essentially I would like the data to look like this:

X0001 1 0 0 2 1 BYX859
X0001 2 0 0 1 1 BYX894
X0001 3 2 1 2 2 BYX862
X0001 4 2 1 2 2 BYX863
X0001 5 2 1 2 2 BYX864
X0001 6 2 1 2 2 BYX865

Is this possible with factors?

Thanks!

K.
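The pooled-match idea above can be tried directly on Kate's six example rows. This is a minimal sketch, not Bill's exact function: the rows are typed in by hand, the column names `V1` to `V7` and the object names `ped`, `cols` and `pool` are made up for illustration, and the "0" placeholders are assumed to be read as strings.

```r
# Kate's example rows, typed in by hand; column names V1..V7 are hypothetical.
ped <- data.frame(
  V1 = rep("X0001", 6),
  V2 = c("BYX859", "BYX894", "BYX862", "BYX863", "BYX864", "BYX865"),
  V3 = c("0", "0", "BYX894", "BYX894", "BYX894", "BYX894"),
  V4 = c("0", "0", "BYX859", "BYX859", "BYX859", "BYX859"),
  V5 = c(2, 1, 2, 2, 2, 2),
  V6 = c(1, 1, 2, 2, 2, 2),
  V7 = c("BYX859", "BYX894", "BYX862", "BYX863", "BYX864", "BYX865"),
  stringsAsFactors = FALSE
)

cols <- 2:4

# Pool the strings from all three columns, in order of first appearance,
# and drop the "0" placeholder so that it is never given an ID.
pool <- setdiff(unique(unlist(ped[cols], use.names = FALSE)), "0")

# Replace every string by its position in the pool; "0" has no match
# and therefore becomes 0.
ped[cols] <- lapply(ped[cols], match, table = pool, nomatch = 0L)

ped
```

Because one pool is shared by all three columns, a given string gets the same ID wherever it appears, which is what the requested output needs.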
Re: [R] Problems with nls
AFAICS this has essentially nothing to do with R. Please post elsewhere, e.g. on a statistics list like stats.stackexchange.com.

Cheers,
Bert

On Fri, May 29, 2015 at 6:44 AM, Abolfazl Saghafi abolfazl.sagh...@gmail.com wrote:

Can someone help me with a question on this Bass model, please?

As I read some articles on this topic, I understand that

1. the Bass formula is N(t) = pm + (q-p) N(t-1) - (q/m) (N(t-1))^2
2. which is a difference equation with the solution N(t) = m (1 − exp(−(p+q)t)) / (1 + (q/p)exp(−(p+q)t))
3. So, using a linear regression would give us some initial estimates for the parameters m, p, q
4. we can then put the initial estimates into an NLS fit to get better estimates

Am I right?

Now the question is, why is it that I see people use cumulative data and try to fit it to a pdf such as

M * ( ((P+Q)^2 / P) * exp(-(P+Q) * T79) ) / (1+(Q/P)*exp(-(P+Q)*T79))^2

Why not use the cumulative data and fit N(t) directly?
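For readers comparing the two expressions in the question above: the second one is the time derivative of the closed-form N(t), so it describes the adoption rate per unit time and not the cumulative count. The sketch below only checks that relationship numerically. The function names and the parameter values are assumptions chosen for illustration; nothing here is fitted to data.

```r
# Cumulative Bass curve N(t), as written in the question.
bass_cum <- function(t, m, p, q) {
  e <- exp(-(p + q) * t)
  m * (1 - e) / (1 + (q / p) * e)
}

# The "pdf" form from the question, scaled by m.
bass_rate <- function(t, m, p, q) {
  e <- exp(-(p + q) * t)
  m * ((p + q)^2 / p) * e / (1 + (q / p) * e)^2
}

# Illustrative parameter values (assumptions, not estimates from any data).
m <- 1000; p <- 0.03; q <- 0.38
t <- seq(0.5, 20, by = 0.5)

# Central-difference derivative of the cumulative curve.
h <- 1e-5
num_rate <- (bass_cum(t + h, m, p, q) - bass_cum(t - h, m, p, q)) / (2 * h)

# Largest gap between the numerical derivative and the closed-form rate.
max_abs_diff <- max(abs(num_rate - bass_rate(t, m, p, q)))
max_abs_diff
```

The gap is at the level of numerical noise, which suggests that the rate form pairs naturally with per-period counts and the N(t) form with cumulative counts.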
Re: [R] Why I am not able to load library(R.matlab)? Other packages are fine.
Wow, thanks Ben. That worked very well.

I guess I didn't have R.methodsS3? But that doesn't make sense, because I was using R.matlab a few weeks ago. I believe I was on R 3.1. Maybe it's in the R 3.1 folder?

I am using a Mac, btw.

Cheers,
-M

On Fri, May 29, 2015 at 1:55 PM, Ben Bolker bbol...@gmail.com wrote:

C W tmrsg11 at gmail.com writes:

Hi Henrik,

I don't quite get what I should do here. I am not familiar with R.methodsS3. Can you tell me exactly what command I need to run?

Thanks,
Mike

install.packages("R.methodsS3")
install.packages("R.matlab")
library(R.matlab)

[snip snip snip]
Re: [R] about transforming a data.frame
Bogdan, the request was for data in dput() format. Type ?dput for more information.

Do dput(myfile), copy the output, and paste it into the email. You should get something like:

structure(list(c1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L), .Label = c("(0.509,0.614]", "(0.614,0.718]", "(0.718,0.822]", "(0.822,0.926]", "(0.926,1.03]", "(1.03,1.13]", "(1.13,1.24]", "(1.24,1.34]", "(1.34,1.45]", "(1.45,1.55]" ), class = "factor"), s1 = c(0.51, 0.52, 0.58, 0.58, 0.59, 0.6, 0.63, 0.65, 0.68, 0.74, 0.74, 0.75, 0.77, 0.77, 0.77, 0.78, 0.79, 0.84, 0.84, 0.85, 0.87, 0.93, 0.93, 0.95, 0.99, 1.04, 1.09, 1.11, 1.13, 1.14, 1.14, 1.14, 1.17, 1.18, 1.19, 1.22, 1.22, 1.23, 1.28, 1.29, 1.3, 1.32, 1.37, 1.38, 1.38, 1.4, 1.43, 1.47, 1.52, 1.55 )), .Names = c("c1", "s1"), row.names = c(NA, -50L), class = "data.frame")

Data in dput() format is the preferred way to share data on R-help, since it provides a perfect copy of what you have on your machine. Any other way of providing data risks the recipients reading it into R differently than it is on your machine.

John Kane
Kingston ON Canada

-----Original Message-----
From: tan...@gmail.com
Sent: Fri, 29 May 2015 13:58:20 -0700
To: sarah.gos...@gmail.com
Subject: Re: [R] about transforming a data.frame

Hi Sarah, thank you for your help.
I have simplified the example by reading the elements into a data frame, e.g.:

df <- data.frame(
  row_names = c("B4:B5:B6", "B7:B8:B9", "D4:D5:D6", "D10:D11:D12", "D10:D11:D12", "E10:E11:E12", "A1:A2:A3", "B10:B11:B12"),
  col_names = c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6", "F10:F11:F12", "H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3"),
  CT = c(5,2,2,2,2,2,4,4)
)

I have used count() from the plyr package:

count_row_names <- count(df$row_names)
count_col_names <- count(df$col_names)

However, I would need to correlate these UNIQUE ELEMENTS in the columns row_names or col_names with the numbers they are associated with in the CT column, e.g. B1:B2:B3 is associated with 5, 2, 4 (in the CT column), and D10:D11:D12 is associated with 2 (in the CT column).

thank you very much,
bogdan
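One possible reading of Bogdan's request above is: for each unique label, list the CT values it occurs with, and count how often each label/CT pair occurs. Under that assumption, base R's split() and table() are enough. The sketch below uses the simplified data frame from his message; the object names `ct_by_col`, `ct_by_row`, `counts_col` and `counts_row` are made up here.

```r
# Simplified data from the message above, with string quotes restored.
df <- data.frame(
  row_names = c("B4:B5:B6", "B7:B8:B9", "D4:D5:D6", "D10:D11:D12",
                "D10:D11:D12", "E10:E11:E12", "A1:A2:A3", "B10:B11:B12"),
  col_names = c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6", "F10:F11:F12",
                "H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3"),
  CT = c(5, 2, 2, 2, 2, 2, 4, 4),
  stringsAsFactors = FALSE
)

# CT values seen with each unique label (one list element per label).
ct_by_col <- split(df$CT, df$col_names)
ct_by_row <- split(df$CT, df$row_names)

# How many times each label occurs with each CT value.
counts_col <- table(df$col_names, df$CT)
counts_row <- table(df$row_names, df$CT)

ct_by_col[["B1:B2:B3"]]
counts_row["D10:D11:D12", ]
```

If the individual wells (B1, B2, ...) were the units of interest instead of the whole label, strsplit(label, ":") would be a natural first step, as Sarah hints.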
Re: [R] Why I am not able to load library(R.matlab)? Other packages are fine.
I think Henrik's point (which I merely clarified) was that something funky (we'll probably never know what, and it's not worth figuring out unless it happens again/to other people) had gone wrong and that the easiest thing to do was just to reinstall.

References:

* https://www.youtube.com/watch?v=t2F1rFmyQmY
* http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.208.9970&rep=rep1&type=pdf
Re: [R] Automatically updating a plot from a regularly updated data file
A lot will depend on how frequently data is added to the file, how big the file gets, and how important it is to see updated plots quickly.

I have R doing exactly what you describe, and have found logic like this (which might be described as crude) to be sufficient:

while( {some condition} ) {
  {read the data file}
  {make the plot}
  Sys.sleep( {some number of seconds} )
}

Of course this is not actually noticing that the file has changed and responding, it is just updating at regular intervals. But that might be good enough.

A slightly more sophisticated approach would be to set up a loop like the above, and have the sleep time short, but within the loop use file.info({the csv file}) and when the modification time is later than the previous modification time, read the data and update the plot.

If the file gets really big, you might not want to reload the entire file each time. That might lead you into things like keeping track of how many lines the file has, and only reading the new lines -- if you need your plots to be cumulative. In that situation you might end up using the pipe() function to create your connection to the file, and pass the OS's 'tail' command (Linux or Mac, not sure about Win) to pipe. If you only need to plot the last, say, X hours of data, then you may not need to keep track of the number of lines, just read the last N lines (hopefully not too hard to figure out what N should be).

If you don't want an R process running indefinitely, as is the case for the above, you can, on Linux and Mac, set up a cron job to run an R script as often as once per minute. I have at least one such task where it happens every 2 minutes, and makes plots of the current data. In this case, we have 16 measurement devices each sending data to a MySQL database once per minute; the R script pulls the data from the database every 2 minutes and plots, and the system works well for our needs. Windows will have some equivalent to cron, I just don't know what it is.
FWIW, all of the above write png files which are viewed via a webserver.

-Don

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
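The file.info() variant Don describes (short sleep, redraw only when the modification time has moved forward) might look roughly like the sketch below. It is an illustration under stated assumptions, not Don's actual script: `watch_file`, `plot_to_png`, the file names, and the assumption that the first two CSV columns are the x and y values are all hypothetical.

```r
# Hypothetical helper: poll a file and call on_change(path) whenever its
# modification time moves forward. max_polls = Inf runs until interrupted.
watch_file <- function(path, on_change, poll_seconds = 5, max_polls = Inf) {
  last_mtime <- as.POSIXct(NA)
  polls <- 0
  while (polls < max_polls) {
    polls <- polls + 1
    mtime <- file.info(path)$mtime   # NA if the file does not exist yet
    if (!is.na(mtime) && (is.na(last_mtime) || mtime > last_mtime)) {
      on_change(path)
      last_mtime <- mtime
    }
    if (polls < max_polls) Sys.sleep(poll_seconds)
  }
  invisible(polls)
}

# Hypothetical callback: plot the first two CSV columns into a PNG file.
plot_to_png <- function(path, png_file = "latest.png") {
  dat <- read.csv(path)
  png(png_file)
  on.exit(dev.off())
  plot(dat[[1]], dat[[2]], type = "b",
       xlab = names(dat)[1], ylab = names(dat)[2])
}

# Usage (commented out because it runs until interrupted):
# watch_file("instrument.csv", plot_to_png, poll_seconds = 5)
```

Keeping the polling loop separate from the plotting callback means the same loop could later drive a callback that reads only the newest lines, as suggested above for large files.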
Re: [R] about transforming a data.frame
Hi John,

thanks for the clarification. Yes, of course, the dput() output is the following:

dput(dataframe_matches_ddCT)

structure(list(FIGURE = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "test", class = "factor"), ddCT = c(-5.4595, -2.7467, -2.7467, -2.7467, -2.7467, -2.7467, -4.5927, -4.5927), row_names = structure(c(1L, 2L, 3L, 4L, 4L, 5L, 6L, 7L), .Label = c("B4:B5:B6", "B7:B8:B9", "D4:D5:D6", "D10:D11:D12", "E10:E11:E12", "A1:A2:A3", "B10:B11:B12" ), class = "factor"), col_names = structure(c(1L, 1L, 2L, 3L, 4L, 5L, 6L, 1L), .Label = c("B1:B2:B3", "H4:H5:H6", "F10:F11:F12", "H1:H2:H3", "G7:G8:G9", "D1:D2:D3"), class = "factor"), CTaverage_MATRIX_SUBSTRACTIONS = c(-5.4595413208, -2.7467829387, -2.74099286393334, -2.7433134714, -2.7480595907, -2.755259196, -4.59402211506667, -4.5927206675)), .Names = c("FIGURE", "ddCT", "row_names", "col_names", "CTaverage_MATRIX_SUBSTRACTIONS" ), row.names = c(NA, 8L), class = "data.frame")

thanks again for your input,

-- bogdan

On Fri, May 29, 2015 at 2:11 PM, John Kane jrkrid...@inbox.com wrote:

Bogdan, the request was for data in dput() format. Type ?dput for more information.