date:20150529

Re: [R-es] My R script runs very slowly

2015-05-29 Thread MªLuz Morales

Hi Miguel Ángel,
I think Carlos Ortega has given me a solution to my R problem... I'm going to
try it. I didn't know there was a size limit on emails; I'll keep it in mind
next time.

Many thanks in any case.
Best regards,

MªLuz Morales
Dept. of Communication Sciences and Technology
Universidad Europea de Madrid

On 28 May 2015, 22:29, miguel.angel.rodriguez.mui...@sergas.es
wrote:

 Hi Mª Luz,

 Your first message did not reach the list precisely because of the size of
 the attachments; you have an email from the administrator about it.
 Because you replied to that message yourself, we have all been able to read
 it, but we do not have access to the files Set-A.zip and Outcomes.csv (I
 seem to recall they were about 9 MB together).
 You could upload them somewhere (Dropbox or similar) and share the URL. If
 you have any problems, send me an email and I will try to help.


 Best regards,
 Miguel Rodríguez
 Consellería de Sanidade
 Xunta de Galicia
 http://dxsp.sergas.es


 
 From: R-help-es [r-help-es-boun...@r-project.org] on behalf of MªLuz
 Morales [mlzm...@gmail.com]
 Sent: Thursday, 28 May 2015 16:14
 To: Carlos Ortega
 CC: R-help-es@r-project.org
 Subject: Re: [R-es] My R script runs very slowly

 Hi,
 thanks for replying so quickly.
 I attached the files seta and outcomes.csv to my email; I'm not sure how
 else to make them available to you.

 On 28 May 2015, 15:53, Carlos Ortega c...@qualityexcellence.es
 wrote:

  Hi,

  If you don't mind sharing your data set (you can put it in Dropbox and
  share the link) or including a dump of the variables seta and outcomes
  (function save.image()), we can give you a much faster solution than the
  one you propose.

  In your code you use a loop to fill a list with the different aggregates;
  this can be done much faster (in seconds) with several packages:
  data.table, dplyr and sqldf.
 
 
  Regards,
  Carlos Ortega
  www.qualityexcellence.es
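Carlos's point can be sketched without extra packages. The block below is a minimal base-R stand-in for the data.table/dplyr/sqldf solutions he mentions, using aggregate() (a swapped-in technique) on a tiny hypothetical version of `seta`; the column names RecordID, Parameter and Value are taken from the script further down. That script scans all ~2 million rows once per patient, i.e. about 4000 full scans (roughly 8 billion comparisons) per parameter, whereas grouping computes every patient's statistics in one pass.

```r
# Hypothetical miniature of 'seta': one row per measurement (long format).
seta <- data.frame(
  RecordID  = c(1, 1, 1, 2, 2, 2),
  Parameter = c("NISysABP", "NISysABP", "NIDiasABP",
                "NISysABP", "NIDiasABP", "NIDiasABP"),
  Value     = c(120, 130, 80, 110, 70, 75)
)

# Keep only the parameters of interest ...
wanted <- c("NISysABP", "NIDiasABP", "NIMAP")
keep <- seta[seta$Parameter %in% wanted, ]

# ... then compute min/max/mean for every (RecordID, Parameter) group at once.
stats <- aggregate(Value ~ RecordID + Parameter, data = keep,
                   FUN = function(v) c(Min = min(v), Max = max(v), Mean = mean(v)))

# aggregate() stores the three statistics as one matrix column; flatten it
# into ordinary columns Value.Min, Value.Max and Value.Mean.
stats <- do.call(data.frame, stats)
print(stats)
```

To get one row per patient, the result could then be reshaped with `reshape(stats, idvar = "RecordID", timevar = "Parameter", direction = "wide")` and joined to `outcomes` with `merge()`; that step has not been checked against the real files.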
 
  On 28 May 2015, 15:34, javier.ruben.marcu...@gmail.com
 wrote:
 
  Dear María Luz Morales,

  You could try data.table and replace the for loop with some vectorized
  alternative, although this has improved in modern R; the option of
  compiling the code should also be evaluated.

  Javier Rubén Marcuzzi
  Dairy Industry Technician
  Veterinarian
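Javier's suggestion to compile can be tried with the compiler package that ships with R. A minimal sketch with a made-up loop function follows. Note that since R 3.4.0 the JIT byte-compiles closures by default, so cmpfun() mainly helps on older versions such as the R of 2015, and replacing the loop with a vectorised call (here sum()) usually matters more.

```r
library(compiler)

# A deliberately loop-heavy function (a hypothetical example, not from the thread).
loop_sum <- function(x) {
  s <- 0
  for (v in x) s <- s + v
  s
}

# cmpfun() returns a byte-compiled version with the same behaviour.
loop_sum_c <- cmpfun(loop_sum)

x <- runif(1e5)
all.equal(loop_sum(x), loop_sum_c(x))   # same result
all.equal(loop_sum_c(x), sum(x))        # the vectorised built-in agrees too
```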
 
 
 
 
 
  From: MªLuz Morales
  Sent: Thursday, 28 May 2015 10:21 a.m.
  To: R-help-es@r-project.org
 
 
 
 
 
  In my previous email I forgot to mention that I work with RStudio.
 
  On 28 May 2015, 15:18, MªLuz Morales mlzm...@gmail.com
 wrote:
 
   Hi,
   I'm new to this list and also to R. I've written an R script that loads
   two csv files, one of them with almost 2 million rows. The program loads
   those files into data frames, and it simply selects certain data, does
   some operations (mean, minimum, maximum) and presents them in a table
   that will have 4000 rows. Running this program took almost 3 hours. Can
   you tell me whether R is slow at this operation, or whether my code is
   not optimized and I'm not doing it the right way?
   The code of my program is as follows:
  
  
  
 
   #+++
   ## Set-A.csv and Outcomes.csv must be in the current directory
   #  Read the csv files into data frames
   seta <- read.csv('Set-A.csv');
   outcomes <- read.csv('Outcomes-A.csv');

   ids <- as.character(unique(outcomes$RecordID));
   ## Number of distinct RecordIDs
   Length_ids <- length(ids); # number of distinct RecordIDs
   ListaABP <- list('RecordID'=-1, 'SAPS.I'=-1, 'SOFA'=-1, 'Survival'=-1,
     'In.hospital_death'=-1, 'NISysABP_Min'=-1, 'NISysABP_Max'=-1,
     'NISysABP_Mean'=-1, 'NIDiasABP_Min'=-1, 'NIDiasABP_Max'=-1,
     'NIDiasABP_Mean'=-1, 'NIMAP_Min'=-1, 'NIMAP_Max'=-1, 'NIMAP_Mean'=-1);
   for (i in 1:Length_ids){ #NumRecordID){   # For each patient...

     ListaABP$RecordID[i] <- outcomes$RecordID[i];
     ListaABP$SAPS.I[i] <- outcomes$SAPS.I[i];
     ListaABP$SOFA[i] <- outcomes$SOFA[i];
     ListaABP$Survival[i] <- outcomes$Survival[i];
     ListaABP$In.hospital_death[i] <- outcomes$In.hospital_death[i];

     # Parameter == 'NISysABP'
     #seta_NISysABP <- seta[seta$RecordID == ids[i] & seta$Parameter ==
     #  'NISysABP', c('RecordID','Value')];
     # I think this is no longer a data frame, so the next line may throw an error
     seta_NISysABP <- seta[seta$RecordID == ids[i] & seta$Parameter ==
       'NISysABP', 'Value'];
     ListaABP$NISysABP_Min[i] <- min(seta_NISysABP);
     ListaABP$NISysABP_Max[i] <- max(seta_NISysABP);
     ListaABP$NISysABP_Mean[i] <- mean(seta_NISysABP);

     # Parameter == 'NIDiasABP'
     #seta_NIDiasABP <- seta[seta$RecordID
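About the comment in the script that `seta[..., 'Value']` may no longer be a data frame: selecting a single column this way drops to a plain vector, which is exactly what min(), max() and mean() expect. A small check on hypothetical toy data with the same column names:

```r
# Hypothetical toy data with the same column names as the script.
seta <- data.frame(RecordID  = c(1, 1, 2),
                   Parameter = c("NISysABP", "NISysABP", "NISysABP"),
                   Value     = c(120, 130, 110))
ids <- c("1", "2")   # character, as produced by as.character() in the script
i <- 1

# Selecting one column with [rows, "Value"] returns a plain numeric vector ...
v <- seta[seta$RecordID == ids[i] & seta$Parameter == "NISysABP", "Value"]
class(v)

# ... so the summary functions work directly on it.
c(min(v), max(v), mean(v))
```

If a patient has no rows for a parameter, the selection is empty: min() then returns Inf and max() returns -Inf (each with a warning), and mean() returns NaN, so those cases may need handling. Adding `drop = FALSE` inside the brackets would keep a one-column data frame instead.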

[R] Error in CSV file

2015-05-29 Thread Shivi82

Hello all,
This is probably an easy fix, but I cannot find the root cause of the error.
I am trying to load a csv file, but it throws an error.
I have done a lot of research on Google and in some tutorials but can't find
a solution, hence please advise:
Syntax is:   aaa <- read.csv(file = "VehicleData.csv", Header = TRUE)

Error:- Error in read.table(file = file, header = header, sep = sep, quote =
quote,  : 
  unused argument (Header = TRUE)

Snapshot of the file:
Weight  Hours  PROCESS  Month  Weekday  Day
6828    13     INBOUND  Mar    Fri      13
2504    16     INBOUND  Mar    Fri      27
20      16     INBOUND  Mar    Fri      27
10262   16     INBOUND  Mar    Fri      27
2500    17     INBOUND  Mar    Fri      13

Kindly help. 




--
View this message in context: 
http://r.789695.n4.nabble.com/Error-in-CSV-file-tp4707879.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] analysis of variance test

2015-05-29 Thread Michael Dewey


Dear Nezahat
In future it would be helpful if you

1 - gave us the data so we can reproduce what you are doing
2 - told us what the error was in case we cannot replicate it
3 - did not post in HTML as it messes up everything in your post

What did you think x1 <- numeric was going to do?
Try
x1 <- numeric
str(x1)


On 28/05/2015 22:16, Nezahat HUnter wrote:


Let's say I have 12 observations of 5 variables, and my first variable is categorical
(with 4 different levels). I am trying to test for statistically significant differences
between these categorical levels for each variable, but my function is not working!
Please note that my data x are in data.frame format.
Any suggestion would be helpful. Many thanks.

function(x)
{
  x1 <- numeric
  x2 <- numeric
  for(i in 2:length(x)) {
    x1[i] <- summary(aov(x[, i] ~ factor(x[, 1])))
    x2[i] <- x1[i]$Pr[1]  # Pr holds the probability values
    if(x2[i] < 0.06)
      x2[i] <- 1
    else x2[i] <- 0
  }
  x2
}







--
Michael
http://www.dewey.myzen.co.uk/home.html



Re: [R] analysis of variance test

2015-05-29 Thread Jim Lemon

Hi Nezahat,
First, you are storing the code of the function numeric in x1 and
x2. You probably want to use:

x1 <- numeric()
x2 <- numeric()

Second, you are then storing the output of your aov summary (a list)
in x1, which requires a bit of analysis to get the information you
want (i.e. p value). The following will work for your example, but is
not a general solution.

nh_fun <- function(x) {
  pvals <- numeric()
  for(i in 2:length(x))
    pvals[i-1] <- unlist(summary(aov(x[,i] ~
      factor(x[,1])))[[1]][5])[1] <= 0.05
  return(pvals)
}

nh_fun(x)

As you probably want the conventional p <= 0.05, I have changed
the criterion. If you want to understand why the mess of extractors
appears after the summary call, use the str function successively
on the return value from summary.

Jim
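A slightly more general variant of Jim's function, as a sketch: it pulls the p-value out of the ANOVA table by its column name Pr(>F) rather than by position, and returns one row per response variable. The data frame x is made up (12 rows, a 4-level factor in column 1, four numeric columns) to mirror the description in the question, and the name anova_pvalues is hypothetical.

```r
# Hypothetical data: 12 observations, a 4-level grouping variable in column 1
# and four numeric response variables.
set.seed(1)
x <- data.frame(group = rep(c("a", "b", "c", "d"), each = 3),
                v1 = rnorm(12), v2 = rnorm(12), v3 = rnorm(12), v4 = rnorm(12))

anova_pvalues <- function(x, alpha = 0.05) {
  g <- factor(x[[1]])
  pvals <- vapply(x[-1], function(y) {
    tab <- summary(aov(y ~ g))[[1]]   # the ANOVA table (a data frame)
    tab[["Pr(>F)"]][1]                 # p-value for the group effect
  }, numeric(1))
  data.frame(p.value = pvals, significant = pvals <= alpha)
}

anova_pvalues(x)
```

Using vapply() with `numeric(1)` also guarantees that each column yields exactly one number, which avoids the growing-vector bookkeeping of the original loop.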


On Fri, May 29, 2015 at 7:16 AM, Nezahat HUnter
nezahathun...@yahoo.co.uk wrote:


[R] How to make new predictions from a GAM with a spline forced through the origin

2015-05-29 Thread Gavan McGrath

Hi,

I followed an example to fit a GAM with a spline forced through a point, i.e.
(0,0). This works fine with one of Simon's examples; however, when it comes to
making a prediction from a new set of x values I'm a bit stumped.

In the example below a smooth term is constructed, the basis column and
penalty entries for x=0 are removed, and then the gam is fitted to the spline
basis matrix X using the spline penalty.

Can someone suggest a way that I can make predictions at new x values based
on the gam b below?


Here is Simon Wood's example:

library(mgcv)
set.seed(0)
n <- 100
x <- runif(n)*4-1; x <- sort(x)
f <- exp(4*x)/(1+exp(4*x)); y <- f+rnorm(100)*0.1; plot(x,y)
dat <- data.frame(x=x,y=y)

## Create a spline basis and penalty, making sure there is a knot
## at the constraint point, (0 here, but could be anywhere)
knots <- data.frame(x=seq(-1,3,length=9)) ## create knots
## set up smoother...
sm <- smoothCon(s(x,k=9,bs="cr"),dat,knots=knots)[[1]]

## 3rd parameter is value of spline at knot location 0,
## set it to 0 by dropping...
X <- sm$X[,-3]        ## spline basis
S <- sm$S[[1]][-3,-3] ## spline penalty
off <- y*0 + .6       ## offset term to force curve through (0, .6)

## fit spline constrained through (0, .6)...
b <- gam(y ~ X - 1 + offset(off),paraPen=list(X=list(S)))
lines(x,predict(b))
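One way to predict at new x values, sketched under the assumption that mgcv's PredictMat() evaluates the same basis that produced sm$X: build the basis at the new points, drop the same third column, multiply by coef(b) and add back the 0.6 offset. The block repeats the fit so it runs on its own; predict_constrained() is a hypothetical helper name.

```r
library(mgcv)

## Re-create the fit from the example above (same data, basis and constraint).
set.seed(0)
n <- 100
x <- sort(runif(n) * 4 - 1)
f <- exp(4 * x) / (1 + exp(4 * x))
y <- f + rnorm(n) * 0.1
dat <- data.frame(x = x, y = y)
knots <- data.frame(x = seq(-1, 3, length = 9))
sm <- smoothCon(s(x, k = 9, bs = "cr"), dat, knots = knots)[[1]]
X <- sm$X[, -3]
S <- sm$S[[1]][-3, -3]
off <- y * 0 + 0.6
b <- gam(y ~ X - 1 + offset(off), paraPen = list(X = list(S)))

## Hypothetical helper: evaluate the same spline basis at new x with
## PredictMat(), drop the same constrained column, apply the fitted
## coefficients and add the constant offset back.
predict_constrained <- function(sm, b, xnew, drop_col = 3, offset = 0.6) {
  Xp <- PredictMat(sm, data.frame(x = xnew))[, -drop_col, drop = FALSE]
  drop(Xp %*% coef(b)) + offset
}

xnew <- seq(-1, 3, length = 50)
pnew <- predict_constrained(sm, b, xnew)
```

At x = 0 the "cr" basis row is 1 in the dropped column and 0 elsewhere, so the prediction there is exactly the offset, 0.6, as intended. Standard errors could be obtained the same way from `Xp %*% vcov(b) %*% t(Xp)`.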




[R] Help on R Functionality Histogram

2015-05-29 Thread Shivi82

Hello experts,
I have a couple of questions on the analysis I am creating.
1) How does R adapt to changes? In my case, the Excel file I started with
had to be modified: my data were hourly, ranging from 0 to 23 hours, and
after the change 0 was recoded to 24. Do I need to read the file into R again
with read.csv, or is there another way to do so, i.e. a kind of reload option?
2) I am creating a histogram. On the x axis I need all 24 hours displayed
separately as 0, 1, 2, and so on. However, it only shows up to 20, which makes
it look awkward. I also need to resize the labels and, if possible, put them
inside the bars. I used the code below; the axis fonts have changed, but the
labels give an error with this code.

Code:   hist(aaa$Hours, main = "Hourly Weight", xlab = "Time", breaks = 25,
             col = "yellow", ylim = c(0, 9000),
             labels = TRUE, cex.axis = 0.6, cex.label = 0.6)

Kindly advise on both questions. Thanks.

Histogram.png http://r.789695.n4.nabble.com/file/n4707886/Histogram.png  



--
View this message in context: 
http://r.789695.n4.nabble.com/Help-on-R-Functionality-Histogram-tp4707886.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Error in CSV file

2015-05-29 Thread Rainer M Krug

Shivi82 shivibha...@ymail.com writes:

 Hello all,
 This is probably an easy fix, but I cannot find the root cause of the error.
 I am trying to load a csv file, but it throws an error.
 I have done a lot of research on Google and in some tutorials but can't find
 a solution, hence please advise:
 Syntax is:   aaa <- read.csv(file = "VehicleData.csv", Header = TRUE)

 Error:- Error in read.table(file = file, header = header, sep = sep, quote =
 quote,  : 
   unused argument (Header = TRUE)
                    ^^^^^^

Use header = TRUE instead of Header = TRUE. R is case sensitive.

Cheers,

Rainer
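A self-contained illustration of the fix, assuming a comma-separated file. The rows come from the snapshot in the post, and a temporary file stands in for VehicleData.csv:

```r
# Write a small CSV from rows of the snapshot (a temporary file stands in
# for VehicleData.csv).
tmp <- tempfile(fileext = ".csv")
writeLines(c("Weight,Hours,PROCESS,Month,Weekday,Day",
             "6828,13,INBOUND,Mar,Fri,13",
             "2504,16,INBOUND,Mar,Fri,27",
             "10262,16,INBOUND,Mar,Fri,27"), tmp)

# Correct: the argument is 'header', all lower case.
aaa <- read.csv(file = tmp, header = TRUE)
str(aaa)

# 'Header' is not an argument of read.csv()/read.table(), so R rejects it.
msg <- tryCatch(read.csv(file = tmp, Header = TRUE),
                error = function(e) conditionMessage(e))
msg
```

read.csv() passes unknown arguments on to read.table() through `...`, which is why the error message mentions read.table rather than read.csv.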




-- 
Rainer M. Krug
email: Raineratkrugsdotde
PGP: 0x0F52F982



Re: [R] Error in CSV file

2015-05-29 Thread Ivan Calandra


Hi Shivi,

R is case sensitive, and the error message says that the argument Header is
unused (because it is not recognized). Try with header (lower-case h) and it
should work.


HTH,
Ivan

--
Ivan Calandra, ATER
University of Reims Champagne-Ardenne
GEGENAA - EA 3795
CREA - 2 esplanade Roland Garros
51100 Reims, France
+33(0)3 26 77 36 89
ivan.calan...@univ-reims.fr
https://www.researchgate.net/profile/Ivan_Calandra

On 29/05/15 10:41, Shivi82 wrote:


Re: [R] Error in CSV file

2015-05-29 Thread Shivi82

This ate my head for about 2 hours. Thank God, and thanks for the help.



--
View this message in context: 
http://r.789695.n4.nabble.com/Error-in-CSV-file-tp4707879p4707882.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Help on R Functionality Histogram

2015-05-29 Thread Sarah Goslee

On Fri, May 29, 2015 at 7:53 AM, Shivi82 shivibha...@ymail.com wrote:
 Hello Experts,
 I have couple of questions on the analysis I am creating.
 1) How does R adapt to changes? In my case, the Excel file I started with
 had to be modified: my data were hourly, ranging from 0 to 23 hours, and
 after the change 0 was recoded to 24. Do I need to read the file into R
 again with read.csv, or is there another way to do so, i.e. a kind of
 reload option?

Using read.csv() is the reload option. R has no automatic interface to
external files.


 2) I am creating a histogram. On the x axis I need all 24 hours displayed
 separately as 0, 1, 2, and so on. However, it only shows up to 20, which
 makes it look awkward. I also need to resize the labels and, if possible,
 put them inside the bars. I used the code below; the axis fonts have
 changed, but the labels give an error with this code.

 Code:   hist(aaa$Hours, main = "Hourly Weight", xlab = "Time", breaks = 25,
              col = "yellow", ylim = c(0, 9000),
              labels = TRUE, cex.axis = 0.6, cex.label = 0.6)

The most understandable approach is to break it down into chunks:
Create the histogram.
Add a custom axis.
Add custom labels.

# using fake data (many random hours, so that every bar has observations)
aaa <- data.frame(Hours = sample(1:24, 100000, replace = TRUE))

aaa.hist <- hist(aaa$Hours, main = "Hourly Weight", xlab = "Time",
                 breaks = seq(0, 24), col = "yellow", ylim = c(0, 9000),
                 cex.axis = 0.6, xaxt = "n")
axis(1, (0:23) + .5, 1:24, cex.axis = .6)
text((0:23) + .5, aaa.hist$counts - 150, aaa.hist$counts, cex = .6)

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org
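Since the hours are whole numbers, a bar chart of counts is an alternative to hist() worth considering (a swapped-in technique, not what Sarah proposed): table() with fixed levels guarantees one bar per hour, even for hours with no data, and barplot() returns the bar mid-points for placing labels. The data are fake, as in Sarah's example, and the plot shows counts of rows, not summed weights.

```r
# Fake integer hours 1-24 (hypothetical data; the real file is not available).
set.seed(1)
aaa <- data.frame(Hours = sample(1:24, 5000, replace = TRUE))

# Count every hour, including hours with no observations (levels = 1:24).
counts <- table(factor(aaa$Hours, levels = 1:24))
n_per_hour <- as.vector(counts)

# One labelled bar per hour; barplot() returns the bar mid-points.
mids <- barplot(counts, main = "Hourly Weight", xlab = "Time",
                col = "yellow", cex.names = 0.6, las = 2)

# Put the counts inside the bars, halfway up, in a small font.
text(mids, n_per_hour / 2, labels = n_per_hour, cex = 0.6)
```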


Re: [R] Help on R Functionality Histogram

2015-05-29 Thread Shivi82

Thanks Sarah. This is magical. 
Thanks for explaining at such length.



--
View this message in context: 
http://r.789695.n4.nabble.com/Help-on-R-Functionality-Histogram-tp4707886p4707891.html
Sent from the R help mailing list archive at Nabble.com.


[R] An Odd Request

2015-05-29 Thread Josh Grant

Hello R-Users

I apologize in advance if my post is inappropriate. I read the entire
posting guide and found nothing to say so, but you never know. I am seeking
a knowledgeable R user who might be interested (for whatever reason) in
helping out on what I hope would be considered a worthy project.

I am a research scientist, albeit one with little programming ability. I
recently started a website that allows patients of different sorts to
suggest research studies. Everything is completely free and anonymous. When
several members express interest in a particular idea, I attempt to build it
so they can actually run through the study. Clearly there are limits, but we
currently have 4 communities (chronic fatigue syndrome, fibromyalgia,
multiple sclerosis and pernicious anaemia), and there are several active
studies in which people are submitting data every day. It's quite exciting,
and I think it has great potential to help people, particularly with
disorders that have defied explanation.

I'm currently using Google Spreadsheets/Forms to create symptom trackers
and interactive dashboards of the results, which (most of the time) show
group results by default but can show individual results if an ID is
entered. Unfortunately, Google Spreadsheets is a little limited, and I now
need more complicated statistics such as linear mixed models.

I know that I need to move to R. I understand the basics of running
statistical tests with packages such as lme4 (lmer), but I have no clue how
to go about integrating such analyses into a website. I could certainly learn
how, would love to, and ultimately will, but if someone were interested in
joining me in this endeavour, much more could be accomplished.

If you're interested in knowing more let me know.

Josh


Re: [R] An Odd Request

2015-05-29 Thread Charles Determan

If you are primarily interested in turning your R analyses into a website,
you should look into the 'shiny' package. It makes generating web pages
very easy. Here is a link to the Shiny Gallery with some examples:
http://shiny.rstudio.com/gallery/

Regards,
Charles

On Fri, May 29, 2015 at 7:48 AM, Josh Grant myencepha...@gmail.com wrote:


[R] Problems with nls

2015-05-29 Thread Abolfazl Saghafi

Can someone help me with a question on this Bass model, please?

From some articles on this topic, I understand that
1. the discrete Bass model is
   $N(t) - N(t-1) = pm + (q-p)\,N(t-1) - \frac{q}{m}\,N(t-1)^2$,
   where N(t) is the cumulative number of adopters;
2. its continuous-time counterpart has the closed-form solution
   $N(t) = m\,\frac{1 - e^{-(p+q)t}}{1 + \frac{q}{p}\,e^{-(p+q)t}}$;
3. so a linear regression would give us initial estimates of the
   parameters m, p and q;
4. we can then put those initial estimates into nls() to get better
   estimates.

Am I right?

Now the question is:
why do I see people use cumulative data and try to fit it to the pdf
M * ( ((P+Q)^2 / P) * exp(-(P+Q) * T79) ) / (1+(Q/P)*exp(-(P+Q)*T79))^2,
instead of using the cumulative data to fit N(t) directly?

Re: [R] Problem with comparing multiple data sets

2015-05-29 Thread Mohammad Alimohammadi

Hi everyone.

I tried the modeest package on my initial test data and it worked.
However, it doesn't work on the entire data set. I saved one of the
portions that gives an error (not for all of the values, but for some of
them). For example, rows 36, 37 and 39 correctly show the mode value,
but 38 and 40 are not correct. Such errors are repeated for many of the
values.

[36,] 2
[37,] 2
[38,] Numeric,3
[39,] 1
[40,] Numeric,3



#This is what I did:
 df <- read.csv(file="Part1-modif.csv", head=TRUE, sep=",")
 Out <- apply(df[,2:length(df)], 1, mfv)
 t(t(Out))


#This is the data set

structure(list(terms = structure(c(2L, 4L, 4L, 4L, 3L, 1L, 5L,
5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label =
c(#authentication,access control,
#privacy,personal data, #security,malicious,security, data
controller,
id management,security, password,recovery), class = factor),
class.1 = c(2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L,
2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L,
1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L), class.2 = c(2L, 2L, 2L,
0L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L,
2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L,
2L, 2L), class.3 = c(2L, 0L, 2L, 2L, 1L, 1L, 0L, 0L, 0L,
2L, 2L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c(terms,
class.1, class.2, class.3), class = data.frame, row.names = c(NA,
-50L))



Also, when I try to add the terms to the result, I get an error:

 mode.names <- data.frame(df[,1], Out)
Error in data.frame(df[, 1], Out) :
arguments imply differing number of rows: 50, 3
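A likely cause, assuming modeest's mfv() returns every most-frequent value: for rows where several values tie, mfv() gives more than one number, so apply() returns a list (printed as Numeric,3), and data.frame() then cannot line that list up with the 50 terms. The sketch below forces one value per row with a hypothetical base-R helper, mode1(), that breaks ties by taking the smallest value, on made-up data shaped like the poster's.

```r
# Most frequent value of x. table() sorts the values and which.max() returns
# the first maximum, so ties go to the smallest value (an assumed rule;
# choose whatever suits the analysis).
mode1 <- function(x) {
  tab <- table(x)
  as.numeric(names(tab)[which.max(tab)])
}

# Hypothetical data: terms in column 1, class columns after it.
df <- data.frame(terms   = c("#dac", "#dac", "#mac"),
                 class.1 = c(2, 1, 0),
                 class.2 = c(2, 2, 1),
                 class.3 = c(0, 1, 2))

Out <- apply(df[, -1], 1, mode1)   # exactly one value per row
result <- data.frame(terms = df$terms, mode = Out)
result
```

Because Out is now an ordinary numeric vector with one entry per row, it can sit next to the terms column; the third row, where 0, 1 and 2 each occur once, is resolved to 0 by the tie rule.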







On Thu, May 28, 2015 at 9:24 AM, Mohammad Alimohammadi 
mxalimoha...@ualr.edu wrote:

 Thank you David for your help !

 On Wed, May 27, 2015 at 7:31 PM, David L Carlson dcarl...@tamu.edu
 wrote:

  cat(paste0("[", 1:length(Out), "] #dac ", Out), sep="\n")

  David

 *From:* Mohammad Alimohammadi [mailto:mxalimoha...@ualr.edu]
 *Sent:* Wednesday, May 27, 2015 2:29 PM
 *To:* David L Carlson; r-help@r-project.org

 *Subject:* Re: [R] Problem with comparing multiple data sets



 Thanks David, it worked!

 One more thing. I hope it's not complicated. Is it also possible to
 display the terms for each row next to it?

 For example:

 [1] #dac2
 [2] #dac0
 [3] #dac1
 ...









 On Wed, May 27, 2015 at 2:18 PM, David L Carlson dcarl...@tamu.edu
 wrote:

 Save the result of the apply() function:

 Out <- apply(df[ ,2:length(df)], 1, mfv)

 Then there are several options:

 Approximately what you asked for:
 data.frame(Out)
 t(t(Out))

 More typing, but exactly what you asked for:
 cat(paste0("[", 1:length(Out), "] ", Out), sep="\n")


 David L. Carlson
 Department of Anthropology
 Texas A&M University



 -Original Message-
 From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Mohammad
 Alimohammadi
 Sent: Wednesday, May 27, 2015 1:47 PM
 To: John Kane; r-help@r-project.org
 Subject: Re: [R] Problem with comparing multiple data sets

 OK, so I read about the modeest package, which gives the results I am
 looking for (the most repeated value).

 I modified the data frame a little and moved the text to the first column.
 This is the data frame with all 3 possible classes for each term.

 =
 structure(list(terms = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L,
 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c(#dac,
 #mac,#security,
 accountability,anonymous, data security,encryption,security
 ), class = factor), class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L,
 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L), class.2 = c(2L, 2L,
 2L, 2L, 0L, 0L, 2L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L,
 0L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L),
 class.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L, 1L,
 0L, 0L, 0L, 0L, 2L, 1L, 2L)), .Names = c(terms, class.1,
 class.2, class.3), class = data.frame, row.names = c(NA,
 -49L))
 =
 #Then I applied the function below:

 ==
 library(modeest)
 df <- read.csv(file="short.csv",

Re: [R] best way to handle database connections from within a package

2015-05-29 Thread Mark Sharp

I would simply separate the database connect and disconnect functions from the 
query functions. 

Mark
R. Mark Sharp, Ph.D.
msh...@txbiomed.org





 On May 28, 2015, at 12:18 PM, Luca Cerone luca.cer...@gmail.com wrote:
 
 Dear all,
 I am writing a package that is a collection of queries to be run
 against a postgresql database,
 so that the users do not have to worry about the structure of the database.
 
 In my package I import dbDriver, dbUnloadDriver, dbConnect,
 dbDisconnect from the package DBI
 and dbGetQuery from the package RPostgreSQL.
 
 All the functions in my package have the same structure:
 
 getFancyData <- function(from, to) {
   on.exit(dbDisconnect(con), add = TRUE)
   on.exit(dbUnloadDriver(drv), add = TRUE)
   drv <- dbDriver("PostgreSQL")
   con <- dbConnect(drv,
                    user = pkguser,
                    host = pkghost,
                    password = pkgpassword,
                    port = pkgport)
 
   query <- sprintf("select * from fancyTable where dt between '%s' and '%s'",
                    from, to)
   res <- dbGetQuery(con, query)
   return(res)
 }
 
 The various access details are read from an encrypted profile that the
 user has to
 create when she installs the package.
 
 Such functions work perfectly fine, but I have to repeat the loading and
 unloading of the driver, and the connecting to and disconnecting from the
 database, many times.
 
 I am wondering if there is a better way to do this job, like loading
 the driver and opening the connection only once when the package is
 loaded. However I have to make sure that
 if R crashes or the code where the function is called contains an
 error then the connection
 with the database is closed. How would you implement this?
 
 
 Also how would you write a functional that would at least allow me to
 avoid replicating
 the boilerplate code to load and unload the drivers?
 
 I am thinking something on the lines of:
 
 querybuild <- function(query, ...) {
   on.exit(dbDisconnect(con), add = TRUE)
   on.exit(dbUnloadDriver(drv), add = TRUE)
   query <- sprintf(query, ...)
   res <- dbSendQuery(query)
   return(res)
 }
 
 and then define
 
 getFancyData <- function(from, to)
   querybuild("select * from fancyTable where dt between '%s' and '%s'", from, to)
 
 Do you see a better way?
 
 Thanks a lot in advance for your help and advice on this!
 
 Cheers,
 Luca
 
 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
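A minimal sketch of the separation Mark suggests, written as a reusable functional so each query function no longer repeats the connect/disconnect boilerplate. `with_connection()` and `query_builder()` are hypothetical helper names, not part of DBI or RPostgreSQL, and the mock connection only stands in for a real one so the sketch runs without a database. With a real database you would pass a connect function built on `dbConnect(dbDriver("PostgreSQL"), ...)` with your profile values, plus `dbDisconnect` and `dbGetQuery`; treat that wiring as an assumption to adapt.

```r
# Hypothetical helper: open a connection, run f(con), and always close it,
# whether f() returns normally or throws an error.
with_connection <- function(connect, disconnect, f) {
  con <- connect()
  # Registered only after connect() succeeds, so a failed connect never
  # tries to close a connection that was never opened.
  on.exit(disconnect(con), add = TRUE)
  f(con)
}

# Hypothetical factory: turns a sprintf() template into a query function.
query_builder <- function(template, connect, disconnect, run) {
  function(...) {
    sql <- sprintf(template, ...)
    with_connection(connect, disconnect, function(con) run(con, sql))
  }
}

# Mock connection for demonstration only; it records open/close events.
events <- character()
mock_connect <- function() {
  events <<- c(events, "open")
  list(id = 1)
}
mock_disconnect <- function(con) {
  events <<- c(events, "close")
  invisible(TRUE)
}
mock_run <- function(con, sql) sql  # a real runner would send sql to the database

getFancyData <- query_builder(
  "select * from fancyTable where dt between '%s' and '%s'",
  mock_connect, mock_disconnect, mock_run
)
out <- getFancyData("2015-01-01", "2015-02-01")
print(out)     # the mock runner just echoes the generated SQL
print(events)  # one "open" followed by one "close"
```

Because the cleanup is registered with `on.exit()`, the connection is closed even when the query fails. Building SQL with `sprintf()` is kept from the original; parameterized queries, where the driver supports them, would be safer against malformed input.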


[R] alternatives to KS test applicable to K-samples

2015-05-29 Thread Wensui Liu

Good morning, All
I have a statistics question that is not specifically about the programming language.
To compare distributional consistency / discrepancy between two
samples, we usually use the Kolmogorov-Smirnov test, which is implemented
in R as ks.test() and in SAS as proc npar1way edf.
I am wondering if there is any alternative to the KS test that could be
generalized to K samples.

Thanks and have a nice weekend.

wensui
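One generic option, sketched only as an illustration and not as one of the tests named in the replies: take the largest pairwise two-sample KS distance across the K groups and calibrate it by permuting the group labels. The function names `ks_stat()` and `ks_k_sample()` and the default of 999 permutations are assumptions; the KS distance is computed directly from `ecdf()` so that tied values do not trigger `ks.test()` warnings.

```r
# Two-sample KS distance D = sup |F_a(z) - F_b(z)|; both ECDFs only jump at
# data points, so the supremum is attained at one of the pooled values.
ks_stat <- function(a, b) {
  z <- c(a, b)
  max(abs(ecdf(a)(z) - ecdf(b)(z)))
}

# Hypothetical K-sample test: max pairwise D, with a permutation p-value for
# H0 "all groups come from the same distribution".
ks_k_sample <- function(x, g, n_perm = 999, seed = 1) {
  g <- factor(g)
  pairs <- combn(nlevels(g), 2)  # each column is one pair of group indices
  max_d <- function(labels) {
    lv <- levels(labels)
    max(apply(pairs, 2, function(p) {
      ks_stat(x[labels == lv[p[1]]], x[labels == lv[p[2]]])
    }))
  }
  observed <- max_d(g)
  set.seed(seed)  # note: this resets the global RNG stream
  perm <- replicate(n_perm, max_d(sample(g)))  # shuffle the group labels
  list(statistic = observed,
       p.value = (1 + sum(perm >= observed)) / (n_perm + 1))
}

set.seed(42)
x <- c(rnorm(30), rnorm(30), rnorm(30, mean = 2))
g <- rep(c("A", "B", "C"), each = 30)
res <- ks_k_sample(x, g)
print(res)
```

Using the maximum over pairs keeps the familiar KS distance, while the permutation step handles the multiple-comparison structure without needing an analytic null distribution.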


Re: [R] Why I am not able to load library(R.matlab)? Other packages are fine.

2015-05-29 Thread C W

Hi Henrik,

I don't quite get what I should do here.  I am not familiar with
R.methodsS3.  Can you tell me exactly which command I need to run?

Thanks,

Mike

On Thu, May 28, 2015 at 3:30 PM, Henrik Bengtsson henrik.bengts...@ucsf.edu
 wrote:

 For some unknown reason, you've managed to install R.matlab without
 the dependency R.methodsS3 (cf.
 http://cran.r-project.org/web/packages/R.matlab/) or it happened due
 to some other glitch somewhere.

 Try to reinstall R.matlab.  If that doesn't help, explicitly install
 R.methodsS3 and retry.  If you get the same error with the other
 dependencies (R.oo and R.utils), do the same.

 /Henrik



 On Thu, May 28, 2015 at 11:47 AM, C W tmrs...@gmail.com wrote:
  Dear R list,
 
  I am trying to do use the R.matlab library, I did the following, but it
  does not work.
 
  > library(R.matlab)
  Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
    there is no package called ‘R.methodsS3’
  Error: package or namespace load failed for ‘R.matlab’
 
  This is my session info.
 
  > sessionInfo()
  R version 3.2.0 (2015-04-16)
  Platform: x86_64-apple-darwin13.4.0 (64-bit)
  Running under: OS X 10.10.3 (Yosemite)
 
  locale:
  [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
 
  attached base packages:
  [1] stats graphics  grDevices utils datasets  methods   base
 
  My R is up-to-date, R 3.2.0.  Why is this happening?  Is it because I
  installed the new R version, instead of updating it?  Maybe things are in a
  different directory?
 
  Thanks so much,
 
  Mike
 



Re: [R] alternatives to KS test applicable to K-samples

2015-05-29 Thread Cade, Brian

Wensui:  There are the multi-response permutation procedures (MRPP) that
readily test the omnibus hypothesis of no distributional differences among
multiple samples for univariate or multivariate responses.  There also are
empirical coverage tests that test a similar hypothesis among multiple
samples but only for univariate responses.  Both are included in the USGS
Blossom package for R linked here:
https://www.fort.usgs.gov/products/23735 (not
yet distributed via CRAN).  The MRPP may also be available in other R
packages on CRAN (vegan ?).

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  ca...@usgs.gov brian_c...@usgs.gov
tel:  970 226-9326
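To make the MRPP idea concrete, here is a small base-R sketch of the statistic as it is usually described: delta is the average within-group pairwise distance, weighted by group size n_i/N, and because a small delta means tight groups, the permutation p-value counts permuted deltas at or below the observed one. Euclidean distance, the n_i/N weights and the name `mrpp_sketch()` are assumptions for illustration; for real analyses the packaged implementations mentioned above (Blossom, or `mrpp()` in vegan) are the ones to rely on.

```r
# Hypothetical MRPP sketch: y is a numeric vector or a matrix (rows = observations)
mrpp_sketch <- function(y, g, n_perm = 999, seed = 1) {
  d <- as.matrix(dist(y))  # Euclidean distances between observations
  g <- factor(g)
  N <- length(g)
  delta <- function(labels) {
    per_group <- vapply(levels(labels), function(lv) {
      idx <- which(labels == lv)
      sub <- d[idx, idx, drop = FALSE]
      # mean distance over distinct within-group pairs (needs n_i >= 2)
      (length(idx) / N) * mean(sub[upper.tri(sub)])
    }, numeric(1))
    sum(per_group)
  }
  observed <- delta(g)
  set.seed(seed)  # note: this resets the global RNG stream
  perm <- replicate(n_perm, delta(sample(g)))  # shuffle the group labels
  list(delta = observed,
       p.value = (1 + sum(perm <= observed)) / (n_perm + 1))
}

set.seed(7)
# Bivariate response; the third group is shifted in the first coordinate
y <- cbind(c(rnorm(20), rnorm(20), rnorm(20, mean = 2)), rnorm(60))
g <- rep(1:3, each = 20)
res_mrpp <- mrpp_sketch(y, g)
print(res_mrpp)
```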



Re: [R] Converting unique strings to unique numbers

2015-05-29 Thread Hervé Pagès


Hi Kate,

I found that matching the character vector to itself is a very
effective way to do this:

  x <- c("a", "bunch", "of", "strings", "whose", "exact", "content",
         "is", "of", "little", "interest")
  ids <- match(x, x)
  ids
  # [1]  1  2  3  4  5  6  7  8  3 10 11

By using this trick, many manipulations on character vectors can
be replaced by manipulations on integer vectors, which are sometimes
way more efficient.
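A short follow-up to this trick, added here rather than taken from the reply: `match(x, x)` returns the position of each string's first occurrence, so the ids are unique but not consecutive (9 is skipped because "of" repeats). Matching against `unique(x)` instead gives consecutive ids, numbered in order of first appearance.

```r
x <- c("a", "bunch", "of", "strings", "whose", "exact", "content",
       "is", "of", "little", "interest")
sparse_ids <- match(x, x)          # first-occurrence positions
dense_ids  <- match(x, unique(x))  # 1..length(unique(x))
print(sparse_ids)
print(dense_ids)                   # 1 2 3 4 5 6 7 8 3 9 10
```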

Cheers,
H.


On 05/29/2015 09:58 AM, Kate Ignatius wrote:

I have a pedigree file like this:

X0001 BYX859  0  0  2  1 BYX859
X0001 BYX894  0  0  1  1 BYX894
X0001 BYX862 BYX894 BYX859  2  2 BYX862
X0001 BYX863 BYX894 BYX859  2  2 BYX863
X0001 BYX864 BYX894 BYX859  2  2 BYX864
X0001 BYX865 BYX894 BYX859  2  2 BYX865

And I was hoping to change all unique string values to numbers.

That is:

BYX859 = 1
BYX894 = 2
BYX862 = 3
BYX863 = 4
BYX864 = 5
BYX865 = 6

But only in columns 2 - 4.  Essentially I would like the data to look like this:

X0001 1 0 0  2  1 BYX859
X0001 2 0 0  1  1 BYX894
X0001 3 2 1  2  2 BYX862
X0001 4 2 1  2  2 BYX863
X0001 5 2 1  2  2 BYX864
X0001 6 2 1  2  2 BYX865

Is this possible with factors?

Thanks!

K.




--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319


Re: [R] Converting unique strings to unique numbers

2015-05-29 Thread Kate Ignatius

I found this helpful.  However, the second to fourth columns come out
all zero. Was this the intention?

That is:

X0001 0 0 0  2  1 BYX859
X0001 0 0 0  1  1 BYX894
X0001 0 0 0  2  2 BYX862
X0001 0 0 0  2  2 BYX863
X0001 0 0 0  2  2 BYX864
X0001 0 0 0  2  2 BYX865

On Fri, May 29, 2015 at 1:31 PM, William Dunlap wdun...@tibco.com wrote:
 match() will do what you want.  E.g., run your data through
 the following function.

 f <- function (data)
 {
     uniqStrings <- unique(c(data[, 2], data[, 3], data[, 4]))
     uniqStrings <- setdiff(uniqStrings, 0)
     for (j in 2:4) {
         data[[j]] <- match(data[[j]], uniqStrings, nomatch = 0L)
     }
     data
 }



 Bill Dunlap
 TIBCO Software
 wdunlap tibco.com
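A likely cause of the all-zero columns, offered as an assumption since the file-reading code was not posted: before R 4.0.0, `read.table()` turned text columns into factors by default, and before R 4.1.0, `c()` on factors combined their integer codes rather than their labels, so none of the labels match. The sketch below converts the columns to character first; `recode_ids()` is a hypothetical name, and the data are the posted rows typed in by hand with `stringsAsFactors = TRUE` to mimic the old default.

```r
ped <- read.table(text = "
X0001 BYX859 0 0 2 1 BYX859
X0001 BYX894 0 0 1 1 BYX894
X0001 BYX862 BYX894 BYX859 2 2 BYX862
X0001 BYX863 BYX894 BYX859 2 2 BYX863
X0001 BYX864 BYX894 BYX859 2 2 BYX864
X0001 BYX865 BYX894 BYX859 2 2 BYX865", stringsAsFactors = TRUE)

# Hypothetical helper: recode the ids in `cols` to shared integers, keeping "0" as 0
recode_ids <- function(data, cols = 2:4, missing = "0") {
  # as.character() first, so factor columns contribute labels, not codes
  vals <- lapply(data[cols], as.character)
  ids <- setdiff(unique(unlist(vals, use.names = FALSE)), missing)
  for (k in seq_along(cols)) {
    data[[cols[k]]] <- match(vals[[k]], ids, nomatch = 0L)
  }
  data
}

out_ped <- recode_ids(ped)
print(out_ped)
```

Because the ids are collected from all three columns before matching, a parent code such as BYX894 gets the same number (2) wherever it appears.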


Re: [R] alternatives to KS test applicable to K-samples

2015-05-29 Thread Wensui Liu

Very nice, Brian

Sincerely appreciate your assistance!


-- 
==
WenSui Liu
Credit Risk Manager, 53 Bancorp
wensui@53.com
513-295-4370
==


Re: [R] Converting unique strings to unique numbers

2015-05-29 Thread Jeff Newmiller

Of course, but I would not recommend it. A factor is a vector of integers with 
an attribute containing the labels that those integers correspond to. You seem 
to be asking for a factor that has lost the definitions part. But hey, 
newvector <- as.integer(factor(oldvector)) should get you what you asked for,
one column at a time.
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.
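To illustrate the description above: a factor stores integer codes plus a `levels` attribute, and `factor()` sorts the levels by default, so `as.integer(factor(x))` numbers strings in sorted order rather than in order of first appearance. Passing `levels = unique(x)` (an addition, not from the reply) keeps first-appearance order, and reusing one shared `levels` vector for every column would also keep the codes consistent across columns 2 to 4.

```r
x <- c("BYX862", "BYX894", "BYX859", "BYX894")
f <- factor(x)
print(unclass(f))  # integer codes with a "levels" attribute
codes_sorted <- as.integer(f)                               # 2 3 1 3
codes_first  <- as.integer(factor(x, levels = unique(x)))   # 1 2 3 2
```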


Re: [R-es] My R script is very slow

2015-05-29 Thread Eric

Hi MªLuz, I don't know whether it is the fastest of all the options out
there, but it is very, very fast, and the fastest I have used... and
it is quite practical for complex operations on tables, although there
are some things I haven't managed to move from data.frames and loops to
data.table; honestly, I think that is my own shortcoming and that it can
most likely be done.

Regards, eric.

On 5/29/15, MªLuz Morales mlzm...@gmail.com wrote:
 Hi, I want to share with you my problem and the solution that was suggested
 to me. My program loads Outcomes.csv and Set-A.csv (downloaded from
 http://garrickadenbuie.com/blog/2013/04/11/visualize-physionet-data-with-r/,
 section Getting Started -- the code and the data set), about 50MB between
 the two. My code was:


 #  Convert the CSV files to data frames
 seta <- read.csv('Set-A.csv')
 outcomes <- read.csv('Outcomes-A.csv')

 ids <- as.character(unique(outcomes$RecordID))
 ## Number of distinct RecordIDs
 Length_ids <- length(ids) # number of distinct RecordIDs
 ListaABP <- list('RecordID'=-1,'SAPS.I'=-1, 'SOFA'=-1, 'Survival'=-1,
 'In.hospital_death'=-1, 'NISysABP_Min'=-1,'NISysABP_Max'=-1,
 'NISysABP_Mean'=-1, 'NIDiasABP_Min'=-1,'NIDiasABP_Max'=-1,
 'NIDiasABP_Mean'=-1,'NIMAP_Min'=-1,'NIMAP_Max'=-1, 'NIMAP_Mean'=-1)
 for (i in 1:Length_ids){ #NumRecordID){   # For each patient...

   ListaABP$RecordID[i] <- outcomes$RecordID[i]
   ListaABP$SAPS.I[i] <- outcomes$SAPS.I[i]
   ListaABP$SOFA[i] <- outcomes$SOFA[i]
   ListaABP$Survival[i] <- outcomes$Survival[i]
   ListaABP$In.hospital_death[i] <- outcomes$In.hospital_death[i]

   # Parameter == 'NISysABP'
   #seta_NISysABP <- seta[seta$RecordID == ids[i] & seta$Parameter == 'NISysABP', c('RecordID','Value')]
   # I think this is no longer a data frame, so the next line might give an error
   seta_NISysABP <- seta[seta$RecordID == ids[i] & seta$Parameter == 'NISysABP', 'Value']
   ListaABP$NISysABP_Min[i] <- min(seta_NISysABP)
   ListaABP$NISysABP_Max[i] <- max(seta_NISysABP)
   ListaABP$NISysABP_Mean[i] <- mean(seta_NISysABP)

   # Parameter == 'NIDiasABP'
   # In the commented version the min would be computed as ...min(seta_NIDiasABP$Value)
   #seta_NIDiasABP <- seta[seta$RecordID == ids[i] & seta$Parameter == 'NIDiasABP', c('Time','Value')]
   seta_NIDiasABP <- seta[seta$RecordID == ids[i] & seta$Parameter == 'NIDiasABP', 'Value']
   ListaABP$NIDiasABP_Min[i] <- min(seta_NIDiasABP)
   ListaABP$NIDiasABP_Max[i] <- max(seta_NIDiasABP)
   ListaABP$NIDiasABP_Mean[i] <- mean(seta_NIDiasABP)

   # Parameter == 'NIMAP'
   #seta_NIMAP <- seta[seta$RecordID == ids[i] & seta$Parameter == 'NIMAP', c('Time','Value')]
   seta_NIMAP <- seta[seta$RecordID == ids[i] & seta$Parameter == 'NIMAP', 'Value']
   ListaABP$NIMAP_Min[i] <- min(seta_NIMAP)
   ListaABP$NIMAP_Max[i] <- max(seta_NIMAP)
   ListaABP$NIMAP_Mean[i] <- mean(seta_NIMAP)

 }#for i

 Tabla <- data.frame(ListaABP)
 #+

 This code took 3 hours to run. The solution that was suggested to me is to
 use data.table instead of data.frame, and now it takes about 1 second to
 run. Here it is:

 #-
 library(data.table)
 datSet <- fread("Set-A.csv")
 resOut <- datSet[, .(ValMax=max(Value), ValMin=min(Value),
 ValAvg=mean(Value)), by=c("RecordID","Parameter")]
 resOut$RecordID <- as.factor(resOut$RecordID)
 setkey(resOut, RecordID)
 head(datSet)
 datOutcome <- fread("Outcomes-A.csv")
 datOutcome$RecordID <- as.factor(datOutcome$RecordID)
 setkey(datOutcome, RecordID)
 head(datOutcome)
 #resEnd <- merge(resOut, datOutcome, by="RecordID", all=TRUE,
 #                allow.cartesian=FALSE)
 resEnd <- resOut[datOutcome]
 head(resEnd)
 setkey(resEnd, Parameter)
 # Example of extracting one or several parameters
 myRes <- resEnd[c("NISysABP","NIDiasABP","NIMAP")]
 head(myRes)
 #--

 I have a question: is data.table the most efficient option for processing
 large amounts of data? Is it easy to work with if you want to do complex
 calculations in addition to reorganizing tables...??

 Thanks
 Best regards


 ___
 R-help-es mailing list
 R-help-es@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-help-es



-- 
Note: accents have been omitted to avoid conflicts with some mail
readers.

Notable quotes:
* SATYÂT NÂSTI PARO DHARMAH (There is no religion higher than truth)
* Darkness is not fought; it is illuminated ...
* An economist is an expert who will know tomorrow why the things he predicted
yesterday did not happen today (Laurence Peter).
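Why the loop was slow, with a base-R alternative as a sketch: each iteration scans the whole `seta` table three times to pick out one patient's rows, whereas a grouped summary computes every (RecordID, Parameter) minimum, maximum and mean in a single pass, which is also what the data.table version does. The small `seta` below is made-up data with the same three columns as Set-A (`RecordID`, `Parameter`, `Value`); on the real 50MB file, data.table is likely to be faster than `aggregate()`.

```r
# Made-up rows with the same long layout as Set-A.csv
seta <- data.frame(
  RecordID  = c(1, 1, 1, 2, 2, 2, 2),
  Parameter = c("NISysABP", "NISysABP", "NIMAP", "NISysABP", "NIMAP", "NIMAP", "HR"),
  Value     = c(120, 130, 80, 110, 70, 90, 75),
  stringsAsFactors = FALSE
)

# Keep only the blood-pressure parameters used in the original loop
keep <- seta$Parameter %in% c("NISysABP", "NIDiasABP", "NIMAP")
summ <- aggregate(Value ~ RecordID + Parameter, data = seta[keep, ],
                  FUN = function(v) c(Min = min(v), Max = max(v), Mean = mean(v)))
# aggregate() stores the three statistics as a matrix column; flatten it
# into Value.Min, Value.Max and Value.Mean
summ <- do.call(data.frame, summ)
print(summ)
```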


Re: [R] Why I am not able to load library(R.matlab)? Other packages are fine.

2015-05-29 Thread Ben Bolker

install.packages("R.methodsS3")
install.packages("R.matlab")
library(R.matlab)



  [snip snip snip]
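A small sketch for checking which of the packages named in this thread are missing before reinstalling. `missing_pkgs()` is a hypothetical helper built on `requireNamespace()`, which returns `FALSE` instead of raising an error when a package is not installed; the install step is left commented because it needs a CRAN mirror.

```r
# Hypothetical helper: which of these packages cannot be loaded?
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- missing_pkgs(c("R.methodsS3", "R.oo", "R.utils", "R.matlab"))
print(needed)
# Then, with a CRAN mirror available:
# install.packages(needed)
```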


Re: [R] alternatives to KS test applicable to K-samples

2015-05-29 Thread David Winsemius


The 'coin' package (Hothorn, Hornik, van de Wiel, and Zeileis) presents a 
variety of permutation and rank-based tests that would probably be more 
powerful than any multi-group variant of the KS test. The multi-group variant 
of the Wilcoxon Rank Sum Test presented in the examples for the help page: 
?wilcox_test is the Nemenyi-Damico-Wolfe-Dunn test.

-- 

David Winsemius
Alameda, CA, USA


[R] Result differences in 32-bit vs. 64-bit point.in.polygon?

2015-05-29 Thread Lensing, Shelly Y

Is anyone aware of point.in.polygon giving different results for 32-bit vs. 
64-bit R? Our OS is 64-bit Windows 7 Enterprise. I'm working with someone 
else's extensive R program and the final results are close but not exactly 
matching. We're thinking it might be something with the point.in.polygon 
function (one of many possibilities, including the leaps package).

Thanks much,

Shelly Lensing
Biostatistics / University of Arkansas for Medical Sciences
4301 W. Markham St. #781 / Little Rock, AR  72205
V: 501.686.8203 / F: 501-526-6729 / COPH 3236


Re: [R] about transforming a data.frame

2015-05-29 Thread Sarah Goslee

I'm still not really clear on what you need (format, etc), but this
may help you get started:

> with(df, table(CT, row_names))
   row_names
CT  A1:A2:A3 B10:B11:B12 B4:B5:B6 B7:B8:B9 D10:D11:D12 D4:D5:D6 E10:E11:E12
  2        0           0        0        1           2        1           1
  4        1           1        0        0           0        0           0
  5        0           0        1        0           0        0           0
> with(df, table(CT, col_names))
   col_names
CT  B1:B2:B3 D1:D2:D3 F10:F11:F12 G7:G8:G9 H1:H2:H3 H4:H5:H6
  2        1        0           1        1        1        1
  4        1        1           0        0        0        0
  5        1        0           0        0        0        0
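For the follow-up question (which CT values each unique element is associated with), a possible base-R sketch using the simplified `df` from the message below: `split()` groups the CT values by element and `unique()` drops repeats. It treats each full string such as `B1:B2:B3` as one element; counting B1, B2 and B3 separately would need `strsplit()` first.

```r
df <- data.frame(
  row_names = c("B4:B5:B6", "B7:B8:B9", "D4:D5:D6", "D10:D11:D12",
                "D10:D11:D12", "E10:E11:E12", "A1:A2:A3", "B10:B11:B12"),
  col_names = c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6", "F10:F11:F12",
                "H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3"),
  CT = c(5, 2, 2, 2, 2, 2, 4, 4),
  stringsAsFactors = FALSE
)

# CT values associated with each unique element, repeats removed
ct_by_col <- lapply(split(df$CT, df$col_names), unique)
ct_by_row <- lapply(split(df$CT, df$row_names), unique)
ct_by_col[["B1:B2:B3"]]     # 5 2 4
ct_by_row[["D10:D11:D12"]]  # 2

# One line per element, e.g. "B1:B2:B3: 5, 2, 4"
cat(sprintf("%s: %s\n", names(ct_by_col),
            vapply(ct_by_col, paste, character(1), collapse = ", ")), sep = "")
```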


On Fri, May 29, 2015 at 4:58 PM, Bogdan Tanasa tan...@gmail.com wrote:
 Hi Sarah,

 thank you for your help. I have simplified the example, by reading the
 elements in a data frame, eg :

 df <- data.frame(row_names = c("B4:B5:B6", "B7:B8:B9", "D4:D5:D6",
 "D10:D11:D12", "D10:D11:D12", "E10:E11:E12", "A1:A2:A3", "B10:B11:B12"),
 col_names = c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6", "F10:F11:F12",
 "H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3"),
 CT = c(5, 2, 2, 2, 2, 2, 4, 4))

 I have used count() from the plyr package:

 count_row_names <- count(df$row_names)
 count_col_names <- count(df$col_names)

 however, I would need to relate these UNIQUE ELEMENTS in the columns
 row_names or col_names to the numbers they are associated with in the CT
 column, e.g.:

 B1:B2:B3 is associated with 5, 2, 4 (in the CT column), and D10:D11:D12
 is associated with 2 (in the CT column).

 thank you very much,

 bogdan




 On Fri, May 29, 2015 at 1:32 PM, Sarah Goslee sarah.gos...@gmail.com
 wrote:

 Hi,

 Please use dput() to provide your data, as it can get somewhat mangled
 by copy and pasting, especially if you post in HTML (as you are asked
 not to do in the posting guide).

 What is a unique element? is B4:B5:B6 an element, or are B4 and
 B5 each elements? That is, what is the result you expect to obtain
 for the sample data you provided?

 What code have you tried? I would think table() might be involved, and
 possibly strsplit(), but will refrain from putting more time into this
 until you provide a reproducible dataset with dput() and some clearer
 idea of your intent.

 Sarah

 On Fri, May 29, 2015 at 4:19 PM, Bogdan Tanasa tan...@gmail.com wrote:
  Dear all,
 
  I would appreciate a suggestion on the following : I am working with a
  data.frame (below) :
 
        EXP  CT   row_names   col_names
   1   test  -5    B4:B5:B6    B1:B2:B3
   2   test  -2    B7:B8:B9    B1:B2:B3
   3   test  -2    D4:D5:D6    H4:H5:H6
   4   test  -2 D10:D11:D12 F10:F11:F12
   5   test  -2 D10:D11:D12    H1:H2:H3
   6   test  -2 E10:E11:E12    G7:G8:G9
   7   test  -4    A1:A2:A3    D1:D2:D3
   8   test  -4 B10:B11:B12    B1:B2:B3
 
  what would be the easiest way to take the UNIQUE elements in ROW_NAMES,
  or the UNIQUE elements in COL_NAMES, and print how many times each of
  these UNIQUE ELEMENTS is associated with the numbers -5, -2, or -4
  (these numbers are in the column CT)?
 
  thanks,
 
  bogdan
 


Re: [R] Problem with comparing multiple data sets

2015-05-29 Thread John Kane

Hi Mohammad,
I have no idea what is happening, but for some reason your new data (renamed df1,
since df is already the name of an R function, the F density) is returning a
list, whereas dff2 (your original test data) gives a vector as you wanted.

It may be obvious, but I don't see why df1 gives us a list. As far as I can
tell, the two data sets are structurally the same.

The two data sets are below the program.  
## =
library(modeest)

# Original test data
str(dff2)
head(dff2)

# sample of new data
str(df1)
head(df1)

Out.dff2 <- apply(dff2[, 2:length(dff2)], 1, mfv)
str(Out.dff2)

Out.df1 <- apply(df1[, 2:length(df1)], 1, mfv)
str(Out.df1)


## =
## New data set 
df1 <- structure(list(terms = structure(c(2L, 4L, 4L, 4L, 3L, 1L, 5L,
5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label =
c("#authentication,access control",
"#privacy,personal data", "#security,malicious,security", "data controller",
"id management,security", "password,recovery"), class = "factor"),
class.1 = c(2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L,
2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L,
1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L), class.2 = c(2L, 2L, 2L,
0L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L,
2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L,
2L, 2L), class.3 = c(2L, 0L, 2L, 2L, 1L, 1L, 0L, 0L, 0L,
2L, 2L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("terms",
"class.1", "class.2", "class.3"), class = "data.frame", row.names = c(NA,
-50L))

## Original test data set

dff2 <- structure(list(terms = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L,
 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c("#dac",
 "#mac,#security",
 "accountability,anonymous", "data security,encryption,security"
 ), class = "factor"), class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L,
 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L), class.2 = c(2L, 2L,
 2L, 2L, 0L, 0L, 2L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L,
 0L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L),
 class.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L, 1L,
 0L, 0L, 0L, 0L, 2L, 1L, 2L)), .Names = c("terms", "class.1",
 "class.2", "class.3"), class = "data.frame", row.names = c(NA,
 -49L))

##=



John Kane
Kingston ON Canada
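A likely explanation, sketched in base R so it runs without modeest: `mfv()` returns every most-frequent value, and rows 38 and 40 of df1 are 2, 1, 0, a three-way tie, so they return three values (the `Numeric,3` entries), while rows 36 and 37 (2, 2, 0) and row 39 (1, 1, 0) have a single mode. `apply()` can only simplify to a vector when every row returns the same number of values; otherwise it returns a list. The `modes()` helper is a hypothetical stand-in for `mfv()`, and returning `NA` for ties is just one possible policy.

```r
# Hypothetical stand-in for modeest::mfv(): all most-frequent values, ties kept
modes <- function(x) {
  tab <- table(x)
  as.numeric(names(tab)[tab == max(tab)])
}

rows <- rbind(c(2, 2, 0),   # like row 36 of df1: a single mode
              c(2, 1, 0))   # like row 38: three-way tie
res_list <- apply(rows, 1, modes)  # results differ in length, so apply() returns a list
str(res_list)

# One possible policy: keep a unique mode, use NA when there is a tie
single_mode <- function(x) {
  m <- modes(x)
  if (length(m) == 1) m else NA_real_
}
res_vec <- apply(rows, 1, single_mode)
print(res_vec)  # 2 NA
```

If an arbitrary tie-breaker is acceptable instead, taking the first element, as in `modes(x)[1]`, also makes every row return exactly one value.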

-Original Message-
From: mxalimoha...@ualr.edu
Sent: Fri, 29 May 2015 11:40:41 -0500
To: dcarl...@tamu.edu, drjimle...@gmail.com, jrkrid...@inbox.com, 
r-help@r-project.org
Subject: Re: [R] Problem with comparing multiple data sets

Hi everyone.

I tried the modeest package on my initial test data and it worked. However,
it doesn't work on the entire data set. I saved one of the portions that gives
the error (not for all of the values, but for some of them). For example, rows
36, 37 and 39 correctly show the mode value, but rows 38 and 40 are not
correct. The same error is repeated for many of the values.

[36,] 2
[37,] 2
[38,] Numeric,3
[39,] 1
[40,] Numeric,3



#This is what I did:

 df <- read.csv(file = "Part1-modif.csv", header = TRUE, sep = ",")

 Out <- apply(df[, 2:length(df)], 1, mfv)

 t(t(Out))

#This is the data set 

structure(list(terms = structure(c(2L, 4L, 4L, 4L, 3L, 1L, 5L,
5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = c("#authentication,access control",
"#privacy,personal data", "#security,malicious,security", "data controller",
"id management,security", "password,recovery"), class = "factor"),
    class.1 = c(2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L,
    2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L,
    1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L,
    2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L), class.2 = c(2L, 2L, 2L,
    0L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L,
    2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L,
    2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L,
    2L, 2L), class.3 = c(2L, 0L, 2L, 2L, 1L, 1L, 0L, 0L, 0L,
    2L, 2L, 0L, 0L, 0L, 0L, 1L,

Re: [R] about transforming a data.frame

2015-05-29 Thread Bogdan Tanasa

Thanks a lot, Sarah, I very much appreciate it!

On Fri, May 29, 2015 at 3:18 PM, Sarah Goslee sarah.gos...@gmail.com
wrote:

 LMGTFY:
 http://stackoverflow.com/questions/11433432/importing-multiple-csv-files-into-r
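A minimal sketch of the usual pattern for this: list the files with `list.files()`, then read and process each one with `lapply()`. The folder, the file names and the per-file summary in `process_one()` are hypothetical; the demo writes two small CSV files to a temporary directory so that it runs anywhere.

```r
# Demo folder with two small CSV files (stand-ins for real data files)
csv_dir <- file.path(tempdir(), "csv_demo")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(csv_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 4:5), file.path(csv_dir, "b.csv"), row.names = FALSE)

# Full paths of every .csv file in the folder
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# Hypothetical per-file processing: read one file, return a one-row summary
process_one <- function(path) {
  d <- read.csv(path)
  data.frame(file = basename(path), n = nrow(d), total = sum(d$x),
             stringsAsFactors = FALSE)
}

results <- lapply(files, process_one)  # processes one file at a time
summary_tab <- do.call(rbind, results)
print(summary_tab)
```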

 On Fri, May 29, 2015 at 5:58 PM, Bogdan Tanasa tan...@gmail.com wrote:
  Dear Sarah,
 
  thank you very much, it is very helpful. Please, may I ask one more
  question: is there a quick and easy tutorial about loading multiple files
  (from a folder) in R and processing one file at a time? Thanks very much
  again,
 
  bogdan
 
  On Fri, May 29, 2015 at 2:55 PM, Sarah Goslee sarah.gos...@gmail.com
  wrote:
 
  I'm still not really clear on what you need (format, etc.), but this
  may help you get started:

  with(df, table(CT, row_names))
     row_names
  CT  A1:A2:A3 B10:B11:B12 B4:B5:B6 B7:B8:B9 D10:D11:D12 D4:D5:D6 E10:E11:E12
    2        0           0        0        1           2        1           1
    4        1           1        0        0           0        0           0
    5        0           0        1        0           0        0           0
  with(df, table(CT, col_names))
     col_names
  CT  B1:B2:B3 D1:D2:D3 F10:F11:F12 G7:G8:G9 H1:H2:H3 H4:H5:H6
    2        1        0           1        1        1        1
    4        1        1           0        0        0        0
    5        1        0           0        0        0        0
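The tables above count how often each element occurs with each CT value. To list directly which CT values each unique element is associated with (e.g. 2, 4, 5 for B1:B2:B3), a minimal base-R sketch is to split CT by element and keep the sorted distinct values. It assumes the df built later in this thread, rewritten here with the strings quoted; ct_by_col and ct_by_row are illustrative names:

```r
# Sample data from this thread, with the strings quoted
df <- data.frame(
  row_names = c("B4:B5:B6", "B7:B8:B9", "D4:D5:D6", "D10:D11:D12",
                "D10:D11:D12", "E10:E11:E12", "A1:A2:A3", "B10:B11:B12"),
  col_names = c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6", "F10:F11:F12",
                "H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3"),
  CT = c(5, 2, 2, 2, 2, 2, 4, 4),
  stringsAsFactors = FALSE
)

# Group the CT values by element, then keep the sorted distinct values
ct_by_col <- lapply(split(df$CT, df$col_names), function(x) sort(unique(x)))
ct_by_row <- lapply(split(df$CT, df$row_names), function(x) sort(unique(x)))

ct_by_col[["B1:B2:B3"]]     # returns c(2, 4, 5)
ct_by_row[["D10:D11:D12"]]  # returns 2
```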
  
 
  On Fri, May 29, 2015 at 4:58 PM, Bogdan Tanasa tan...@gmail.com wrote:
   Hi Sarah,

   thank you for your help. I have simplified the example by reading the
   elements into a data frame, e.g.:

   df <- data.frame(row_names = c("B4:B5:B6", "B7:B8:B9", "D4:D5:D6",
     "D10:D11:D12", "D10:D11:D12", "E10:E11:E12", "A1:A2:A3", "B10:B11:B12"),
     col_names = c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6", "F10:F11:F12",
     "H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3"),
     CT = c(5, 2, 2, 2, 2, 2, 4, 4))

   I have used count() from the plyr package:

   count_row_names <- count(df$row_names)
   count_col_names <- count(df$col_names)

   However, I need to relate these UNIQUE ELEMENTS in the row_names or
   col_names columns to the numbers they are associated with in the CT
   column, e.g.:

   B1:B2:B3 is associated with 5, 2, 4 (in the CT column), and D10:D11:D12
   is associated with 2 (in the CT column).

   thank you very much,

   bogdan
  
  
  
  
   On Fri, May 29, 2015 at 1:32 PM, Sarah Goslee sarah.gos...@gmail.com wrote:

   Hi,

   Please use dput() to provide your data, as it can get somewhat mangled
   by copy and pasting, especially if you post in HTML (as you are asked
   not to do in the posting guide).

   What is a "unique element"? Is B4:B5:B6 an element, or are B4 and
   B5 each elements? That is, what is the result you expect to obtain
   for the sample data you provided?

   What code have you tried? I would think table() might be involved, and
   possibly strsplit(), but I will refrain from putting more time into this
   until you provide a reproducible dataset with dput() and a clearer
   idea of your intent.

   Sarah
  
   On Fri, May 29, 2015 at 4:19 PM, Bogdan Tanasa tan...@gmail.com wrote:
    Dear all,

    I would appreciate a suggestion on the following: I am working with a
    data.frame (below):

        EXP CT   row_names   col_names
    1  test -5    B4:B5:B6    B1:B2:B3
    2  test -2    B7:B8:B9    B1:B2:B3
    3  test -2    D4:D5:D6    H4:H5:H6
    4  test -2 D10:D11:D12 F10:F11:F12
    5  test -2 D10:D11:D12    H1:H2:H3
    6  test -2 E10:E11:E12    G7:G8:G9
    7  test -4    A1:A2:A3    D1:D2:D3
    8  test -4 B10:B11:B12    B1:B2:B3

    What would be the easiest way to take the UNIQUE elements in row_names,
    or the UNIQUE elements in col_names, and print how many times these
    UNIQUE ELEMENTS are associated with the numbers -5, -2, or -4 (these
    numbers are in the CT column)?

    thanks,

    bogdan
   
 
 



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] about transforming a data.frame

2015-05-29 Thread Jim Lemon

Hi Bogdan,
If you mean "How can I verify that B1:B2:B3 is paired with all of
the values 2, 4 and 5?":

apply(table(df$col_names, df$CT), 1, all)

and if you mean "How can I verify that B1:B2:B3 is paired with at
least one of the values 2, 4 and 5?":

apply(table(df$col_names, df$CT), 1, any)
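These calls rely on all() and any() treating non-zero counts as TRUE. A slightly more explicit sketch compares the counts with zero first, so it is clear that any count (1, 2, 3, ...) counts as "paired" and only 0 does not. The df is the thread's example with the strings quoted; tab, seen, paired_with_all and paired_with_any are illustrative names:

```r
df <- data.frame(
  row_names = c("B4:B5:B6", "B7:B8:B9", "D4:D5:D6", "D10:D11:D12",
                "D10:D11:D12", "E10:E11:E12", "A1:A2:A3", "B10:B11:B12"),
  col_names = c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6", "F10:F11:F12",
                "H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3"),
  CT = c(5, 2, 2, 2, 2, 2, 4, 4),
  stringsAsFactors = FALSE
)

# Contingency table: one row per col_names value, one column per CT value
tab <- table(df$col_names, df$CT)

# TRUE wherever the element occurs with that CT value at least once
seen <- tab > 0

paired_with_all <- apply(seen, 1, all)  # occurs with every CT value
paired_with_any <- apply(seen, 1, any)  # occurs with at least one CT value

paired_with_all[["B1:B2:B3"]]  # TRUE
paired_with_all[["H4:H5:H6"]]  # FALSE (only occurs with CT = 2)
```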

Jim


Hi Jim,

yes, thank you, that is the desired output. One more question, please:
after creating the data frame:

df <- data.frame(row_names = c("B4:B5:B6", "B7:B8:B9", "D4:D5:D6",
  "D10:D11:D12", "D10:D11:D12", "E10:E11:E12", "A1:A2:A3", "B10:B11:B12"),
  col_names = c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6", "F10:F11:F12",
  "H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3"),
  CT = c(5, 2, 2, 2, 2, 2, 4, 4))

and:

table(df$row_names, df$CT)
table(df$col_names, df$CT)

how could I quickly verify that B1:B2:B3 (for example) hits each of the CT
values 2, 4 and 5 at least once, as can be seen in
table(df$col_names, df$CT)?

thank you very much,

-- bogdan


Re: [R] about transforming a data.frame

2015-05-29 Thread Bogdan Tanasa

Dear Sarah,

thank you very much, it is very helpful. May I ask one more question:
is there a quick and easy tutorial on loading multiple files (from a
folder) into R and processing one file at a time? Thanks very much again,

bogdan

On Fri, May 29, 2015 at 2:55 PM, Sarah Goslee sarah.gos...@gmail.com
wrote:

 I'm still not really clear on what you need (format, etc.), but this
 may help you get started:

 with(df, table(CT, row_names))
    row_names
 CT  A1:A2:A3 B10:B11:B12 B4:B5:B6 B7:B8:B9 D10:D11:D12 D4:D5:D6 E10:E11:E12
   2        0           0        0        1           2        1           1
   4        1           1        0        0           0        0           0
   5        0           0        1        0           0        0           0
 with(df, table(CT, col_names))
    col_names
 CT  B1:B2:B3 D1:D2:D3 F10:F11:F12 G7:G8:G9 H1:H2:H3 H4:H5:H6
   2        1        0           1        1        1        1
   4        1        1           0        0        0        0
   5        1        0           0        0        0        0
 

 On Fri, May 29, 2015 at 4:58 PM, Bogdan Tanasa tan...@gmail.com wrote:
  Hi Sarah,

  thank you for your help. I have simplified the example by reading the
  elements into a data frame, e.g.:

  df <- data.frame(row_names = c("B4:B5:B6", "B7:B8:B9", "D4:D5:D6",
    "D10:D11:D12", "D10:D11:D12", "E10:E11:E12", "A1:A2:A3", "B10:B11:B12"),
    col_names = c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6", "F10:F11:F12",
    "H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3"),
    CT = c(5, 2, 2, 2, 2, 2, 4, 4))

  I have used count() from the plyr package:

  count_row_names <- count(df$row_names)
  count_col_names <- count(df$col_names)

  However, I need to relate these UNIQUE ELEMENTS in the row_names or
  col_names columns to the numbers they are associated with in the CT
  column, e.g.:

  B1:B2:B3 is associated with 5, 2, 4 (in the CT column), and D10:D11:D12
  is associated with 2 (in the CT column).

  thank you very much,

  bogdan
 
 
 
 
  On Fri, May 29, 2015 at 1:32 PM, Sarah Goslee sarah.gos...@gmail.com wrote:

  Hi,

  Please use dput() to provide your data, as it can get somewhat mangled
  by copy and pasting, especially if you post in HTML (as you are asked
  not to do in the posting guide).

  What is a "unique element"? Is B4:B5:B6 an element, or are B4 and
  B5 each elements? That is, what is the result you expect to obtain
  for the sample data you provided?

  What code have you tried? I would think table() might be involved, and
  possibly strsplit(), but I will refrain from putting more time into this
  until you provide a reproducible dataset with dput() and a clearer
  idea of your intent.

  Sarah
 
  On Fri, May 29, 2015 at 4:19 PM, Bogdan Tanasa tan...@gmail.com wrote:
   Dear all,

   I would appreciate a suggestion on the following: I am working with a
   data.frame (below):

       EXP CT   row_names   col_names
   1  test -5    B4:B5:B6    B1:B2:B3
   2  test -2    B7:B8:B9    B1:B2:B3
   3  test -2    D4:D5:D6    H4:H5:H6
   4  test -2 D10:D11:D12 F10:F11:F12
   5  test -2 D10:D11:D12    H1:H2:H3
   6  test -2 E10:E11:E12    G7:G8:G9
   7  test -4    A1:A2:A3    D1:D2:D3
   8  test -4 B10:B11:B12    B1:B2:B3

   What would be the easiest way to take the UNIQUE elements in row_names,
   or the UNIQUE elements in col_names, and print how many times these
   UNIQUE ELEMENTS are associated with the numbers -5, -2, or -4 (these
   numbers are in the CT column)?

   thanks,

   bogdan
  




Re: [R] Why I am not able to load library(R.matlab)? Other packages are fine.

2015-05-29 Thread C W

Hi Ben,

Thanks for the fun clip.  I love it.  Have a wonderful day!

-M

On Fri, May 29, 2015 at 5:10 PM, Ben Bolker bbol...@gmail.com wrote:


  I think Henrik's point (which I merely clarified) was that something
 funky (we'll probably never know what, and it's not worth figuring out
 unless it happens again/to other people) had gone wrong and that the
 easiest thing to do was just to reinstall.

 References:
 * https://www.youtube.com/watch?v=t2F1rFmyQmY
 * http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.208.9970&rep=rep1&type=pdf


 On 15-05-29 05:11 PM, C W wrote:
  Wow, thanks Ben.  That worked very well.
 
  I guess I didn't have R.methodsS3? But that doesn't make sense,
  because I was using R.matlab a few weeks ago. I believe I was on
  R 3.1.

  Maybe it's in the R 3.1 folder? I am using a Mac, btw.
 
  Cheers,
 
  -M
 
  On Fri, May 29, 2015 at 1:55 PM, Ben Bolker bbol...@gmail.com
  wrote:
 
  C W tmrsg11 at gmail.com writes:
 
 
  Hi Henrik,
 
  I don't quite get what I should do here. I am not familiar
  with R.methodsS3. Can you tell me exactly which commands I
  need to run?
 
  Thanks,
 
  Mike
 
  install.packages("R.methodsS3")
  install.packages("R.matlab")
  library(R.matlab)
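When a package fails to load because a dependency is missing, a short check can list which of the needed packages cannot be loaded before reinstalling them. A minimal sketch; missing_pkgs() is a hypothetical helper built on base R's requireNamespace():

```r
# Hypothetical helper: return the subset of `pkgs` that cannot be loaded
missing_pkgs <- function(pkgs) {
  # requireNamespace() returns TRUE/FALSE instead of throwing an error
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

missing_pkgs(c("stats", "R.methodsS3", "R.matlab"))
```

When the result is non-empty it can be passed straight to install.packages(), e.g. install.packages(missing_pkgs(c("R.methodsS3", "R.matlab"))).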
 
 
 
  [snip snip snip]
 
 
 





Re: [R] Result differences in 32-bit vs. 64-bit point.in.polygon?

2015-05-29 Thread Duncan Murdoch

On 29/05/2015 2:36 PM, Lensing, Shelly Y wrote:
 Is anyone aware of point.in.polygon giving different results for 32-bit vs. 
 64-bit R? Our OS is 64-bit Windows 7 Enterprise. I'm working with someone 
 else's extensive R program and the final results are close but not exactly 
 matching. We're thinking it might be something with the point.in.polygon 
 function (one of many possibilities, including leaps).

Often 32 bit R does calculations slightly more accurately than 64 bit R
does.  This is because the 64 bit compiler is more likely to do
calculations in 64 bit precision when the 32 bit compiler does them in
80 bit precision.  Of course, individual calculations being more
accurate doesn't mean the final answer is, but small numeric differences
in floating point calculations are to be expected.
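Because small floating-point differences are expected, results from the 32-bit and 64-bit builds are better compared with a tolerance than with == or identical(). A minimal sketch; res32 and res64 are hypothetical stand-ins for the two runs' outputs:

```r
a <- 0.1 + 0.2
b <- 0.3
a == b                    # FALSE: the two doubles differ in the last bits
abs(a - b)                # a difference of about 5.55e-17
isTRUE(all.equal(a, b))   # TRUE: equal within the default tolerance

# Hypothetical results from the 32-bit and 64-bit runs of the same script
res32 <- c(1.0, 2.0, 0.1 + 0.2)
res64 <- c(1.0, 2.0, 0.3)
identical(res32, res64)          # FALSE
isTRUE(all.equal(res32, res64))  # TRUE
```

all.equal() uses a relative tolerance of about 1.5e-8 by default; a tighter or looser value can be passed through its tolerance argument.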

Duncan Murdoch


Re: [R] about transforming a data.frame

2015-05-29 Thread Bogdan Tanasa

Hi Jim,

thanks again. Now I see: the answer to my previous question seems to be
yes, as all() works on logical vectors... best wishes,

-- bogdan

On Fri, May 29, 2015 at 4:29 PM, Bogdan Tanasa tan...@gmail.com wrote:

 Thanks a lot, Jim. If I may ask one more small question:

 if the question is "How can I verify that B1:B2:B3 is paired with
 ALL of the values 2, 4 and 5, regardless of the pairing value?"
 (in our case, for the code below, the pairing value for B1:B2:B3 is 1,
 but it could be 2, 3, 4, etc., BUT NOT zero),

 how could I test for that? Or is this the way apply() works with the
 all argument?

 Good documentation for the apply function would help too. Thanks, and
 happy weekend!

 -- bogdan


 On Fri, May 29, 2015 at 4:21 PM, Jim Lemon drjimle...@gmail.com wrote:

 Hi Bogdan,
 If you mean "How can I verify that B1:B2:B3 is paired with all of
 the values 2, 4 and 5?":

 apply(table(df$col_names, df$CT), 1, all)

 and if you mean "How can I verify that B1:B2:B3 is paired with at
 least one of the values 2, 4 and 5?":

 apply(table(df$col_names, df$CT), 1, any)

 Jim


 Hi Jim,

 yes, thank you, that is the desired output. One more question, please:
 after creating the data frame:

 df <- data.frame(row_names = c("B4:B5:B6", "B7:B8:B9", "D4:D5:D6",
   "D10:D11:D12", "D10:D11:D12", "E10:E11:E12", "A1:A2:A3", "B10:B11:B12"),
   col_names = c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6", "F10:F11:F12",
   "H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3"),
   CT = c(5, 2, 2, 2, 2, 2, 4, 4))

 and:

 table(df$row_names, df$CT)
 table(df$col_names, df$CT)

 how could I quickly verify that B1:B2:B3 (for example) hits each of the CT
 values 2, 4 and 5 at least once, as can be seen in
 table(df$col_names, df$CT)?

 thank you very much,

 -- bogdan






Re: [R-es] My R script is very slow

2015-05-29 Thread Carlos Ortega

Hi Mª Luz,

What kind of complex calculations do you have in mind?

With data.table you can define operations (as complex as you like) on a set
of rows, grouping them by columns, and more. Its syntax is very compact, but
once you use it a little you end up finding ways to do things without many
intermediate steps. You can also write it less compactly, which may be easier
to follow.

As for how data.table's efficiency compares with other alternatives, here is
a comparison:
http://stackoverflow.com/questions/4322219/whats-the-fastest-way-to-merge-join-data-frames-in-r

Since dplyr appeared, however, the question has become whether data.table or
dplyr is the better choice. Here is another thread that compares them on
several criteria:

http://stackoverflow.com/questions/21435339/data-table-vs-dplyr-can-one-do-something-well-the-other-cant-or-does-poorly/27718317#27718317

How much data do you want to process?
And... do you want something faster than under one second?...

Regards,
Carlos Ortega
www.qualityexcellence.es



On 29 May 2015 at 15:50, MªLuz Morales mlzm...@gmail.com wrote:

 Hi, I want to share with you my problem and the solution I was given. My
 program loads Outcomes.csv and Set-A.csv (downloaded from
 http://garrickadenbuie.com/blog/2013/04/11/visualize-physionet-data-with-r/,
 section "Getting Started -- the code and the data set"), about 50 MB between
 the two files. My code was:


 # Convert the CSV files to data frames
 seta <- read.csv('Set-A.csv');
 outcomes <- read.csv('Outcomes-A.csv');

 ids <- as.character(unique(outcomes$RecordID));
 ## Number of distinct RecordIDs
 Length_ids <- length(ids); # number of distinct RecordIDs
 ListaABP <- list('RecordID'=-1,'SAPS.I'=-1, 'SOFA'=-1, 'Survival'=-1,
 'In.hospital_death'=-1, 'NISysABP_Min'=-1,'NISysABP_Max'=-1,
 'NISysABP_Mean'=-1, 'NIDiasABP_Min'=-1,'NIDiasABP_Max'=-1,
 'NIDiasABP_Mean'=-1,'NIMAP_Min'=-1,'NIMAP_Max'=-1, 'NIMAP_Mean'=-1);
 for (i in 1:Length_ids){ #NumRecordID){   # For each patient...

   ListaABP$RecordID[i] <- outcomes$RecordID[i];
   ListaABP$SAPS.I[i] <- outcomes$SAPS.I[i];
   ListaABP$SOFA[i] <- outcomes$SOFA[i];
   ListaABP$Survival[i] <- outcomes$Survival[i];
   ListaABP$In.hospital_death[i] <- outcomes$In.hospital_death[i];

   # Parameter == 'NISysABP'
   #seta_NISysABP <- seta[seta$RecordID == ids[i] & seta$Parameter == 'NISysABP', c('RecordID','Value')];
   # I think this is no longer a data frame, so the next line may throw an error
   seta_NISysABP <- seta[seta$RecordID == ids[i] & seta$Parameter == 'NISysABP', 'Value'];
   ListaABP$NISysABP_Min[i] <- min(seta_NISysABP);
   ListaABP$NISysABP_Max[i] <- max(seta_NISysABP);
   ListaABP$NISysABP_Mean[i] <- mean(seta_NISysABP);

   # Parameter == 'NIDiasABP'
   # In this case the min would be taken as ...min(seta_NIDiasABP$Value);
   #seta_NIDiasABP <- seta[seta$RecordID == ids[i] & seta$Parameter == 'NIDiasABP', c('Time','Value')];
   seta_NIDiasABP <- seta[seta$RecordID == ids[i] & seta$Parameter == 'NIDiasABP', 'Value'];
   ListaABP$NIDiasABP_Min[i] <- min(seta_NIDiasABP);
   ListaABP$NIDiasABP_Max[i] <- max(seta_NIDiasABP);
   ListaABP$NIDiasABP_Mean[i] <- mean(seta_NIDiasABP);

   # Parameter == 'NIMAP'
   #seta_NIMAP <- seta[seta$RecordID == ids[i] & seta$Parameter == 'NIMAP', c('Time','Value')];
   seta_NIMAP <- seta[seta$RecordID == ids[i] & seta$Parameter == 'NIMAP', 'Value'];
   ListaABP$NIMAP_Min[i] <- min(seta_NIMAP);
   ListaABP$NIMAP_Max[i] <- max(seta_NIMAP);
   ListaABP$NIMAP_Mean[i] <- mean(seta_NIMAP);

 } # for i

 Tabla <- data.frame(ListaABP);
 #+

 This code took 3 hours to run. The solution suggested to me was to use
 data.table instead of data.frame; it now runs in about 1 second. Here it
 is:

 #-
 library(data.table)
 datSet <- fread("Set-A.csv")
 resOut <- datSet[, .(ValMax=max(Value), ValMin=min(Value),
 ValAvg=mean(Value)), by=c("RecordID","Parameter")]
 resOut$RecordID <- as.factor(resOut$RecordID)
 setkey(resOut, RecordID)
 head(datSet)
 datOutcome <- fread("Outcomes-A.csv")
 datOutcome$RecordID <- as.factor(datOutcome$RecordID)
 setkey(datOutcome, RecordID)
 head(datOutcome)
 #resEnd <- merge(resOut, datOutcome, by="RecordID", all=TRUE, allow.cartesian=FALSE)
 resEnd <- resOut[datOutcome]
 head(resEnd)
 setkey(resEnd, Parameter)
 # Example: select one or several parameters
 myRes <- resEnd[c("NISysABP","NIDiasABP","NIMAP")]
 head(myRes)
 #--
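For comparison, a base-R sketch of the same per-patient, per-parameter summary without the explicit loop, assuming Set-A.csv has the RecordID, Parameter and Value columns used above. The tiny data frames are made-up stand-ins for the real files so that the sketch runs on its own. Like the data.table version, it groups the data once instead of scanning the whole table for every patient:

```r
# Toy stand-ins for Set-A.csv and Outcomes-A.csv (made-up values)
seta <- data.frame(
  RecordID  = c(1, 1, 1, 2, 2, 2),
  Parameter = c("NISysABP", "NISysABP", "NIMAP", "NISysABP", "NIMAP", "NIMAP"),
  Value     = c(120, 130, 80, 110, 70, 90),
  stringsAsFactors = FALSE
)
outcomes <- data.frame(RecordID = c(1, 2), In.hospital_death = c(0, 1))

# One grouped pass: min, max and mean of Value per RecordID x Parameter
summ <- aggregate(Value ~ RecordID + Parameter, data = seta,
                  FUN = function(v) c(Min = min(v), Max = max(v), Mean = mean(v)))
# aggregate() returns the three statistics as a matrix column; flatten it
# into ordinary columns named Value.Min, Value.Max and Value.Mean
summ <- do.call(data.frame, summ)

# Attach the per-patient outcome columns
res <- merge(summ, outcomes, by = "RecordID")
res
```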

 I have a question: is data.table the most efficient option for processing
 large amounts of data? Is it easy to use if you want to perform complex
 calculations in addition to reorganizing tables?

 Thanks,
 Best regards


 ___
 R-help-es mailing list
 R-help-es@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-help-es





Re: [R] about transforming a data.frame

2015-05-29 Thread Bogdan Tanasa

Hi Jim,

yes, thank you, that is the desired output. One more question, please:
after creating the data frame:

df <- data.frame(row_names = c("B4:B5:B6", "B7:B8:B9", "D4:D5:D6",
  "D10:D11:D12", "D10:D11:D12", "E10:E11:E12", "A1:A2:A3", "B10:B11:B12"),
  col_names = c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6", "F10:F11:F12",
  "H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3"),
  CT = c(5, 2, 2, 2, 2, 2, 4, 4))

and:

table(df$row_names, df$CT)
table(df$col_names, df$CT)

how could I quickly verify that B1:B2:B3 (for example) hits each of the CT
values 2, 4 and 5 at least once, as can be seen in
table(df$col_names, df$CT)?

thank you very much,

-- bogdan



On Fri, May 29, 2015 at 2:40 PM, Jim Lemon drjimle...@gmail.com wrote:

 Hi Bogdan,
 Sarah has already suggested this, but doesn't:

 table(df$row_names,df$CT)
 table(df$col_names,df$CT)

 give you what you want?

 Jim


 On Sat, May 30, 2015 at 7:11 AM, John Kane jrkrid...@inbox.com wrote:
  Bogdan, the request was for data in dput() format.
 
  Type ?dput for more information.
 
  Run dput(myfile), copy the output and paste it into the email.

  You should get something like:
  structure(list(c1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
  2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L,
  5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L,
  8L, 8L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L), .Label = c("(0.509,0.614]",
  "(0.614,0.718]", "(0.718,0.822]", "(0.822,0.926]", "(0.926,1.03]",
  "(1.03,1.13]", "(1.13,1.24]", "(1.24,1.34]", "(1.34,1.45]", "(1.45,1.55]"
  ), class = "factor"), s1 = c(0.51, 0.52, 0.58, 0.58, 0.59, 0.6,
  0.63, 0.65, 0.68, 0.74, 0.74, 0.75, 0.77, 0.77, 0.77, 0.78, 0.79,
  0.84, 0.84, 0.85, 0.87, 0.93, 0.93, 0.95, 0.99, 1.04, 1.09, 1.11,
  1.13, 1.14, 1.14, 1.14, 1.17, 1.18, 1.19, 1.22, 1.22, 1.23, 1.28,
  1.29, 1.3, 1.32, 1.37, 1.38, 1.38, 1.4, 1.43, 1.47, 1.52, 1.55
  )), .Names = c("c1", "s1"), row.names = c(NA, -50L), class = "data.frame")

  Data in dput() format is the preferred way to get data into R-help, since
  it provides a perfect copy of what you have on your machine. Any other way
  of providing data risks the recipients reading it into R differently than
  it is on your machine.
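To see why dput() gives "a perfect copy", its printed text can be evaluated again to rebuild the object. A minimal sketch using a small, illustrative subset of Bogdan's data (df, txt and df_copy are illustrative names):

```r
# Small example data frame (a subset of the one in this thread)
df <- data.frame(
  row_names = c("B4:B5:B6", "B7:B8:B9"),
  CT = c(5, 2),
  stringsAsFactors = FALSE
)

dput(df)  # prints a structure(...) call that can be pasted into an e-mail

# Round trip: evaluating the printed text rebuilds the same object
txt <- paste(capture.output(dput(df)), collapse = "\n")
df_copy <- eval(parse(text = txt))
identical(df, df_copy)  # TRUE
```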
 
  John Kane
  Kingston ON Canada
 
 
  -Original Message-
  From: tan...@gmail.com
  Sent: Fri, 29 May 2015 13:58:20 -0700
  To: sarah.gos...@gmail.com
  Subject: Re: [R] about transforming a data.frame
 
  Hi Sarah,

  thank you for your help. I have simplified the example by reading the
  elements into a data frame, e.g.:

  df <- data.frame(row_names = c("B4:B5:B6", "B7:B8:B9", "D4:D5:D6",
    "D10:D11:D12", "D10:D11:D12", "E10:E11:E12", "A1:A2:A3", "B10:B11:B12"),
    col_names = c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6", "F10:F11:F12",
    "H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3"),
    CT = c(5, 2, 2, 2, 2, 2, 4, 4))

  I have used count() from the plyr package:

  count_row_names <- count(df$row_names)
  count_col_names <- count(df$col_names)

  However, I need to relate these UNIQUE ELEMENTS in the row_names or
  col_names columns to the numbers they are associated with in the CT
  column, e.g.:

  B1:B2:B3 is associated with 5, 2, 4 (in the CT column), and D10:D11:D12
  is associated with 2 (in the CT column).

  thank you very much,

  bogdan
 
 
 
 
  On Fri, May 29, 2015 at 1:32 PM, Sarah Goslee sarah.gos...@gmail.com wrote:

  Hi,

  Please use dput() to provide your data, as it can get somewhat mangled
  by copy and pasting, especially if you post in HTML (as you are asked
  not to do in the posting guide).

  What is a "unique element"? Is B4:B5:B6 an element, or are B4 and
  B5 each elements? That is, what is the result you expect to obtain
  for the sample data you provided?

  What code have you tried? I would think table() might be involved, and
  possibly strsplit(), but I will refrain from putting more time into this
  until you provide a reproducible dataset with dput() and a clearer
  idea of your intent.

  Sarah
 
  On Fri, May 29, 2015 at 4:19 PM, Bogdan Tanasa tan...@gmail.com wrote:
  Dear all,

  I would appreciate a suggestion on the following: I am working with a
  data.frame (below):

      EXP CT   row_names   col_names
  1  test -5    B4:B5:B6    B1:B2:B3
  2  test -2    B7:B8:B9    B1:B2:B3
  3  test -2    D4:D5:D6    H4:H5:H6
  4  test -2 D10:D11:D12 F10:F11:F12
  5  test -2 D10:D11:D12    H1:H2:H3
  6  test -2 E10:E11:E12    G7:G8:G9
  7  test -4    A1:A2:A3    D1:D2:D3
  8  test -4 B10:B11:B12    B1:B2:B3

  What would be the easiest way to take the UNIQUE elements in row_names,
  or the UNIQUE elements in col_names, and print how many times these
  UNIQUE ELEMENTS are associated with the numbers -5, -2, or -4 (these
  numbers are in the CT column)?

  thanks,

  bogdan
 
  --
  Sarah Goslee
  http://www.functionaldiversity.org
 
 
 

Re: [R] about transforming a data.frame

2015-05-29 Thread Sarah Goslee

LMGTFY: 
http://stackoverflow.com/questions/11433432/importing-multiple-csv-files-into-r
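A minimal sketch of the list.files() plus lapply() pattern commonly used for this; read_folder() is a hypothetical helper, and the demo writes its own CSV files to a temporary directory so that it runs as-is:

```r
# Hypothetical helper: read every CSV file in `dir` into a named list
read_folder <- function(dir, pattern = "\\.csv$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  dfs <- lapply(files, read.csv, stringsAsFactors = FALSE)
  names(dfs) <- basename(files)   # name each data frame by its file name
  dfs
}

# Self-contained demo: write two small CSV files to a temporary folder
tmp <- file.path(tempdir(), "csv_demo")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(tmp, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 4:5), file.path(tmp, "b.csv"), row.names = FALSE)

dfs <- read_folder(tmp)

# Process one file at a time, here simply counting the rows of each
n_rows <- vapply(dfs, nrow, integer(1))
n_rows
```

For very large files, the processing step can be moved inside the lapply() call (returning only a summary per file) so that only one full file is held in memory at a time.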

On Fri, May 29, 2015 at 5:58 PM, Bogdan Tanasa tan...@gmail.com wrote:
 Dear Sarah,

 thank you very much, it is very helpful. May I ask one more question:
 is there a quick and easy tutorial on loading multiple files (from a
 folder) into R and processing one file at a time? Thanks very much again,

 bogdan

 On Fri, May 29, 2015 at 2:55 PM, Sarah Goslee sarah.gos...@gmail.com
 wrote:

 I'm still not really clear on what you need (format, etc.), but this
 may help you get started:

 with(df, table(CT, row_names))
    row_names
 CT  A1:A2:A3 B10:B11:B12 B4:B5:B6 B7:B8:B9 D10:D11:D12 D4:D5:D6 E10:E11:E12
   2        0           0        0        1           2        1           1
   4        1           1        0        0           0        0           0
   5        0           0        1        0           0        0           0
 with(df, table(CT, col_names))
    col_names
 CT  B1:B2:B3 D1:D2:D3 F10:F11:F12 G7:G8:G9 H1:H2:H3 H4:H5:H6
   2        1        0           1        1        1        1
   4        1        1           0        0        0        0
   5        1        0           0        0        0        0
 

 On Fri, May 29, 2015 at 4:58 PM, Bogdan Tanasa tan...@gmail.com wrote:
  Hi Sarah,

  thank you for your help. I have simplified the example by reading the
  elements into a data frame, e.g.:

  df <- data.frame(row_names = c("B4:B5:B6", "B7:B8:B9", "D4:D5:D6",
    "D10:D11:D12", "D10:D11:D12", "E10:E11:E12", "A1:A2:A3", "B10:B11:B12"),
    col_names = c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6", "F10:F11:F12",
    "H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3"),
    CT = c(5, 2, 2, 2, 2, 2, 4, 4))

  I have used count() from the plyr package:

  count_row_names <- count(df$row_names)
  count_col_names <- count(df$col_names)

  However, I need to relate these UNIQUE ELEMENTS in the row_names or
  col_names columns to the numbers they are associated with in the CT
  column, e.g.:

  B1:B2:B3 is associated with 5, 2, 4 (in the CT column), and D10:D11:D12
  is associated with 2 (in the CT column).

  thank you very much,

  bogdan
 
 
 
 
  On Fri, May 29, 2015 at 1:32 PM, Sarah Goslee sarah.gos...@gmail.com wrote:

  Hi,

  Please use dput() to provide your data, as it can get somewhat mangled
  by copy and pasting, especially if you post in HTML (as you are asked
  not to do in the posting guide).

  What is a "unique element"? Is B4:B5:B6 an element, or are B4 and
  B5 each elements? That is, what is the result you expect to obtain
  for the sample data you provided?

  What code have you tried? I would think table() might be involved, and
  possibly strsplit(), but I will refrain from putting more time into this
  until you provide a reproducible dataset with dput() and a clearer
  idea of your intent.

  Sarah
 
  On Fri, May 29, 2015 at 4:19 PM, Bogdan Tanasa tan...@gmail.com wrote:
   Dear all,

   I would appreciate a suggestion on the following: I am working with a
   data.frame (below):

       EXP CT   row_names   col_names
   1  test -5    B4:B5:B6    B1:B2:B3
   2  test -2    B7:B8:B9    B1:B2:B3
   3  test -2    D4:D5:D6    H4:H5:H6
   4  test -2 D10:D11:D12 F10:F11:F12
   5  test -2 D10:D11:D12    H1:H2:H3
   6  test -2 E10:E11:E12    G7:G8:G9
   7  test -4    A1:A2:A3    D1:D2:D3
   8  test -4 B10:B11:B12    B1:B2:B3

   What would be the easiest way to take the UNIQUE elements in row_names,
   or the UNIQUE elements in col_names, and print how many times these
   UNIQUE ELEMENTS are associated with the numbers -5, -2, or -4 (these
   numbers are in the CT column)?

   thanks,

   bogdan
  




[R] vectorized code

2015-05-29 Thread zeynab jibril

Hi,

I was working on an online example in which a virus spreads through a graph.
The example works well for a small graph, i.e. a small number of edges and
nodes. But I tried it on a very large graph, i.e. 1 nodes and 2 edges, and
the function below is too slow for a graph that large.

My question is: how can the function below be converted to vectorized code
so that it is optimized for large graphs?

library(igraph)

spreadVirus <- function(G, Vinitial, Activation_probability) {

  # Precompute all outgoing graph adjacencies
  G$AdjList <- get.adjlist(G, mode = "out")

  # Initialize various graph attributes
  V(G)$color <- "blue"
  E(G)$color <- "black"
  V(G)[Vinitial]$color <- "yellow"

  # List to store the incremental graphs (for plotting later)
  Glist <- list(G)
  count <- 1

  # Spread the infection
  active <- Vinitial

  while (length(active) > 0) {
    new_infected <- NULL
    E(G)$color <- "black"

    for (v in active) {
      # spread through the daily contacts of vertex v
      daily_contacts <- G$AdjList[[v]]
      E(G)[v %->% daily_contacts]$color <- "red"

      for (v1 in daily_contacts) {
        # ASSUMPTION: new_color is never defined in the posted code; here it
        # is drawn so that infection succeeds with probability Activation_probability
        new_color <- sample(c("red", "blue"), 1,
                            prob = c(Activation_probability, 1 - Activation_probability))
        if (V(G)[v1]$color == "blue" && new_color == "red") {
          V(G)[v1]$color <- "red"
          new_infected <- c(new_infected, v1)
        }
      }
    }

    # the next active set (needed for updating)
    active <- new_infected

    # Add graph to list (optional, depending on whether I want to plot it)
    count <- count + 1
    Glist[[count]] <- G
  }

  return(Glist)
}
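The posted function loops over every active vertex and every contact in interpreted R. A minimal sketch of a vectorised alternative (an independent-cascade step applied to all active vertices at once) follows. It works on a plain two-column edge matrix rather than on igraph objects, so spread_vectorized() and its arguments are hypothetical. For a directed graph the matrix could presumably come from igraph's get.edgelist(G, names = FALSE); for an undirected graph both directions of each edge would need to be listed. Instead of storing a coloured copy of the graph per step, it records the step at which each vertex was infected:

```r
# Vectorised independent-cascade sketch (base R, not igraph):
#   edges   - two-column matrix of vertex ids (from, to), treated as directed
#   n       - number of vertices, with ids 1..n
#   initial - ids of the initially infected vertices
#   p       - probability that a single contact transmits the infection
spread_vectorized <- function(edges, n, initial, p) {
  infected_at <- rep(NA_integer_, n)  # step of infection; NA = never infected
  infected_at[initial] <- 0L
  active <- initial
  step <- 0L
  while (length(active) > 0) {
    step <- step + 1L
    # Targets of all edges leaving the active vertices, in one operation
    targets <- edges[edges[, 1] %in% active, 2]
    # Drop targets that are already infected
    targets <- targets[is.na(infected_at[targets])]
    # One Bernoulli(p) draw per remaining edge, all at once
    hit <- targets[runif(length(targets)) < p]
    active <- unique(hit)
    infected_at[active] <- step
  }
  infected_at
}

# Usage on a small chain 1 -> 2 -> 3 -> 4 (vertex 5 is isolated)
edges <- rbind(c(1, 2), c(2, 3), c(3, 4))
set.seed(1)
spread_vectorized(edges, n = 5, initial = 1, p = 0.5)
```

Only the loop over infection rounds remains, which is inherently sequential; each round is a handful of vectorised operations over the edge list. A vertex contacted by several active vertices in the same round gets one independent trial per contact.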



Re: [R] Help on R Functionality Histogram

2015-05-29 Thread Boris Steipe

Don't use Nabble when posting to the R-Help forum.

Responses inline.

On May 29, 2015, at 7:54 AM, Shivi82 shivibha...@ymail.com wrote:

 Hello Experts,
 I have a couple of questions on the analysis I am creating.
 1) How does R adapt to changes? In my case, the Excel file I started with
 had to be modified, because the data I had was on an hourly basis ranging
 from 0 to 23 hours. After the change, hour 0 was recoded as 24. Do I now
 need to read this file into R again using read.csv, or is there another
 way to do so, i.e. some kind of reload option?

No. Reload the data by rerunning your script.


 2) I am creating a histogram. On the x axis I need all 24 hours to be
 displayed separately as 0, 1, 2 and so on. However, it only shows up to 20,
 which makes it look awkward. I also need to resize the labels and, if
 possible, put them inside the bars. I used the code below; the axis fonts
 have changed, but the labels give an error with this code:

 hist(aaa$Hours, main = "Hourly Weight", xlab = "Time", breaks = 25,
      col = "yellow", ylim = c(0, 9000),
      labels = TRUE, cex.axis = 0.6, cex.label = 0.6)

The very understandable warning message you must have got with that call tells 
you that there is no such argument cex.label.

hist() calls plot.histogram() which internally calls text() to write the 
labels. text() has an argument cex, but even if you supply it to hist(), it 
is not passed to text() via the function body of plot.histogram(). You could 
modify plot.histogram but the more immediate solution is to set labels = FALSE, 
and explicitly use text() to write your labels. Try something like

x <- hist(aaa$Hours,
          main = "Hourly Weight",
          xlab = "Time",
          breaks = 25,
          col = "yellow",
          ylim = c(0, 9000),
          labels = FALSE,
          cex.axis = 0.6)

text(x$mids, x$counts * 1.05, labels = x$counts, cex = 0.5)



B.
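For the other half of question 2 (one bar and one tick label per hour), a minimal sketch is to give hist() explicit breaks halfway between the hours and draw the x axis with axis(). It assumes the hours are integers from 1 to 24 after the 0 to 24 recode; the hours vector is made-up data standing in for aaa$Hours:

```r
# Made-up hour values standing in for aaa$Hours (range 1-24 after the recode)
hours <- c(1, 1, 2, 12, 24, 24, 24)

# One bin per hour: breaks at 0.5, 1.5, ..., 24.5 centre each bar on an hour
h <- hist(hours, breaks = seq(0.5, 24.5, by = 1),
          main = "Hourly Weight", xlab = "Time", col = "yellow",
          ylim = c(0, 1.15 * max(table(hours))),  # headroom for the labels
          xaxt = "n")                              # suppress the default x axis
axis(1, at = 1:24, cex.axis = 0.6)                 # one tick label per hour

# Write the counts just above the non-empty bars, in a small font
nz <- h$counts > 0
text(h$mids[nz], h$counts[nz], labels = h$counts[nz], pos = 3, cex = 0.5)
```

Using pos = 1 instead of pos = 3 places each count just inside the top of its bar. R may still skip tick labels that would overlap; adding las = 2 to the axis() call rotates them so all 24 fit.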


 
 Kindly advise on both questions. Thanks.
 
 
 
 
 
 
 


Re: [R] about transforming a data.frame

2015-05-29 Thread Jim Lemon

Hi Bogdan,
Sarah has already suggested this, but doesn't:

table(df$row_names,df$CT)
table(df$col_names,df$CT)

give you what you want?

Jim


On Sat, May 30, 2015 at 7:11 AM, John Kane jrkrid...@inbox.com wrote:
 Bogdan, the request was for data in dput() format.

 Type ?dput for more information.

 Do dput(myfile) copy the ouput and paste into the email

 You should get something like:
 structure(list(c1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L,
 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L,
 8L, 8L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L), .Label = c("(0.509,0.614]",
 "(0.614,0.718]", "(0.718,0.822]", "(0.822,0.926]", "(0.926,1.03]",
 "(1.03,1.13]", "(1.13,1.24]", "(1.24,1.34]", "(1.34,1.45]", "(1.45,1.55]"
 ), class = "factor"), s1 = c(0.51, 0.52, 0.58, 0.58, 0.59, 0.6,
 0.63, 0.65, 0.68, 0.74, 0.74, 0.75, 0.77, 0.77, 0.77, 0.78, 0.79,
 0.84, 0.84, 0.85, 0.87, 0.93, 0.93, 0.95, 0.99, 1.04, 1.09, 1.11,
 1.13, 1.14, 1.14, 1.14, 1.17, 1.18, 1.19, 1.22, 1.22, 1.23, 1.28,
 1.29, 1.3, 1.32, 1.37, 1.38, 1.38, 1.4, 1.43, 1.47, 1.52, 1.55
 )), .Names = c("c1", "s1"), row.names = c(NA, -50L), class = "data.frame")

 Data in dput() format is the preferred way to get data in R-help since it 
 provides a perfect copy of what you have on your machine.  Any other way of 
 providing data risks the recipients reading it into R differently than it is 
 on your machine.
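 A quick way to see why dput() output is preferred: it can be parsed back into an identical object, factor levels and all. A small sketch with a made-up two-row data frame:

```r
# Made-up example data with a factor column, like the one shown above
d <- data.frame(c1 = factor(c("(0.509,0.614]", "(0.614,0.718]")),
                s1 = c(0.51, 0.63))

# This text is what you would paste into the email
txt <- paste(capture.output(dput(d)), collapse = "\n")
cat(txt, "\n")

# Whoever receives it can rebuild exactly the same object
d2 <- eval(parse(text = txt))
identical(d, d2)
```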

 John Kane
 Kingston ON Canada


 -Original Message-
 From: tan...@gmail.com
 Sent: Fri, 29 May 2015 13:58:20 -0700
 To: sarah.gos...@gmail.com
 Subject: Re: [R] about transforming a data.frame

 Hi Sarah,

 thank you for your help. I have simplified the example by reading the
 elements into a data frame, e.g.:

 df <- data.frame(row_names = c("B4:B5:B6", "B7:B8:B9", "D4:D5:D6",
 "D10:D11:D12", "D10:D11:D12", "E10:E11:E12", "A1:A2:A3", "B10:B11:B12"),
 col_names = c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6", "F10:F11:F12",
 "H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3"),
 CT = c(5, 2, 2, 2, 2, 2, 4, 4))

 I have used count() from the plyr package:

 count_row_names <- count(df$row_names)
 count_col_names <- count(df$col_names)

 However, I need to relate these UNIQUE ELEMENTS in the row_names or
 col_names columns to the numbers they are associated with in the CT
 column, e.g.:

 B1:B2:B3 is associated with 5, 2, 4 (in the CT column), and D10:D11:D12
 with 2 (in the CT column).

 thank you very much,

 bogdan




 On Fri, May 29, 2015 at 1:32 PM, Sarah Goslee sarah.gos...@gmail.com
 wrote:

 Hi,

 Please use dput() to provide your data, as it can get somewhat mangled
 by copy and pasting, especially if you post in HTML (as you are asked
 not to do in the posting guide).

 What is a unique element? Is B4:B5:B6 an element, or are B4 and
 B5 each elements? That is, what is the result you expect to obtain
 for the sample data you provided?

 What code have you tried? I would think table() might be involved, and
 possibly strsplit(), but will refrain from putting more time into this
 until you provide a reproducible dataset with dput() and some clearer
 idea of your intent.

 Sarah

 On Fri, May 29, 2015 at 4:19 PM, Bogdan Tanasa tan...@gmail.com wrote:
 Dear all,

 I would appreciate a suggestion on the following : I am working with a
 data.frame (below) :

    EXP CT   row_names   col_names
 1 test -5    B4:B5:B6    B1:B2:B3
 2 test -2    B7:B8:B9    B1:B2:B3
 3 test -2    D4:D5:D6    H4:H5:H6
 4 test -2 D10:D11:D12 F10:F11:F12
 5 test -2 D10:D11:D12    H1:H2:H3
 6 test -2 E10:E11:E12    G7:G8:G9
 7 test -4    A1:A2:A3    D1:D2:D3
 8 test -4 B10:B11:B12    B1:B2:B3

 what would be the easiest way to take the UNIQUE elements in ROW_NAMES
 or the UNIQUE elements in COL_NAMES and print how many times these
 UNIQUE ELEMENTS are associated with the numbers -5, -2, or -4 (these
 numbers are in the column CT)?

 thanks,

 bogdan

 --
 Sarah Goslee
 http://www.functionaldiversity.org



Re: [R] TWS and R

2015-05-29 Thread Austin Trombley

Has anyone found a solution to this? I am having the same issue. Thanks!

On Thursday, November 15, 2012 at 10:35:48 PM UTC-8, abcd1234 wrote:

 Hi all, 

 The TWS on my system is unable to connect to my R session. Here is the 
 error that I'm getting: 

 tws <- twsConnect()
 Error in socketConnection(host = host, port = port, open = "ab", blocking = blocking) :
   cannot open the connection
 In addition: Warning message:
 In socketConnection(host = host, port = port, open = "ab", blocking = blocking) :
   localhost:7496 cannot be opened
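 Before changing twsConnect() options, it may help to check whether anything is accepting connections on the port named in the error at all. A base-R sketch; the helper name port_is_open is hypothetical, 7496 is the port from the error message above, and since the API port is set in TWS's own settings, use whatever value is configured there.

```r
# Hypothetical helper: TRUE if a TCP connection to host:port can be opened
port_is_open <- function(host = "localhost", port = 7496, timeout = 2) {
  con <- tryCatch(
    suppressWarnings(socketConnection(host = host, port = port,
                                      open = "r+b", blocking = TRUE,
                                      timeout = timeout)),
    error = function(e) NULL  # refused or unreachable: no connection object
  )
  if (is.null(con)) return(FALSE)
  close(con)  # something is listening; release the socket again
  TRUE
}

# FALSE means nothing accepted the connection, e.g. TWS is not running
# or its API socket is disabled or listening on a different port
port_is_open()
```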

 Here is the session info for the R session:

 R version 2.15.1 (2012-06-22)
 Platform: x86_64-pc-linux-gnu (64-bit)

 locale:
  [1] LC_CTYPE=en_IN.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_IN.UTF-8        LC_COLLATE=en_IN.UTF-8
  [5] LC_MONETARY=en_IN.UTF-8    LC_MESSAGES=en_IN.UTF-8
  [7] LC_PAPER=C                 LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_IN.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets
 [6] methods   base

 other attached packages:
 [1] IBrokers_0.9-10 xts_0.8-6       zoo_1.7-8

 loaded via a namespace (and not attached):
 [1] grid_2.15.1    lattice_0.20-0 tools_2.15.1

 I have checked "Enable ActiveX and Socket Clients", but it hasn't 
 helped. Since I'm running on an Ubuntu machine, I also tried changing the 
 blocking parameter of twsConnect() to:

 1. blocking = FALSE

 2. the setting mentioned here:

 http://code.google.com/p/ibrokers/source/detail?r=84path=/trunk/R/twsConnect.R

 but nothing has helped.

 I have also added 127.0.0.1 to the Trusted IP option.

 Please let me know what I should do.

 Thanks. 




 -- 
 View this message in context: 
 http://r.789695.n4.nabble.com/TWS-and-R-tp4649699.html 
 Sent from the R help mailing list archive at Nabble.com. 


Re: [R] Converting unique strings to unique numbers

2015-05-29 Thread Hervé Pagès


Hi Bill,

On 05/29/2015 01:48 PM, William Dunlap wrote:

I'm not sure why which particular ID gets assigned to each string would
matter but maybe I'm missing something. What really matters is that each
string receives a unique ID. match(x, x) does that.


I think each row of the OP's dataset represented an individual (column 2)
followed by its mother and father (columns 3 and 4).  I assume that the
marker 0 means that a parent is not in the dataset.  If you match against
the strings in column 2 only, in their original order, then the
resulting numbers
give the row number of an individual,


Note that the code I gave happens to do exactly that (assuming that
column 2 contains no duplicates, but your code is also relying on that
assumption in order to have the ids match the row numbers).

We're discussing the merit of match(x, x) versus match(x, unique(x)).
All I'm trying to say is that the unique(x) step (which doubles the cost
of the whole operation, because it also uses hashing, like match() does)
is generally not needed. It doesn't seem to be needed in Kate's use
case.

H.
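The point about match(x, x) versus match(x, unique(x)) can be checked on a toy vector: match(x, x) codes each string by the position of its first occurrence, while match(x, unique(x)) (or, equivalently, as.integer(factor(x, levels = unique(x)))) gives dense codes 1..k. The numbers differ, but the grouping is the same. A minimal sketch:

```r
x <- c("b", "a", "b", "c", "a")

ids_self   <- match(x, x)          # position of first occurrence: 1 2 1 4 2
ids_unique <- match(x, unique(x))  # dense codes 1..k:              1 2 1 3 2
ids_factor <- as.integer(factor(x, levels = unique(x)))  # same as ids_unique

# Two elements share a code in one scheme exactly when they share it in the other
same_grouping <- identical(outer(ids_self, ids_self, "=="),
                           outer(ids_unique, ids_unique, "=="))
same_grouping
```

So the extra unique() pass only matters when the codes themselves must be consecutive.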


making it straightforward to look up
information regarding the ancestors of an individual.  Hence the choice of
numeric ID's may be important.

Bill Dunlap
TIBCO Software
wdunlap tibco.com http://tibco.com

On Fri, May 29, 2015 at 1:29 PM, Hervé Pagès hpa...@fredhutch.org wrote:

Hi Sarah,

On 05/29/2015 12:04 PM, Sarah Goslee wrote:

On Fri, May 29, 2015 at 2:16 PM, Hervé Pagès
hpa...@fredhutch.org wrote:

Hi Kate,

I found that matching the character vector to itself is a very
effective way to do this:

x <- c("a", "bunch", "of", "strings", "whose", "exact",
       "content", "is", "of", "little", "interest")
ids <- match(x, x)
ids
# [1]  1  2  3  4  5  6  7  8  3 10 11

By using this trick, many manipulations on character vectors can
be replaced by manipulations on integer vectors, which are sometimes
way more efficient.


Hm. I hadn't thought of that approach - I use the
as.numeric(factor(...)) approach.

So I was curious, and compared the two:


set.seed(43)
x <- sample(letters, 1, replace=TRUE)

system.time({
  for(i in seq_len(2)) {
    ids1 <- match(x, x)
  }})

#   user  system elapsed
#  9.657   0.000   9.657

system.time({
  for(i in seq_len(2)) {
    ids2 <- as.numeric(factor(x, levels=letters))
  }})

#   user  system elapsed
#   6.16    0.00    6.16

Using factor() is faster.


That's an unfair comparison, because you already know what the levels
are so you can supply them to your call to factor(). Most of the time
you don't know what the levels are so either you just do factor(x) and
let the factor() constructor compute the levels for you, or you compute
them yourself upfront with something like factor(x, levels=unique(x)).

   library(microbenchmark)

   microbenchmark(
     {ids1 <- match(x, x)},
     {ids2 <- as.integer(factor(x, levels=letters))},
     {ids3 <- as.integer(factor(x))},
     {ids4 <- as.integer(factor(x, levels=unique(x)))}
   )
   Unit: microseconds
                                                    expr     min       lq     mean   median      uq     max neval
                                 { ids1 <- match(x, x) } 245.979 262.2390 267.3210 264.4845 268.348 293.894   100
     { ids2 <- as.integer(factor(x, levels = letters)) } 214.115 219.2320 226.9913 220.9870 226.147 314.875   100
                       { ids3 <- as.integer(factor(x)) } 380.782 388.7295 402.2242 394.7165 412.075 481.410   100
   { ids4 <- as.integer(factor(x, levels = unique(x))) } 332.250 342.6630 349.7405 345.3090 353.162 383.002   100

More importantly, using factor() lets you
set the order of the indices in an expected fashion, where match()
assigns them in the order of occurrence.

head(data.frame(x, ids1, ids2))

  x ids1 ids2
1 m    1   13
2 x    2   24
3 b    3    2
4 s    4   19
5 i    5    9
6 o    6   15

In a problem like Kate's where there are several columns for which the
same ordering of indices is desired, that becomes really important.


I'm not sure why which particular ID gets assigned to each string would
matter but maybe I'm missing something. What really matters is that each
string receives a unique ID. match(x, x) does that.

In Kate's problem, where the strings are in more than one column,
and you want the ID to be unique across the columns, you need to do

Re: [R] about transforming a data.frame

2015-05-29 Thread Bogdan Tanasa

Thanks a lot, Jim. If I may ask one more small question, please:

if my question is "How can I verify that B1:B2:B3 is paired with
ALL of the values 2, 4 and 5",

regardless of the pairing count (in our case, for the code below, the
count for B1:B2:B3 is 1, but it could be 2, 3, 4, etc., BUT NOT
zero),

how could I test for that? Or is this how apply() works with the all
argument?

Good documentation for the apply function would help too. Thanks, and
happy weekend!

-- bogdan


On Fri, May 29, 2015 at 4:21 PM, Jim Lemon drjimle...@gmail.com wrote:

 Hi Bogdan,
 If you mean How can I verify that B1:B2:B3 is paired with all of
 the values 2, 4 and 5

 apply(table(df$col_names,df$CT),1,all)

 and if you mean How can I verify that B1:B2:B3 is paired with at
 least one of the values 2, 4 and 5

 apply(table(df$col_names,df$CT),1,any)

 Jim
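 Jim's all() test treats any non-zero count as TRUE, so it already covers the "count may be 1, 2, 3, ... but not zero" case; writing counts > 0 makes that explicit. A sketch on Bogdan's example data (restricting to specific CT values by column name is shown on the last line):

```r
df <- data.frame(
  row_names = c("B4:B5:B6", "B7:B8:B9", "D4:D5:D6", "D10:D11:D12",
                "D10:D11:D12", "E10:E11:E12", "A1:A2:A3", "B10:B11:B12"),
  col_names = c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6", "F10:F11:F12",
                "H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3"),
  CT = c(5, 2, 2, 2, 2, 2, 4, 4),
  stringsAsFactors = FALSE
)
tab <- table(df$col_names, df$CT)  # rows: col_names, columns: CT values

# TRUE for a name that occurs at least once with every CT value in the table
paired_with_all <- apply(tab, 1, function(counts) all(counts > 0))
# TRUE for a name that occurs with at least one of the CT values
paired_with_any <- apply(tab, 1, function(counts) any(counts > 0))

names(which(paired_with_all))

# Check one name against a chosen subset of CT values
all(tab["B1:B2:B3", c("2", "4", "5")] > 0)
```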


 Hi Jim,

 yes, thank you, that is the desired output. One more question, please:
 after using the data frame:

 df <- data.frame(row_names = c("B4:B5:B6", "B7:B8:B9", "D4:D5:D6",
 "D10:D11:D12", "D10:D11:D12", "E10:E11:E12", "A1:A2:A3",
 "B10:B11:B12"), col_names = c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6",
 "F10:F11:F12", "H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3"),
 CT = c(5, 2, 2, 2, 2, 2, 4, 4))

 and:

 table(df$row_names, df$CT)
 table(df$col_names, df$CT)

 how could I quickly verify that B1:B2:B3 (for example) hits each of the
 CT values 2, 4 and 5 at least once, as can be seen in
 table(df$col_names, df$CT)?

 thank you very much,

 -- bogdan



Re: [R] Problem with comparing multiple data sets

2015-05-29 Thread Jim Lemon

Hi Mohammad,
It looks like you are still having problems with this. Given your
latest data set, as below, here is something that might do what you
want. From David's message, I'm not sure whether you are operating on
a single data frame or a list.

# this is the data set as taken from your message below
madf <- structure(list(terms = structure(c(2L, 4L, 4L, 4L, 3L, 1L, 5L,
5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label =
c("#authentication,access control",
"#privacy,personal data", "#security,malicious,security", "data controller",
"id management,security", "password,recovery"), class = "factor"),
class.1 = c(2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L,
2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L,
1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L), class.2 = c(2L, 2L, 2L,
0L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L,
2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L,
2L, 2L), class.3 = c(2L, 0L, 2L, 2L, 1L, 1L, 0L, 0L, 0L,
2L, 2L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("terms",
"class.1", "class.2", "class.3"), class = "data.frame", row.names = c(NA,
-50L))

# define a function that extracts the value from one field
# selected by a value in another field
extract_by_value <- function(x, field1, value1, field2) {
 return(x[x[, field1] == value1, field2])
}

# define another function that equates all of the values
sub_value <- function(x, field1, value1, field2, value2) {
 x[x[, field1] == value1, field2] <- value2
 return(x)
}

# this now steps through every value in key_field
# and operates on every field listed in change_fields
conformity <- function(x, key_field, change_fields) {
 keys <- unique(x[, key_field])
 for(key in keys) {
  for(change_field in change_fields) {
   # get the most frequent value in change_field
   # for the desired value in key_field
   most_freq <- as.numeric(names(which.max(table(
    extract_by_value(x, key_field, key, change_field)))))
   # now set all the values to the most frequent
   x <- sub_value(x, key_field, key, change_field, most_freq)
  }
 }
 return(x)
}

conformity(madf, "terms", c("class.1", "class.2", "class.3"))

Obviously you will want to save the return value of conformity into
your original data frame or create a new one.
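The same per-term replacement can also be written without explicit loops using ave(), which applies a function within each group and puts the results back in the original row order. A sketch under the same tie rule as the loop above (which.max() takes the first maximum, i.e. the smallest tied value, because table() sorts its names); the small data frame is made up:

```r
# Most frequent value of v; ties go to the smallest value
mode_of <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[which.max(counts)])
}

# Replace every value in change_fields by its per-key mode
conform_by_group <- function(x, key_field, change_fields) {
  x[change_fields] <- lapply(x[change_fields], function(col)
    ave(col, x[[key_field]], FUN = function(v) rep(mode_of(v), length(v))))
  x
}

d <- data.frame(terms   = c("a", "a", "a", "b", "b"),
                class.1 = c(1, 2, 2, 0, 1),
                class.2 = c(2, 2, 1, 0, 0))
out <- conform_by_group(d, "terms", c("class.1", "class.2"))
out
```

With the data above, conform_by_group(madf, "terms", c("class.1", "class.2", "class.3")) would be the corresponding call.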

Jim

 Hi everyone.

 I tried the modeest package on my initial test data and it worked. However, 
 it doesn't work on the entire data set. I saved one of the portions that 
 gives errors (not for all of the values, but for some of them). For example, 
 rows 36, 37 and 39 correctly show the mode value, but rows 38 and 40 are not 
 correct. This error is repeated for many of the values.

 [36,] 2
 [37,] 2
 [38,] Numeric,3
 [39,] 1
 [40,] Numeric,3

 

 # This is what I did:

 library(modeest)  # provides mfv()
 df <- read.csv(file = "Part1-modif.csv", header = TRUE, sep = ",")
 Out <- apply(df[, 2:length(df)], 1, mfv)
 t(t(Out))

 # This is the data set

 structure(list(terms = structure(c(2L, 4L, 4L, 4L, 3L, 1L, 5L,
 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label =
 c("#authentication,access control",
 "#privacy,personal data", "#security,malicious,security", "data controller",
 "id management,security", "password,recovery"), class = "factor"),
 class.1 = c(2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L,
 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L,
 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L,
 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L), class.2 = c(2L, 2L, 2L,
 0L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L,
 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L,
 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L,
 2L, 2L), class.3 = c(2L, 0L, 2L, 2L, 1L, 1L, 0L, 0L, 0L,
 2L, 2L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("terms",
 "class.1", "class.2", "class.3"), class = "data.frame", row.names = c(NA,
 -50L))

 

 Also, when I try to add the terms to the result, I get an error:

 mode.names <- data.frame(df[, 1], Out)
 Error in data.frame(df[, 1], Out) :
   arguments imply differing number of rows: 50, 3
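 The "Numeric,3" rows and the data.frame() error most likely share one cause: when several values tie for most frequent in a row, mfv() returns all of them, so apply() cannot simplify the results to a vector and returns a list instead. A base-R sketch that always returns one value per row, assuming that breaking ties by taking the smallest tied value is acceptable (the three-row data frame is made up):

```r
# Row-wise mode returning exactly one value; ties go to the smallest value
# because table() sorts its names and which.max() takes the first maximum
row_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[which.max(counts)])
}

# Small made-up example (not the poster's data)
ratings <- data.frame(
  terms   = c("t1", "t2", "t3"),
  class.1 = c(2, 1, 2),
  class.2 = c(2, 2, 1),
  class.3 = c(1, 0, 0)
)

modes <- apply(ratings[, -1], 1, row_mode)  # a plain numeric vector
result <- data.frame(terms = ratings$terms, mode = modes)
result
```

Because modes has one entry per row, combining it with the terms column no longer fails.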



Re: [R] Toronto CRAN mirror 403 error?

2015-05-29 Thread David Winsemius


On May 29, 2015, at 7:12 PM, Mark Drummond wrote:

 I've been getting a 403 when I try pulling from the Toronto CRAN mirror
 today.
 
 http://cran.utstat.utoronto.ca/

Right. It's been out for the last 2.7 days:

http://cran.r-project.org/mirmon_report.html#ca

 
 Is there a contact list for mirror managers?

Why do you care? Why not use another mirror? The 
http://lib.stat.cmu.edu/R/CRAN/ mirror should be fairly close if you are on 
that side of the continent.

-- 
David.

 
 -- 
 Cheers, Mark
 
 *Mark Drummond*
 m...@markdrummond.ca
 
 When I get sad, I stop being sad and be Awesome instead. TRUE STORY.
 

David Winsemius
Alameda, CA, USA


[R] Toronto CRAN mirror 403 error?

2015-05-29 Thread Mark Drummond

I've been getting a 403 when I try pulling from the Toronto CRAN mirror
today.

http://cran.utstat.utoronto.ca/

Is there a contact list for mirror managers?

-- 
Cheers, Mark

*Mark Drummond*
m...@markdrummond.ca

When I get sad, I stop being sad and be Awesome instead. TRUE STORY.


Re: [R] Toronto CRAN mirror 403 error?

2015-05-29 Thread Jeff Newmiller

This is why there are mirrors. You don't have to wait for them or tell them to 
do their jobs.
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

On May 29, 2015 7:12:56 PM PDT, Mark Drummond m...@markdrummond.ca wrote:
I've been getting a 403 when I try pulling from the Toronto CRAN mirror
today.

http://cran.utstat.utoronto.ca/

Is there a contact list for mirror managers?


Re: [R] Toronto CRAN mirror 403 error?

2015-05-29 Thread Gabor Grothendieck

On Fri, May 29, 2015 at 10:12 PM, Mark Drummond m...@markdrummond.ca wrote:
 I've been getting a 403 when I try pulling from the Toronto CRAN mirror
 today.

 http://cran.utstat.utoronto.ca/

 Is there a contact list for mirror managers?


See the CRAN_mirrors.csv file in
  R.home("doc")
of your R distribution.
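utils::getCRANmirrors() reads that same file (with local.only = TRUE it does not go online), so a mirror's maintainer can be looked up from within R. A sketch; the "utoronto" pattern and the CMU mirror URL are taken from this thread, and a mirror listed in 2015 may no longer appear in a recent R.

```r
# Read the mirror list shipped with R; all = TRUE also keeps mirrors
# flagged as not OK
mirrors <- utils::getCRANmirrors(all = TRUE, local.only = TRUE)

# Look up the Toronto mirror and whoever maintains it
toronto <- mirrors[grepl("utoronto", mirrors$URL),
                   c("Name", "URL", "Maintainer")]
toronto

# Meanwhile, install from a different mirror for the rest of the session
options(repos = c(CRAN = "http://lib.stat.cmu.edu/R/CRAN/"))
```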


Re: [R] Help on R Functionality Histogram

2015-05-29 Thread Shivi82

Thank you, Sarah. This was very impressive and really helped me out.




--
View this message in context: 
http://r.789695.n4.nabble.com/Help-on-R-Functionality-Histogram-tp4707886p4707949.html
Sent from the R help mailing list archive at Nabble.com.


[R] Converting unique strings to unique numbers

2015-05-29 Thread Kate Ignatius

I have a pedigree file as so:

X0001 BYX859  0  0  2  1 BYX859
X0001 BYX894  0  0  1  1 BYX894
X0001 BYX862 BYX894 BYX859  2  2 BYX862
X0001 BYX863 BYX894 BYX859  2  2 BYX863
X0001 BYX864 BYX894 BYX859  2  2 BYX864
X0001 BYX865 BYX894 BYX859  2  2 BYX865

And I was hoping to change all unique string values to numbers.

That is:

BYX859 = 1
BYX894 = 2
BYX862 = 3
BYX863 = 4
BYX864 = 5
BYX865 = 6

But only in columns 2 - 4.  Essentially I would like the data to look like this:

X0001 1 0 0  2  1 BYX859
X0001 2 0 0  1  1 BYX894
X0001 3 2 1  2  2 BYX862
X0001 4 2 1  2  2 BYX863
X0001 5 2 1  2  2 BYX864
X0001 6 2 1  2  2 BYX865

Is this possible with factors?

Thanks!

K.


Re: [R] Converting unique strings to unique numbers

2015-05-29 Thread MacQueen, Don

Here is an example to get you started:

mycol <- c('b','a','d','d','b','c')
as.numeric(factor(mycol))
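For one column, factor() numbers the strings in the sorted order of its levels. When several columns must share one code per string (as in Kate's columns 2 to 4), each column needs the same levels; coding the columns separately would give the same string different numbers. A small sketch with made-up columns p and q (strings missing from the shared levels would become NA):

```r
# Don's example: codes follow the alphabetical order of the levels a, b, c, d
mycol <- c("b", "a", "d", "d", "b", "c")
as.integer(factor(mycol))

# Several columns coded against one common set of levels
df <- data.frame(p = c("b", "a", "d"), q = c("d", "c", "b"),
                 stringsAsFactors = FALSE)
lev <- sort(unique(unlist(df)))  # shared levels for all columns
df[] <- lapply(df, function(col) as.integer(factor(col, levels = lev)))
df
```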

-Don

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 5/29/15, 9:58 AM, Kate Ignatius kate.ignat...@gmail.com wrote:

I have a pedigree file as so:

X0001 BYX859  0  0  2  1 BYX859
X0001 BYX894  0  0  1  1 BYX894
X0001 BYX862 BYX894 BYX859  2  2 BYX862
X0001 BYX863 BYX894 BYX859  2  2 BYX863
X0001 BYX864 BYX894 BYX859  2  2 BYX864
X0001 BYX865 BYX894 BYX859  2  2 BYX865

And I was hoping to change all unique string values to numbers.

That is:

BYX859 = 1
BYX894 = 2
BYX862 = 3
BYX863 = 4
BYX864 = 5
BYX865 = 6

But only in columns 2 - 4.  Essentially I would like the data to look
like this:

X0001 1 0 0  2  1 BYX859
X0001 2 0 0  1  1 BYX894
X0001 3 2 1  2  2 BYX862
X0001 4 2 1  2  2 BYX863
X0001 5 2 1  2  2 BYX864
X0001 6 2 1  2  2 BYX865

Is this possible with factors?

Thanks!

K.


Re: [R] Converting unique strings to unique numbers

2015-05-29 Thread Sarah Goslee

On Fri, May 29, 2015 at 2:16 PM, Hervé Pagès hpa...@fredhutch.org wrote:
 Hi Kate,

 I found that matching the character vector to itself is a very
 effective way to do this:

   x <- c("a", "bunch", "of", "strings", "whose", "exact", "content",
          "is", "of", "little", "interest")
   ids <- match(x, x)
   ids
   # [1]  1  2  3  4  5  6  7  8  3 10 11

 By using this trick, many manipulations on character vectors can
 be replaced by manipulations on integer vectors, which are sometimes
 way more efficient.

Hm. I hadn't thought of that approach - I use the
as.numeric(factor(...)) approach.

So I was curious, and compared the two:


set.seed(43)
x <- sample(letters, 1, replace=TRUE)

system.time({
  for(i in seq_len(2)) {
    ids1 <- match(x, x)
  }})

#   user  system elapsed
#  9.657   0.000   9.657

system.time({
  for(i in seq_len(2)) {
    ids2 <- as.numeric(factor(x, levels=letters))
  }})

#   user  system elapsed
#   6.16    0.00    6.16

Using factor() is faster. More importantly, using factor() lets you
set the order of the indices in an expected fashion, where match()
assigns them in the order of occurrence.

head(data.frame(x, ids1, ids2))

  x ids1 ids2
1 m    1   13
2 x    2   24
3 b    3    2
4 s    4   19
5 i    5    9
6 o    6   15

In a problem like Kate's where there are several columns for which the
same ordering of indices is desired, that becomes really important.

If you take Bill Dunlap's modification of the match() approach, it
resolves both problems: matching against the pooled unique values is
both faster than the factor() version and gives the same result:


On Fri, May 29, 2015 at 1:31 PM, William Dunlap wdun...@tibco.com wrote:
 match() will do what you want.  E.g., run your data through
 the following function.

f <- function (data)
{
  uniqStrings <- unique(c(data[, 2], data[, 3], data[, 4]))
  uniqStrings <- setdiff(uniqStrings, 0)
  for (j in 2:4) {
    data[[j]] <- match(data[[j]], uniqStrings, nomatch = 0L)
  }
  data
}

##

y <- data.frame(id = 1:5000, v1 = sample(letters, 5000, replace=TRUE),
  v2 = sample(letters, 5000, replace=TRUE), v3 = sample(letters, 5000,
  replace=TRUE), stringsAsFactors=FALSE)


system.time({
  for(i in seq_len(2)) {
    ids3 <- f(data.frame(y))
  }})

#   user  system elapsed
# 22.515   0.000  22.518



ff <- function(data)
{
  uniqStrings <- unique(c(data[, 2], data[, 3], data[, 4]))
  uniqStrings <- setdiff(uniqStrings, 0)
  for (j in 2:4) {
    data[[j]] <- as.numeric(factor(data[[j]], levels=uniqStrings))
  }
  data
}

system.time({
  for(i in seq_len(2)) {
    ids4 <- ff(data.frame(y))
  }})

#user  system elapsed
#  26.083   0.002  26.090

head(ids3)

  id v1 v2 v3
1  1  1  2  8
2  2  2 19 22
3  3  3 21 16
4  4  4 10 17
5  5  1  8 18
6  6  1 12 26

head(ids4)

  id v1 v2 v3
1  1  1  2  8
2  2  2 19 22
3  3  3 21 16
4  4  4 10 17
5  5  1  8 18
6  6  1 12 26

Kate, if you're getting all zeros, check str(yourdataframe) - it's
likely that when you imported your data into R the strings were
already converted to factors, which is not what you want (ask me how I
know this!).

Sarah



 On 05/29/2015 09:58 AM, Kate Ignatius wrote:

 I have a pedigree file as so:

 X0001 BYX859  0  0  2  1 BYX859
 X0001 BYX894  0  0  1  1 BYX894
 X0001 BYX862 BYX894 BYX859  2  2 BYX862
 X0001 BYX863 BYX894 BYX859  2  2 BYX863
 X0001 BYX864 BYX894 BYX859  2  2 BYX864
 X0001 BYX865 BYX894 BYX859  2  2 BYX865

 And I was hoping to change all unique string values to numbers.

 That is:

 BYX859 = 1
 BYX894 = 2
 BYX862 = 3
 BYX863 = 4
 BYX864 = 5
 BYX865 = 6

 But only in columns 2 - 4.  Essentially I would like the data to look like
 this:

 X0001 1 0 0  2  1 BYX859
 X0001 2 0 0  1  1 BYX894
 X0001 3 2 1  2  2 BYX862
 X0001 4 2 1  2  2 BYX863
 X0001 5 2 1  2  2 BYX864
 X0001 6 2 1  2  2 BYX865

 Is this possible with factors?

 Thanks!

 K.



-- 
Sarah Goslee
http://www.functionaldiversity.org


Re: [R] Converting unique strings to unique numbers

2015-05-29 Thread William Dunlap

I'm not sure why which particular ID gets assigned to each string would
matter but maybe I'm missing something. What really matters is that each
string receives a unique ID. match(x, x) does that.

I think each row of the OP's dataset represented an individual (column 2)
followed by its mother and father (columns 3 and 4).  I assume that the
marker 0 means that a parent is not in the dataset.  If you match against
the strings in column 2 only, in their original order, then the resulting
numbers give the row number of an individual, making it straightforward to look up
information regarding the ancestors of an individual.  Hence the choice of
numeric IDs may be important.
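A minimal sketch of matching against column 2 only, on Kate's six example rows, assuming (as above) that "0" marks a parent who is not in the file and that column 2 has no duplicates. Each ID in columns 2 to 4 becomes the row number where that individual appears in column 2, which reproduces the output Kate asked for.

```r
ped_txt <- "
X0001 BYX859      0      0 2 1 BYX859
X0001 BYX894      0      0 1 1 BYX894
X0001 BYX862 BYX894 BYX859 2 2 BYX862
X0001 BYX863 BYX894 BYX859 2 2 BYX863
X0001 BYX864 BYX894 BYX859 2 2 BYX864
X0001 BYX865 BYX894 BYX859 2 2 BYX865
"
# Read every column as character so the IDs are not turned into factors
ped <- read.table(text = ped_txt, header = FALSE, colClasses = "character")

# Save the column-2 IDs first: column 2 itself is overwritten in the loop
ids <- ped[[2]]

# "0" (and any parent not listed in column 2) becomes 0 via nomatch = 0L
for (j in 2:4) {
  ped[[j]] <- match(ped[[j]], ids, nomatch = 0L)
}
ped
```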



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, May 29, 2015 at 1:29 PM, Hervé Pagès hpa...@fredhutch.org wrote:

 Hi Sarah,

 On 05/29/2015 12:04 PM, Sarah Goslee wrote:

 On Fri, May 29, 2015 at 2:16 PM, Hervé Pagès hpa...@fredhutch.org
 wrote:

 Hi Kate,

 I found that matching the character vector to itself is a very
 effective way to do this:

x <- c("a", "bunch", "of", "strings", "whose", "exact", "content",
       "is", "of", "little", "interest")
ids <- match(x, x)
ids
# [1]  1  2  3  4  5  6  7  8  3 10 11

 By using this trick, many manipulations on character vectors can
 be replaced by manipulations on integer vectors, which are sometimes
 way more efficient.


 Hm. I hadn't thought of that approach - I use the
 as.numeric(factor(...)) approach.

 So I was curious, and compared the two:


 set.seed(43)
 x <- sample(letters, 1, replace=TRUE)

 system.time({
   for(i in seq_len(2)) {
     ids1 <- match(x, x)
 }})

 #   user  system elapsed
 #  9.657   0.000   9.657

 system.time({
   for(i in seq_len(2)) {
     ids2 <- as.numeric(factor(x, levels=letters))
 }})

 #   user  system elapsed
 #   6.16    0.00    6.16

 Using factor() is faster.


 That's an unfair comparison, because you already know what the levels
 are so you can supply them to your call to factor(). Most of the time
 you don't know what the levels are so either you just do factor(x) and
 let the factor() constructor compute the levels for you, or you compute
 them yourself upfront with something like factor(x, levels=unique(x)).

   library(microbenchmark)

   microbenchmark(
     {ids1 <- match(x, x)},
     {ids2 <- as.integer(factor(x, levels=letters))},
     {ids3 <- as.integer(factor(x))},
     {ids4 <- as.integer(factor(x, levels=unique(x)))}
   )
   Unit: microseconds
                                                    expr     min       lq     mean   median      uq     max neval
                                 { ids1 <- match(x, x) } 245.979 262.2390 267.3210 264.4845 268.348 293.894   100
     { ids2 <- as.integer(factor(x, levels = letters)) } 214.115 219.2320 226.9913 220.9870 226.147 314.875   100
                       { ids3 <- as.integer(factor(x)) } 380.782 388.7295 402.2242 394.7165 412.075 481.410   100
   { ids4 <- as.integer(factor(x, levels = unique(x))) } 332.250 342.6630 349.7405 345.3090 353.162 383.002   100

  More importantly, using factor() lets you
 set the order of the indices in an expected fashion, where match()
 assigns them in the order of occurrence.

 head(data.frame(x, ids1, ids2))

   x ids1 ids2
 1 m    1   13
 2 x    2   24
 3 b    3    2
 4 s    4   19
 5 i    5    9
 6 o    6   15

 In a problem like Kate's where there are several columns for which the
 same ordering of indices is desired, that becomes really important.


 I'm not sure why which particular ID gets assigned to each string would
 matter but maybe I'm missing something. What really matters is that each
 string receives a unique ID. match(x, x) does that.

 In Kate's problem, where the strings are in more than one column,
 and you want the ID to be unique across the columns, you need to do
 match(x, x) where 'x' contains the strings from all the columns
 that you want to replace:

   m <- matrix(c(
 "X0001", "BYX859",      "0",      "0",  "2",  "1", "BYX859",
 "X0001", "BYX894",      "0",      "0",  "1",  "1", "BYX894",
 "X0001", "BYX862", "BYX894", "BYX859",  "2",  "2", "BYX862",
 "X0001", "BYX863", "BYX894", "BYX859",  "2",  "2", "BYX863",
 "X0001", "BYX864", "BYX894", "BYX859",  "2",  "2", "BYX864",
 "X0001", "BYX865", "BYX894", "BYX859",  "2",  "2", "BYX865"
   ), ncol=7, byrow=TRUE)

   x <- m[ , 2:4]
   id <- match(x, x, nomatch=0, incomparables=0)
   m[ , 2:4] <- id

 No factor needed. No loop needed. ;-)

 Cheers,
 H.


 If you take Bill Dunlap's modification of the match() approach, it
 resolves both problems: matching against the pooled unique values is
 both faster than the factor() version and gives the same result:


 On Fri, May 29, 2015 at 1:31 PM, William Dunlap wdun...@tibco.com
 wrote:

 match() will do what you want.  E.g., run your data through
 the following function.

  f - function (data)
 {
  uniqStrings - unique(c(data[, 2], data[, 3], data[, 4]))
  uniqStrings - setdiff(uniqStrings, 0)
  for (j in 2:4) {
  data[[j]] -

Re: [R] about transforming a data.frame

2015-05-29 Thread Bogdan Tanasa

Hi Sarah,

thank you for your help. I have simplified the example by reading the
elements into a data frame, e.g.:

df <- data.frame(row_names = c("B4:B5:B6", "B7:B8:B9", "D4:D5:D6",
"D10:D11:D12", "D10:D11:D12", "E10:E11:E12", "A1:A2:A3", "B10:B11:B12"),
col_names = c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6", "F10:F11:F12",
"H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3"),
CT = c(5, 2, 2, 2, 2, 2, 4, 4))

I have used count() from the plyr package:

library(plyr)
count_row_names <- count(df$row_names)
count_col_names <- count(df$col_names)

however, I would need to correlate these UNIQUE ELEMENTS in the columns
row_names or col_names with the numbers they are associated with in the CT
column, e.g.:

B1:B2:B3 is associated with 5, 2, 4 (in the CT column), and D10:D11:D12
is associated with 2 (in the CT column).
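A minimal base-R sketch of one way to get this, assuming each full label (e.g. B1:B2:B3) counts as one element and using the CT values exactly as typed in the example above. table() counts how often each unique label occurs with each CT value, and split() lists the distinct CT values per label. The object names tab_col, ct_by_col, etc. are illustrative:

```r
# Bogdan's simplified example, with the string quotes restored
df <- data.frame(
  row_names = c("B4:B5:B6", "B7:B8:B9", "D4:D5:D6", "D10:D11:D12",
                "D10:D11:D12", "E10:E11:E12", "A1:A2:A3", "B10:B11:B12"),
  col_names = c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6", "F10:F11:F12",
                "H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3"),
  CT = c(5, 2, 2, 2, 2, 2, 4, 4),
  stringsAsFactors = FALSE
)

# Counts: one row per unique label, one column per CT value
tab_col <- table(df$col_names, df$CT)
tab_row <- table(df$row_names, df$CT)

# Distinct CT values associated with each unique label
ct_by_col <- lapply(split(df$CT, df$col_names), function(v) sort(unique(v)))
ct_by_row <- lapply(split(df$CT, df$row_names), function(v) sort(unique(v)))

tab_col
ct_by_col[["B1:B2:B3"]]     # [1] 2 4 5
ct_by_row[["D10:D11:D12"]]  # [1] 2
```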

thank you very much,

bogdan




On Fri, May 29, 2015 at 1:32 PM, Sarah Goslee sarah.gos...@gmail.com
wrote:

 Hi,

 Please use dput() to provide your data, as it can get somewhat mangled
 by copy and pasting, especially if you post in HTML (as you are asked
 not to do in the posting guide).

 What is a unique element? is B4:B5:B6 an element, or are B4 and
 B5 each elements? That is, what is the result you expect to obtain
 for the sample data you provided?

 What code have you tried? I would think table() might be involved, and
 possibly strsplit(), but will refrain from putting more time into this
 until you provide a reproducible dataset with dput() and some clearer
 idea of your intent.

 Sarah

 On Fri, May 29, 2015 at 4:19 PM, Bogdan Tanasa tan...@gmail.com wrote:
  Dear all,
 
  I would appreciate a suggestion on the following : I am working with a
  data.frame (below) :
 
     EXP CT   row_names   col_names
  1 test -5    B4:B5:B6    B1:B2:B3
  2 test -2    B7:B8:B9    B1:B2:B3
  3 test -2    D4:D5:D6    H4:H5:H6
  4 test -2 D10:D11:D12 F10:F11:F12
  5 test -2 D10:D11:D12    H1:H2:H3
  6 test -2 E10:E11:E12    G7:G8:G9
  7 test -4    A1:A2:A3    D1:D2:D3
  8 test -4 B10:B11:B12    B1:B2:B3
 
  what would be the easiest way to consider UNIQUE elements in the
 ROW_NAMES
  or the UNIQUE elements in the COL_NAMES and :
 
  print how many times these UNIQUE ELEMENTS associate with the numbers -5,
  -2, or -4 (these numbers are on the column names CT) ..
 
  thanks,
 
  bogdan
 
 --
 Sarah Goslee
 http://www.functionaldiversity.org



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] about transforming a data.frame

2015-05-29 Thread Bogdan Tanasa

Dear all,

I would appreciate a suggestion on the following : I am working with a
data.frame (below) :

   EXP CT   row_names   col_names
1 test -5    B4:B5:B6    B1:B2:B3
2 test -2    B7:B8:B9    B1:B2:B3
3 test -2    D4:D5:D6    H4:H5:H6
4 test -2 D10:D11:D12 F10:F11:F12
5 test -2 D10:D11:D12    H1:H2:H3
6 test -2 E10:E11:E12    G7:G8:G9
7 test -4    A1:A2:A3    D1:D2:D3
8 test -4 B10:B11:B12    B1:B2:B3

What would be the easiest way to take the UNIQUE elements in ROW_NAMES (or
the UNIQUE elements in COL_NAMES) and print how many times each of these
UNIQUE ELEMENTS is associated with the numbers -5, -2, or -4 (the values in
the column named CT)?

thanks,

bogdan


Re: [R] Converting unique strings to unique numbers

2015-05-29 Thread Hervé Pagès


Hi Sarah,

On 05/29/2015 12:04 PM, Sarah Goslee wrote:

On Fri, May 29, 2015 at 2:16 PM, Hervé Pagès hpa...@fredhutch.org wrote:

Hi Kate,

I found that matching the character vector to itself is a very
effective way to do this:

   x <- c("a", "bunch", "of", "strings", "whose", "exact", "content",
          "is", "of", "little", "interest")
   ids <- match(x, x)
   ids
   # [1]  1  2  3  4  5  6  7  8  3 10 11

By using this trick, many manipulations on character vectors can
be replaced by manipulations on integer vectors, which are sometimes
way more efficient.
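One detail worth noting about this trick: match(x, x) returns the position of each string's first occurrence, so the IDs are unique but can have gaps (10 and 11 above, with no 9). If consecutive IDs 1..k are wanted, matching against unique(x) gives them, in order of first appearance. A small sketch using the same x (ids_pos and ids_dense are illustrative names):

```r
x <- c("a", "bunch", "of", "strings", "whose", "exact", "content",
       "is", "of", "little", "interest")

# Position of each string's first occurrence: unique per string, with gaps
ids_pos <- match(x, x)
# Rank among the distinct strings, in order of first appearance: 1..k
ids_dense <- match(x, unique(x))

ids_pos    # [1]  1  2  3  4  5  6  7  8  3 10 11
ids_dense  # [1]  1  2  3  4  5  6  7  8  3  9 10
```

The dense version gives the same codes as as.integer(factor(x, levels = unique(x))).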


Hm. I hadn't thought of that approach - I use the
as.numeric(factor(...)) approach.

So I was curious, and compared the two:


set.seed(43)
x <- sample(letters, 1, replace=TRUE)

system.time({
   for(i in seq_len(2)) {
   ids1 <- match(x, x)
}})

#   user  system elapsed
#  9.657   0.000   9.657

system.time({
   for(i in seq_len(2)) {
   ids2 <- as.numeric(factor(x, levels=letters))
}})

#   user  system elapsed
#   6.16    0.00    6.16

Using factor() is faster.


That's an unfair comparison, because you already know what the levels
are so you can supply them to your call to factor(). Most of the time
you don't know what the levels are so either you just do factor(x) and
let the factor() constructor compute the levels for you, or you compute
them yourself upfront with something like factor(x, levels=unique(x)).

  library(microbenchmark)

  microbenchmark(
{ids1 <- match(x, x)},
{ids2 <- as.integer(factor(x, levels=letters))},
{ids3 <- as.integer(factor(x))},
{ids4 <- as.integer(factor(x, levels=unique(x)))}
  )
  Unit: microseconds
                                                   expr     min       lq
                                { ids1 <- match(x, x) } 245.979 262.2390
    { ids2 <- as.integer(factor(x, levels = letters)) } 214.115 219.2320
                      { ids3 <- as.integer(factor(x)) } 380.782 388.7295
  { ids4 <- as.integer(factor(x, levels = unique(x))) } 332.250 342.6630
      mean   median      uq     max neval
  267.3210 264.4845 268.348 293.894   100
  226.9913 220.9870 226.147 314.875   100
  402.2242 394.7165 412.075 481.410   100
  349.7405 345.3090 353.162 383.002   100


More importantly, using factor() lets you
set the order of the indices in an expected fashion, where match()
assigns them in the order of occurrence.

head(data.frame(x, ids1, ids2))

  x ids1 ids2
1 m    1   13
2 x    2   24
3 b    3    2
4 s    4   19
5 i    5    9
6 o    6   15

In a problem like Kate's where there are several columns for which the
same ordering of indices is desired, that becomes really important.


I'm not sure why which particular ID gets assigned to each string would
matter but maybe I'm missing something. What really matters is that each
string receives a unique ID. match(x, x) does that.
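If a predictable, alphabetical assignment is wanted after all, one option (a sketch, assuming that is the goal) is to match against the sorted unique values. This gives the same codes as as.integer(factor(x)), because factor() by default uses the sorted unique values as its levels. The x below is just the first letters from the head() output above, plus a repeat:

```r
x <- c("m", "x", "b", "s", "i", "o", "b")

# Alphabetical IDs: position of each string among the sorted distinct values
ids_sorted <- match(x, sort(unique(x)))
ids_sorted   # [1] 3 6 1 5 2 4 1
```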

In Kate's problem, where the strings are in more than one column,
and you want the ID to be unique across the columns, you need to do
match(x, x) where 'x' contains the strings from all the columns
that you want to replace:

  m <- matrix(c(
"X0001", "BYX859",        0,        0,  2,  1, "BYX859",
"X0001", "BYX894",        0,        0,  1,  1, "BYX894",
"X0001", "BYX862", "BYX894", "BYX859",  2,  2, "BYX862",
"X0001", "BYX863", "BYX894", "BYX859",  2,  2, "BYX863",
"X0001", "BYX864", "BYX894", "BYX859",  2,  2, "BYX864",
"X0001", "BYX865", "BYX894", "BYX859",  2,  2, "BYX865"
  ), ncol=7, byrow=TRUE)

  x <- m[ , 2:4]
  id <- match(x, x, nomatch=0, incomparables=0)
  m[ , 2:4] <- id

No factor needed. No loop needed. ;-)

Cheers,
H.



If you take Bill Dunlap's modification of the match() approach, it
resolves both problems: matching against the pooled unique values is
faster than the factor() version and gives the same result:


On Fri, May 29, 2015 at 1:31 PM, William Dunlap wdun...@tibco.com wrote:

match() will do what you want.  E.g., run your data through
the following function.


f <- function (data)
{
    uniqStrings <- unique(c(data[, 2], data[, 3], data[, 4]))
    uniqStrings <- setdiff(uniqStrings, 0)
    for (j in 2:4) {
        data[[j]] <- match(data[[j]], uniqStrings, nomatch = 0L)
    }
    data
}

##

y <- data.frame(id = 1:5000, v1 = sample(letters, 5000, replace=TRUE),
v2 = sample(letters, 5000, replace=TRUE), v3 = sample(letters, 5000,
replace=TRUE), stringsAsFactors=FALSE)


system.time({
   for(i in seq_len(2)) {
 ids3 <- f(data.frame(y))
}})

#   user  system elapsed
# 22.515   0.000  22.518



ff <- function(data)
{
    uniqStrings <- unique(c(data[, 2], data[, 3], data[, 4]))
    uniqStrings <- setdiff(uniqStrings, 0)
    for (j in 2:4) {
        data[[j]] <- as.numeric(factor(data[[j]], levels=uniqStrings))
    }
    data
}

system.time({
   for(i in seq_len(2)) {
 ids4 <- ff(data.frame(y))
}})

#user  system elapsed
#  26.083   0.002  26.090

head(ids3)

   id v1 v2 v3
1  1  1  2  8
2  2  2 19 22
3  3  3 21 16
4  4  4 10 17
5  5  1  8 18
6  6  1 12 26

head(ids4)

   id v1 v2 v3
1  1  1  2  8
2  2  2 19

Re: [R] about transforming a data.frame

2015-05-29 Thread Sarah Goslee

Hi,

Please use dput() to provide your data, as it can get somewhat mangled
by copy and pasting, especially if you post in HTML (as you are asked
not to do in the posting guide).

What is a unique element? Is B4:B5:B6 an element, or are B4 and
B5 each elements? That is, what is the result you expect to obtain
for the sample data you provided?

What code have you tried? I would think table() might be involved, and
possibly strsplit(), but will refrain from putting more time into this
until you provide a reproducible dataset with dput() and some clearer
idea of your intent.
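If the individual codes such as B4 (rather than whole strings like B4:B5:B6) are meant to be the elements, a sketch along the lines of the strsplit()/table() idea might look like this. The df is Bogdan's simplified example with quotes restored (col_names only); treating the colon-separated parts as the elements is an assumption about the intent, and long and elem_tab are illustrative names:

```r
df <- data.frame(
  col_names = c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6", "F10:F11:F12",
                "H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3"),
  CT = c(5, 2, 2, 2, 2, 2, 4, 4),
  stringsAsFactors = FALSE
)

# Split each label on ":" and repeat its CT value once per part
parts <- strsplit(df$col_names, ":", fixed = TRUE)
long <- data.frame(element = unlist(parts),
                   CT = rep(df$CT, lengths(parts)))

# How often each individual element occurs with each CT value
elem_tab <- table(long$element, long$CT)
elem_tab
```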

Sarah

On Fri, May 29, 2015 at 4:19 PM, Bogdan Tanasa tan...@gmail.com wrote:
 Dear all,

 I would appreciate a suggestion on the following : I am working with a
 data.frame (below) :

    EXP CT   row_names   col_names
 1 test -5    B4:B5:B6    B1:B2:B3
 2 test -2    B7:B8:B9    B1:B2:B3
 3 test -2    D4:D5:D6    H4:H5:H6
 4 test -2 D10:D11:D12 F10:F11:F12
 5 test -2 D10:D11:D12    H1:H2:H3
 6 test -2 E10:E11:E12    G7:G8:G9
 7 test -4    A1:A2:A3    D1:D2:D3
 8 test -4 B10:B11:B12    B1:B2:B3

 what would be the easiest way to consider UNIQUE elements in the ROW_NAMES
 or the UNIQUE elements in the COL_NAMES and :

 print how many times these UNIQUE ELEMENTS associate with the numbers -5,
 -2, or -4 (these numbers are on the column names CT) ..

 thanks,

 bogdan

-- 
Sarah Goslee
http://www.functionaldiversity.org


[R] Automatically updating a plot from a regularly updated data file

2015-05-29 Thread Sam Albers

Hi all,

I have a question about using R in a way that may not be correct but I
thought I would ask anyway.

I have an instrument that outputs a text file with comma separated data. A
new line is added to the file each time the instrument takes a new reading.
Is there any way to configure R such that a script to generate a plot from
said text file is re-run each time the file is modified (i.e. a new line is
added). So basically update an exported plot each time a new line of data
is collected.

Is this type of thing possible in R? If not can anyone recommend some
Windows (or Linux if need be) tools that could help me accomplish this
preferably still utilizing R's plotting capabilities? I know that there are
other tools that can do this all but nothing makes figures as nicely as R.

I suppose more generally this is a question about ways to automate processes
with R to take advantage of R's functionality.

Thanks in advance.

Sam


Re: [R] Converting unique strings to unique numbers

2015-05-29 Thread William Dunlap

match() will do what you want.  E.g., run your data through
the following function.

f <- function (data)
{
    uniqStrings <- unique(c(data[, 2], data[, 3], data[, 4]))
    uniqStrings <- setdiff(uniqStrings, 0)
    for (j in 2:4) {
        data[[j]] <- match(data[[j]], uniqStrings, nomatch = 0L)
    }
    data
}
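As a quick check, here is f() (restated so the block runs on its own) applied to Kate's pedigree rows, read with read.table(text = ...); the name ped is illustrative. Columns 2-4 should come out as in her desired output, with 0 kept for the missing parents:

```r
f <- function(data) {
  # Pool the strings from columns 2-4 so IDs are shared across columns
  uniqStrings <- unique(c(data[, 2], data[, 3], data[, 4]))
  uniqStrings <- setdiff(uniqStrings, 0)   # "0" means "no parent"
  for (j in 2:4) {
    data[[j]] <- match(data[[j]], uniqStrings, nomatch = 0L)
  }
  data
}

ped <- read.table(text = "
X0001 BYX859 0      0      2 1 BYX859
X0001 BYX894 0      0      1 1 BYX894
X0001 BYX862 BYX894 BYX859 2 2 BYX862
X0001 BYX863 BYX894 BYX859 2 2 BYX863
X0001 BYX864 BYX894 BYX859 2 2 BYX864
X0001 BYX865 BYX894 BYX859 2 2 BYX865
", stringsAsFactors = FALSE)

f(ped)
```

stringsAsFactors = FALSE matters here: before R 4.0, read.table() turned these columns into factors by default, and before R 4.1, c() on factors returned their integer codes rather than the strings, which would make f() pool the wrong values.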



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, May 29, 2015 at 9:58 AM, Kate Ignatius kate.ignat...@gmail.com
wrote:

 I have a pedigree file as so:

 X0001 BYX859  0  0  2  1 BYX859
 X0001 BYX894  0  0  1  1 BYX894
 X0001 BYX862 BYX894 BYX859  2  2 BYX862
 X0001 BYX863 BYX894 BYX859  2  2 BYX863
 X0001 BYX864 BYX894 BYX859  2  2 BYX864
 X0001 BYX865 BYX894 BYX859  2  2 BYX865

 And I was hoping to change all unique string values to numbers.

 That is:

 BYX859 = 1
 BYX894 = 2
 BYX862 = 3
 BYX863 = 4
 BYX864 = 5
 BYX865 = 6

 But only in columns 2 - 4.  Essentially I would like the data to look like
 this:

 X0001 1 0 0  2  1 BYX859
 X0001 2 0 0  1  1 BYX894
 X0001 3 2 1  2  2 BYX862
 X0001 4 2 1  2  2 BYX863
 X0001 5 2 1  2  2 BYX864
 X0001 6 2 1  2  2 BYX865

 Is this possible with factors?

 Thanks!

 K.


Re: [R] Problems with nls

2015-05-29 Thread Bert Gunter

AFAICS this has essentially nothing to do with R. Please post elsewhere,
e.g. on a statistics list like stats.stackexchange.com.

Cheers,
Bert



On Fri, May 29, 2015 at 6:44 AM, Abolfazl Saghafi 
abolfazl.sagh...@gmail.com wrote:

 Can someone help me with a question on this Bass model, please?

 As I read some articles on this topic, I understand that
 1. the bass formula is
 N(t) = pm + (q-p) N(t-1) - (q/m) (N(t-1))^2
 2. which is a difference equation with the solution
 N(t) = m (1 − exp(−(p+q)t)) / (1 + (q/p)exp(−(p+q)t))
 3. So, a linear regression would give us initial estimates for the
 parameters m, p, q
 4. we can then feed those initial estimates into NLS to get better
 estimates
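A minimal sketch of steps 3 and 4 on simulated data, assuming the left-hand side in step 1 is meant to be the number of adoptions in period t, N(t) - N(t-1) (the usual discrete form of the Bass model), so that the regression coefficients are a = pm, b = q - p and c = -q/m. The parameter values, the helper bass_cum() and the noise level are all made up for illustration:

```r
set.seed(1)
m_true <- 1000; p_true <- 0.03; q_true <- 0.38   # hypothetical values
period <- 1:20

# Closed-form cumulative adoption N(t) from step 2
bass_cum <- function(t, m, p, q) {
  e <- exp(-(p + q) * t)
  m * (1 - e) / (1 + (q / p) * e)
}
N <- bass_cum(period, m_true, p_true, q_true) + rnorm(length(period), sd = 5)

# Step 3: regress per-period adoptions on lagged cumulative adoptions,
# taking N(0) = 0 so the first period is included
N_full <- c(0, N)
n_t    <- diff(N_full)         # adoptions in each period
N_lag  <- head(N_full, -1)     # cumulative adoptions up to the previous period
ols <- lm(n_t ~ N_lag + I(N_lag^2))
a  <- unname(coef(ols)[1])     # estimates p*m
b  <- unname(coef(ols)[2])     # estimates q - p
c2 <- unname(coef(ols)[3])     # estimates -q/m

# m is the positive root of c2*m^2 + b*m + a = 0; p and q then follow
m0 <- (-b - sqrt(b^2 - 4 * a * c2)) / (2 * c2)
p0 <- a / m0
q0 <- -c2 * m0

# Step 4: refine the starting values with nls on the cumulative curve
fit <- nls(N ~ bass_cum(period, m, p, q),
           start = list(m = m0, p = p0, q = q0))
coef(fit)
```

The regression estimates are only approximate (the continuous N(t) of step 2 does not exactly satisfy the discrete equation), which is why they serve as starting values for nls() rather than as final answers.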

 Am I right?

 Now the question is,
 why is it that I see people take cumulative data and try to fit it to a pdf
 such as
 M * ( ((P+Q)^2 / P) * exp(-(P+Q) * T79) ) / (1+(Q/P)*exp(-(P+Q)*T79))^2,

 instead of using the cumulative data and fitting N(t) directly?


Re: [R] Why I am not able to load library(R.matlab)? Other packages are fine.

2015-05-29 Thread C W

Wow, thanks Ben.  That worked very well.

I guess I didn't have R.methodsS3?  But that doesn't make sense, because I
was using R.matlab a few weeks ago.  I believe I was on R 3.1.

Maybe it's in R 3.1 folder?  I am using a Mac, btw.

Cheers,

-M

On Fri, May 29, 2015 at 1:55 PM, Ben Bolker bbol...@gmail.com wrote:

 C W tmrsg11 at gmail.com writes:

 
  Hi Henrik,
 
  I don't quite get what I should do here.  I am not familiar with
  R.methodS3.  Can you tell me what command exactly do I need to do?
 
  Thanks,
 
  Mike

 install.packages("R.methodsS3")
 install.packages("R.matlab")
 library(R.matlab)



   [snip snip snip]


Re: [R] about transforming a data.frame

2015-05-29 Thread John Kane

Bogdan, the request was for data in dput() format. 

Type ?dput for more information.

Run dput(myfile), copy the output and paste it into the email.

You should get something like: 
structure(list(c1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 
5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 
8L, 8L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L), .Label = c("(0.509,0.614]", 
"(0.614,0.718]", "(0.718,0.822]", "(0.822,0.926]", "(0.926,1.03]", 
"(1.03,1.13]", "(1.13,1.24]", "(1.24,1.34]", "(1.34,1.45]", "(1.45,1.55]"
), class = "factor"), s1 = c(0.51, 0.52, 0.58, 0.58, 0.59, 0.6, 
0.63, 0.65, 0.68, 0.74, 0.74, 0.75, 0.77, 0.77, 0.77, 0.78, 0.79, 
0.84, 0.84, 0.85, 0.87, 0.93, 0.93, 0.95, 0.99, 1.04, 1.09, 1.11, 
1.13, 1.14, 1.14, 1.14, 1.17, 1.18, 1.19, 1.22, 1.22, 1.23, 1.28, 
1.29, 1.3, 1.32, 1.37, 1.38, 1.38, 1.4, 1.43, 1.47, 1.52, 1.55
)), .Names = c("c1", "s1"), row.names = c(NA, -50L), class = "data.frame")

Data in dput() format is the preferred way to get data into R-help since it 
provides a perfect copy of what you have on your machine.  Any other way of 
providing data risks the recipients reading it into R differently than it is on 
your machine.
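To see why the output is a faithful copy, a small sketch: dput() prints R code that recreates the object, and evaluating that text gives back the same data frame, factor levels and all. The toy df below is made up:

```r
df <- data.frame(c1 = factor(c("low", "high", "low")),
                 s1 = c(0.51, 0.52, 0.58))

# What you would paste into the email
dput(df)

# What a reader does with the pasted text: parse and evaluate it
txt  <- paste(capture.output(dput(df)), collapse = "\n")
copy <- eval(parse(text = txt))

all.equal(copy, df)   # [1] TRUE: same values, factor levels and classes
```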

John Kane
Kingston ON Canada


 -Original Message-
 From: tan...@gmail.com
 Sent: Fri, 29 May 2015 13:58:20 -0700
 To: sarah.gos...@gmail.com
 Subject: Re: [R] about transforming a data.frame
 
 Hi Sarah,
 
 thank you for your help. I have simplified the example, by reading the
 elements in a data frame, eg :
 
 df <- data.frame(row_names = c("B4:B5:B6", "B7:B8:B9", "D4:D5:D6",
 "D10:D11:D12", "D10:D11:D12", "E10:E11:E12", "A1:A2:A3", "B10:B11:B12"),
 col_names = c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6", "F10:F11:F12",
 "H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3"),
 CT = c(5, 2, 2, 2, 2, 2, 4, 4))
 
 I have used count() from the plyr package:
 
 library(plyr)
 count_row_names <- count(df$row_names)
 count_col_names <- count(df$col_names)
 
 however, I would need to correlate these UNIQUE ELEMENTS in the columns
 row_names or col_names with the numbers they associate in the  CT
 columns, eg :
 
 B1:B2:B3 associate with 5, 2, 4 (in CT column), or D10:D11:D12
 associate with 2 (in the CT column).
 
 thank you very much,
 
 bogdan
 
 
 
 
 On Fri, May 29, 2015 at 1:32 PM, Sarah Goslee sarah.gos...@gmail.com
 wrote:
 
 Hi,
 
 Please use dput() to provide your data, as it can get somewhat mangled
 by copy and pasting, especially if you post in HTML (as you are asked
 not to do in the posting guide).
 
 What is a unique element? is B4:B5:B6 an element, or are B4 and
 B5 each elements? That is, what is the result you expect to obtain
 for the sample data you provided?
 
 What code have you tried? I would think table() might be involved, and
 possibly strsplit(), but will refrain from putting more time into this
 until you provide a reproducible dataset with dput() and some clearer
 idea of your intent.
 
 Sarah
 
 On Fri, May 29, 2015 at 4:19 PM, Bogdan Tanasa tan...@gmail.com wrote:
 Dear all,
 
 I would appreciate a suggestion on the following : I am working with a
 data.frame (below) :
 
    EXP CT   row_names   col_names
 1 test -5    B4:B5:B6    B1:B2:B3
 2 test -2    B7:B8:B9    B1:B2:B3
 3 test -2    D4:D5:D6    H4:H5:H6
 4 test -2 D10:D11:D12 F10:F11:F12
 5 test -2 D10:D11:D12    H1:H2:H3
 6 test -2 E10:E11:E12    G7:G8:G9
 7 test -4    A1:A2:A3    D1:D2:D3
 8 test -4 B10:B11:B12    B1:B2:B3
 
 what would be the easiest way to consider UNIQUE elements in the
 ROW_NAMES
 or the UNIQUE elements in the COL_NAMES and :
 
 print how many times these UNIQUE ELEMENTS associate with the numbers
 -5,
 -2, or -4 (these numbers are on the column names CT) ..
 
 thanks,
 
 bogdan
 
 --
 Sarah Goslee
 http://www.functionaldiversity.org
 
 

Re: [R] Why I am not able to load library(R.matlab)? Other packages are fine.

2015-05-29 Thread Ben Bolker


 I think Henrik's point (which I merely clarified) was that something
funky (we'll probably never know what, and it's not worth figuring out
unless it happens again/to other people) had gone wrong and that the
easiest thing to do was just to reinstall.

References:
* https://www.youtube.com/watch?v=t2F1rFmyQmY
* http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.208.9970&rep=rep1&type=pdf


On 15-05-29 05:11 PM, C W wrote:
 Wow, thanks Ben.  That worked very well.
 
 I guess I didn't have R.methodsS3?  But that doesn't make sense,
 because I was using R.matlab a few weeks ago.  I believe I was on R
 3.1.
 
 Maybe it's in R 3.1 folder?  I am using a Mac, btw.
 
 Cheers,
 
 -M
 
 On Fri, May 29, 2015 at 1:55 PM, Ben Bolker bbol...@gmail.com
 wrote:
 
 C W tmrsg11 at gmail.com writes:
 
 
 Hi Henrik,
 
 I don't quite get what I should do here.  I am not familiar
 with R.methodS3.  Can you tell me what command exactly do I
 need to do?
 
 Thanks,
 
 Mike
 
 install.packages("R.methodsS3")
 install.packages("R.matlab")
 library(R.matlab)
 
 
 
 [snip snip snip]
 

Re: [R] Automatically updating a plot from a regularly updated data file

2015-05-29 Thread MacQueen, Don

A lot will depend on how frequently data is added to the file, how big the
file gets, and how important it is to see updated plots quickly.

I have R doing exactly what you describe, and have found logic like this
(which might be described as crude) to be sufficient:

while( {some condition} ) {
  {read the data file}
  {make the plot}
  Sys.sleep( {some number of seconds} )
}

Of course this is not actually noticing that the file has changed and
responding, it is just updating at regular intervals. But that might be
good enough.


A slightly more sophisticated approach would be to set up a loop like the
above, and have the sleep time short, but within the loop use

  file.info({the csv file})

and when the modification time is later than the previous modification
time, read the data and update the plot.
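A minimal sketch of that file.info() polling loop, assuming a CSV with x values in the first column and y values in the second. poll_file(), replot(), the file names and the max_iter argument are all hypothetical, the last added so the loop can also be run a fixed number of times:

```r
# Call on_change(csv_file) whenever the file's modification time advances.
# interval is the sleep between checks, in seconds.
poll_file <- function(csv_file, on_change, interval = 5, max_iter = Inf) {
  last_mtime <- as.POSIXct(NA)
  iter <- 0
  while (iter < max_iter) {
    iter <- iter + 1
    mtime <- file.info(csv_file)$mtime        # NA if the file is missing
    if (!is.na(mtime) && (is.na(last_mtime) || mtime > last_mtime)) {
      on_change(csv_file)
      last_mtime <- mtime
    }
    if (iter < max_iter) Sys.sleep(interval)
  }
  invisible(last_mtime)
}

# Example callback: redraw a PNG from the whole file
replot <- function(csv_file) {
  dat <- read.csv(csv_file)
  png("latest.png")
  plot(dat[[1]], dat[[2]], type = "l")
  dev.off()
}

# poll_file("instrument.csv", replot, interval = 10)   # runs until interrupted
```

Modification times may only have one-second resolution on some file systems, so two appends within the same second can show up as a single change; for a plot refreshed once per reading that is usually harmless.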


If the file gets really big, you might not want to reload the entire file
each time. That might lead you into things like keeping track of how many
lines the file has, and only reading the new lines -- if you need your
plots to be cumulative. In that situation you might end up using the
pipe() function to create your connection to the file, and pass the OS's
'tail' command (Linux or Mac, not sure about Win) to pipe.

If you only need to plot the last, say, X hours of data, then you may not
need to keep track of the number of lines, just read the last N lines
(hopefully not too hard to figure out what N should be).
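For the "last N lines" case, a sketch using pipe() and the OS tail command as described above. It assumes a Unix-like system (Linux or macOS) with tail on the PATH and a single header line; read_last_n() is a hypothetical name:

```r
# Read the header plus only the last n lines of a CSV file.
read_last_n <- function(csv_file, n) {
  header <- readLines(csv_file, n = 1)        # reads just the first line
  con <- pipe(paste("tail -n", as.integer(n), shQuote(csv_file)))
  last_lines <- readLines(con)
  close(con)
  # If the file has fewer than n data lines, tail also returns the header
  last_lines <- last_lines[last_lines != header]
  read.csv(text = c(header, last_lines))
}
```

On Windows, where tail is not available by default, utils::tail(readLines(csv_file), n) gives the same lines but reads the whole file into memory first.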

If you don't want an R process running indefinitely, as is the case for
the above, you can, on Linux and Mac, set up a cron job to run an R script
as often as once per minute. I have at least one such task where it
happens every 2 minutes, and makes plots of the current data. In this
case, we have 16 measurement devices each sending data to a MySQL database
once per minute; the R script pulls the data from the database every 2
minutes and plots, and the system works well for our needs. Windows will
have some equivalent to cron, I just don't know what it is.

FWIW, all of the above write png files which are viewed via a webserver.

-Don

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 5/29/15, 12:51 PM, Sam Albers tonightstheni...@gmail.com wrote:

Hi all,

I have a question about using R in a way that may not be correct but I
thought I would ask anyway.

I have an instrument that outputs a text file with comma separated data. A
new line is added to the file each time the instrument takes a new
reading.
Is there any way to configure R such that a script to generate a plot from
said text file is re-run each time the file is modified (i.e. a new line is
added). So basically update an exported plot each time a new line of data
is collected.

Is this type of thing possible in R? If not can anyone recommend some
Windows (or Linux if need be) tools that could help me accomplish this
preferably still utilizing R's plotting capabilities? I know that there are
other tools that can do this all but nothing makes figures as nicely as R.

I suppose more generally this is a question about ways to automate processes
with R to take advantage of R's functionality.

Thanks in advance.

Sam


Re: [R] about transforming a data.frame

2015-05-29 Thread Bogdan Tanasa

Hi John,

thanks for clarifications, yes, of course, the dput() output is the
following :

dput(dataframe_matches_ddCT)

structure(list(FIGURE = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label
= "test", class = "factor"), ddCT = c(-5.4595, -2.7467,
-2.7467, -2.7467, -2.7467, -2.7467, -4.5927, -4.5927), row_names =
structure(c(1L, 2L, 3L, 4L, 4L, 5L, 6L, 7L), .Label = c("B4:B5:B6",
"B7:B8:B9", "D4:D5:D6", "D10:D11:D12", "E10:E11:E12", "A1:A2:A3",
"B10:B11:B12"
), class = "factor"), col_names = structure(c(1L, 1L, 2L, 3L, 4L, 5L, 6L,
1L), .Label = c("B1:B2:B3", "H4:H5:H6", "F10:F11:F12",
"H1:H2:H3", "G7:G8:G9", "D1:D2:D3"), class = "factor"),
CTaverage_MATRIX_SUBSTRACTIONS = c(-5.4595413208,
-2.7467829387, -2.74099286393334, -2.7433134714, -2.7480595907,
-2.755259196, -4.59402211506667, -4.5927206675)), .Names = c("FIGURE",
"ddCT", "row_names", "col_names", "CTaverage_MATRIX_SUBSTRACTIONS"
), row.names = c(NA, 8L), class = "data.frame")

thanks again for your input,

-- bogdan

On Fri, May 29, 2015 at 2:11 PM, John Kane jrkrid...@inbox.com wrote:

 Bogdan, the request was for data in dput() format.

 Type ?dput for more information.

 Run dput(myfile), copy the output and paste it into the email.

 You should get something like:
 structure(list(c1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L,
 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L,
 8L, 8L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L), .Label = c("(0.509,0.614]",
 "(0.614,0.718]", "(0.718,0.822]", "(0.822,0.926]", "(0.926,1.03]",
 "(1.03,1.13]", "(1.13,1.24]", "(1.24,1.34]", "(1.34,1.45]", "(1.45,1.55]"
 ), class = "factor"), s1 = c(0.51, 0.52, 0.58, 0.58, 0.59, 0.6,
 0.63, 0.65, 0.68, 0.74, 0.74, 0.75, 0.77, 0.77, 0.77, 0.78, 0.79,
 0.84, 0.84, 0.85, 0.87, 0.93, 0.93, 0.95, 0.99, 1.04, 1.09, 1.11,
 1.13, 1.14, 1.14, 1.14, 1.17, 1.18, 1.19, 1.22, 1.22, 1.23, 1.28,
 1.29, 1.3, 1.32, 1.37, 1.38, 1.38, 1.4, 1.43, 1.47, 1.52, 1.55
 )), .Names = c("c1", "s1"), row.names = c(NA, -50L), class = "data.frame")

 Data in dput() format is the preferred way to get data into R-help since it
 provides a perfect copy of what you have on your machine.  Any other way of
 providing data risks the recipients reading it into R differently than it
 is on your machine.

 John Kane
 Kingston ON Canada


  -Original Message-
  From: tan...@gmail.com
  Sent: Fri, 29 May 2015 13:58:20 -0700
  To: sarah.gos...@gmail.com
  Subject: Re: [R] about transforming a data.frame
 
  Hi Sarah,
 
  thank you for your help. I have simplified the example, by reading the
  elements in a data frame, eg :
 
  df <- data.frame(row_names = c("B4:B5:B6", "B7:B8:B9", "D4:D5:D6",
  "D10:D11:D12", "D10:D11:D12", "E10:E11:E12", "A1:A2:A3", "B10:B11:B12"),
  col_names = c("B1:B2:B3", "B1:B2:B3", "H4:H5:H6", "F10:F11:F12",
  "H1:H2:H3", "G7:G8:G9", "D1:D2:D3", "B1:B2:B3"),
  CT = c(5, 2, 2, 2, 2, 2, 4, 4))
 
  I have used count() from the plyr package:
 
  library(plyr)
  count_row_names <- count(df$row_names)
  count_col_names <- count(df$col_names)
 
  however, I would need to correlate these UNIQUE ELEMENTS in the columns
  row_names or col_names with the numbers they associate in the  CT
  columns, eg :
 
  B1:B2:B3 associate with 5, 2, 4 (in CT column), or D10:D11:D12
  associate with 2 (in the CT column).
 
  thank you very much,
 
  bogdan
 
 
 
 
  On Fri, May 29, 2015 at 1:32 PM, Sarah Goslee sarah.gos...@gmail.com
  wrote:
 
  Hi,
 
  Please use dput() to provide your data, as it can get somewhat mangled
  by copy and pasting, especially if you post in HTML (as you are asked
  not to do in the posting guide).
 
  What is a unique element? is B4:B5:B6 an element, or are B4 and
  B5 each elements? That is, what is the result you expect to obtain
  for the sample data you provided?
 
  What code have you tried? I would think table() might be involved, and
  possibly strsplit(), but will refrain from putting more time into this
  until you provide a reproducible dataset with dput() and some clearer
  idea of your intent.
 
  Sarah
 
  On Fri, May 29, 2015 at 4:19 PM, Bogdan Tanasa tan...@gmail.com
 wrote:
  Dear all,
 
  I would appreciate a suggestion on the following : I am working with a
  data.frame (below) :
 
     EXP CT   row_names   col_names
  1 test -5    B4:B5:B6    B1:B2:B3
  2 test -2    B7:B8:B9    B1:B2:B3
  3 test -2    D4:D5:D6    H4:H5:H6
  4 test -2 D10:D11:D12 F10:F11:F12
  5 test -2 D10:D11:D12    H1:H2:H3
  6 test -2 E10:E11:E12    G7:G8:G9
  7 test -4    A1:A2:A3    D1:D2:D3
  8 test -4 B10:B11:B12    B1:B2:B3
 
  what would be the easiest way to consider UNIQUE elements in the
  ROW_NAMES
  or the UNIQUE elements in the COL_NAMES and :
 
  print how many times these UNIQUE ELEMENTS associate with the numbers
  -5,
  -2, or -4 (these numbers are on the column names CT) ..
 
  thanks,
 
  bogdan
 
  --
  Sarah Goslee
  http://www.functionaldiversity.org
 
 