Re: [R] R help

2016-01-30 Thread Boris Steipe
I think the error message is pretty clear. Your calculations are attempting to 
allocate more memory than you have available. As to what is causing your code 
to do this, only someone familiar with your code could possibly tell.
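
To make the size of the request concrete, here is a rough sketch rather than a diagnosis of the poster's code. It assumes the failing call, rep.int(c(1, numeric(n)), n - 1L), is filling an n-by-n indicator-style matrix of doubles, which is what that pattern produces for small n. The data frame `dat` and the idea of counting factor levels are illustrative assumptions, not something taken from the original post.

```r
# Size reported in the error message, converted to bytes (R's "Gb" is 1024^3).
bytes_requested <- 58.8 * 1024^3

# Assumption: the vector holds doubles (8 bytes each) and has about n^2 entries.
n_implied <- sqrt(bytes_requested / 8)
round(n_implied)   # roughly 88,800

# The same pattern for a small n yields an identity matrix.
n <- 4L
m <- array(c(rep.int(c(1, numeric(n)), n - 1L), 1), dim = c(n, n))
print(m)

# Hypothetical check on one's own data: count the levels of each factor column.
# A factor with tens of thousands of levels (an ID column, say) could lead to
# a request of this size.
dat <- data.frame(id = factor(1:10), grp = factor(rep(c("a", "b"), 5)))
n_levels <- sapply(Filter(is.factor, dat), nlevels)
print(n_levels)
```

If a predictor turns out to have that many levels, it is worth asking whether it was meant to be numeric or should be left out of the model.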

B.
(Read the posting guide, please - and don't post in HTML :-)


On Jan 30, 2016, at 1:44 AM, Anukriti Gupta  wrote:

> Hi
> 
> I am running an ordinal logistic regression, but it is giving me an error
> like this:
> 
> Error: cannot allocate vector of size 58.8 Gb
> In addition: Warning messages:
> 1: In rep.int(c(1, numeric(n)), n - 1L) :
>   Reached total allocation of 8057Mb: see help(memory.size)
> 2: In rep.int(c(1, numeric(n)), n - 1L) :
>   Reached total allocation of 8057Mb: see help(memory.size)
> 3: In rep.int(c(1, numeric(n)), n - 1L) :
>   Reached total allocation of 8057Mb: see help(memory.size)
> 4: In rep.int(c(1, numeric(n)), n - 1L) :
>   Reached total allocation of 8057Mb: see help(memory.size)
> 
> 
> I am using a 64-bit laptop. I am not sure what is causing this kind of issue.
> 
> Regards
> 
> Anukriti Gupta
> Analyst (Financial Crime Compliance), HSBC
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] about change columns for specific rows

2016-01-30 Thread ruipbarradas
Sorry, there was a mistake: a comma was missing. It should be

df[df$date >= '2012-01-01' & df$date <= '2013-12-31', ]$A

Rui Barradas
 

Quoting ruipbarra...@sapo.pt:

> Hello,
>
> Try
>
> df[df$date >= '2012-01-01'& df$date <= '2013-12-31']$A = etc
>
> Hope this helps,
>
> Rui Barradas
>  
>
> Quoting lily li:
>> Hi R users,
>>
>> I have a data frame, and I generate a date column like this:
>> df$date = seq(as.Date('2012-01-01'), as.Date('2014-12-31'))
>>
>> df
>> A  B  C
>> 1  2   1
>> 2  2   3
>> 3  2   4
>>
>> So the data frame has 4 columns now. But when I want to change the values
>> of column A for specific dates, such as 2012-01-01 to 2013-12-31, I use the
>> code below:
>> df[date >= '2012-01-01' <= '2013-12-31']$A =
>> df[date >= '2012-01-01' <= '2013-12-31']$A +2
>>
>> But it does not work, the date I generate seems not effective. What is the
>> problem? Thanks for your help.
>>

 


Re: [R] about change columns for specific rows

2016-01-30 Thread ruipbarradas
Hello,

Try

df[df$date >= '2012-01-01'& df$date <= '2013-12-31']$A = etc

Hope this helps,

Rui Barradas
 

Quoting lily li:

> Hi R users,
>
> I have a data frame, and I generate a date column like this:
> df$date = seq(as.Date('2012-01-01'), as.Date('2014-12-31'))
>
> df
> A  B  C
> 1  2   1
> 2  2   3
> 3  2   4
>
> So the data frame has 4 columns now. But when I want to change the values
> of column A for specific dates, such as 2012-01-01 to 2013-12-31, I use the
> code below:
> df[date >= '2012-01-01' <= '2013-12-31']$A =
> df[date >= '2012-01-01' <= '2013-12-31']$A +2
>
> But it does not work, the date I generate seems not effective. What is the
> problem? Thanks for your help.
>

 


Re: [R] about change columns for specific rows

2016-01-30 Thread David Winsemius

> On Jan 29, 2016, at 10:30 PM, lily li  wrote:
> 
> Hi R users,
> 
> I have a data frame, and I generate a date column like this:
> df$date = seq(as.Date('2012-01-01'), as.Date('2014-12-31'))
> 
> df
> A  B  C
> 1  2   1
> 2  2   3
> 3  2   4
> 
> So the data frame has 4 columns now. But when I want to change the values
> of column A for specific dates, such as 2012-01-01 to 2013-12-31, I use the
> code below:
> df[date >= '2012-01-01' <= '2013-12-31']$A =
> df[date >= '2012-01-01' <= '2013-12-31']$A +2
> 
> But it does not work, the date I generate seems not effective. What is the
> problem? Thanks for your help.
> 

You are using "[" incorrectly. You should be passing that logical vector to the
"row-position" of the i,j-form of the `[<-` function. The condition also needs
to refer to the column as df$date and to join the two comparisons with `&`.
Try instead:

df[df$date >= '2012-01-01' & df$date <= '2013-12-31', ]$A =
   df[df$date >= '2012-01-01' & df$date <= '2013-12-31', ]$A + 2

(As it is, I believe you are selecting columns rather than rows.)
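
A self-contained sketch of the row-selection idea follows. The column values are made up for illustration, and `by = "day"` is spelled out because some R versions require it in seq() for dates. Building the logical vector once and indexing the column directly is one common way to write this; it is not the only one.

```r
# Toy data frame: one row per day, with constant illustrative values in A, B, C.
df <- data.frame(
  date = seq(as.Date("2012-01-01"), as.Date("2014-12-31"), by = "day")
)
df$A <- 1
df$B <- 2
df$C <- 3

# Logical vector with one entry per ROW: TRUE where the date is in the range.
in_range <- df$date >= as.Date("2012-01-01") & df$date <= as.Date("2013-12-31")

# Add 2 to column A for the selected rows only.
df$A[in_range] <- df$A[in_range] + 2

# Rows inside and outside the range now differ in A.
table(in_range, df$A)
```

Writing df[in_range, "A"] <- df[in_range, "A"] + 2 is equivalent; in both forms the logical vector sits in the row position.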

>   [[alternative HTML version deleted]]

And study the documentation of your email client so you can learn how to send 
plain text emails to R-help.
> 

David Winsemius
Alameda, CA, USA



[R] R help

2016-01-30 Thread Anukriti Gupta
Hi

I am running an ordinal logistic regression, but it is giving me an error
like this:

Error: cannot allocate vector of size 58.8 Gb
In addition: Warning messages:
1: In rep.int(c(1, numeric(n)), n - 1L) :
  Reached total allocation of 8057Mb: see help(memory.size)
2: In rep.int(c(1, numeric(n)), n - 1L) :
  Reached total allocation of 8057Mb: see help(memory.size)
3: In rep.int(c(1, numeric(n)), n - 1L) :
  Reached total allocation of 8057Mb: see help(memory.size)
4: In rep.int(c(1, numeric(n)), n - 1L) :
  Reached total allocation of 8057Mb: see help(memory.size)


I am using a 64-bit laptop. I am not sure what is causing this kind of issue.

Regards

Anukriti Gupta
Analyst (Financial Crime Compliance), HSBC



[R] about change columns for specific rows

2016-01-30 Thread lily li
Hi R users,

I have a data frame, and I generate a date column like this:
df$date = seq(as.Date('2012-01-01'), as.Date('2014-12-31'))

df
A  B  C
1  2   1
2  2   3
3  2   4

So the data frame has 4 columns now. But when I want to change the values
of column A for specific dates, such as 2012-01-01 to 2013-12-31, I use the
code below:
df[date >= '2012-01-01' <= '2013-12-31']$A =
df[date >= '2012-01-01' <= '2013-12-31']$A +2

But it does not work, the date I generate seems not effective. What is the
problem? Thanks for your help.



[R] Use of betatree {betareg} to estimate true positives from multiple tests

2016-01-30 Thread r_1470
Hi,

Does anyone use the 'betatree' function in the betareg package to do a kind 
of false discovery rate (FDR) test for a set of many p values? I was thinking 
to compare the beta parameters of the true distribution of about 1000 p values 
with p values of permuted data, and test whether the two distributions were 
significantly different, with a greater mass of low p values in the true data. 
I know this is possible with betatree, I'm just not sure if it's a 'normal' 
method.

I know uniform-beta mixture models are commonly used to examine FDR, and there 
are R packages available, but my tests are non-independent so the null 
distribution may not be uniform and has to be found empirically. The mixture 
model approach assumes there are two groups: true positives (beta 
distribution) and true negatives (uniform distribution). I think my approach 
would assume a continuous distribution of effect sizes in the true data, 
rather than 2 distinct groups, with a point mass at 0 effect size for the 
permuted data.

I also looked at using the 'betamix' function, but this is much less accurate 
and takes longer to run.

Best,
Richard.

Re: [R] Problem displaying greek symbols

2016-01-30 Thread Bert Gunter
(I'll give it a try, but more expertise than I have may be needed)

Works fine for me (on OS X).

Take a look at ?pdf . I believe the font family in use (Helvetica is
the default) needs to have the (Adobe) symbol font as font 5. What
family are you using?

To see what families are available, use:

 names(grDevices::pdfFonts())

Another possibility is that you are using the wrong encoding.
Unfortunately, this is beyond my ability to help you with, but perhaps
reading the Help on the encoding argument and related links might get
you the necessary info.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Sat, Jan 30, 2016 at 1:43 PM, Jorge Fernández García
 wrote:
> Hi,
>
>
> I have a problem displaying Greek symbols (and, in general, any special
> character).
>
>
> I know I am using the right command, as the same script works on Fedora 20 but 
> not on Mac OS X Yosemite.
>
>
> ylab=expression(delta) displays a square instead of the right symbol when I 
> view the resulting pdf file with preview or any other tool to display pdf.
>
>
> Any idea of what's going on?
>
>
> Thanks in advance
>
>
>


Re: [R-es] Metodo lars y na

2016-01-30 Thread Carlos Ortega
Hello,

No, "lars" and the whole "lasso" / "glmnet" family cannot handle NAs.
You can:

   - remove the cases with NAs (you can use the "complete.cases()"
   function for that)
   - or impute (fill in) them with one of the many packages
   that can do so.
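
A minimal sketch of the first option, on simulated data (the matrix `x`, the response `y` and the positions of the NAs are made up for illustration). The lars() call is only attempted if the lars package happens to be installed.

```r
set.seed(1)

# Simulated predictors and response with a couple of missing values.
x <- matrix(rnorm(50), nrow = 10)
y <- rnorm(10)
x[2, 3] <- NA
y[7] <- NA

# TRUE for rows that have no NA in either x or y.
keep <- complete.cases(x, y)

# Keep only the complete rows before fitting.
x_cc <- x[keep, , drop = FALSE]
y_cc <- y[keep]

# Fit the lasso path only if the package is available.
if (requireNamespace("lars", quietly = TRUE)) {
  fit <- lars::lars(x_cc, y_cc, type = "lasso")
  print(fit)
}
```

Dropping rows is the simplest route; with many NAs spread over many columns it can discard most of the data, which is when imputation becomes the more attractive option.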

Regards,
Carlos Ortega
www.qualityexcellence.es



On 31 January 2016, 0:02, Gianluca Roncalli 
wrote:

> Hello everyone,
> I am trying to use the lasso method with the lars function on a dataset
> with several NAs. When I run the lars function I get: Error in if
> (any(nosignal)) { : missing value where TRUE/FALSE needed.
> Do you know how this function handles NAs?
> I hope you can help me.
> Thanks,
> Gianluca
>
>
> ___
> R-help-es mailing list
> R-help-es@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-help-es
>






Re: [R] Problem displaying greek symbols

2016-01-30 Thread David Winsemius

> On Jan 30, 2016, at 3:24 PM, Bert Gunter  wrote:
> 
> (I'll give it a try, but more expertise than I have may be needed)
> 
> Works fine for me (on OS X).
> 
> Take a look at ?pdf . I believe the font family in use (Helvetica is
> the default) needs to have the (Adobe) symbol font as font 5. What
> family are you using?
> 
> To see what families are available, use:
> 
> names(grDevices::pdfFonts())

That's not very informative, since the actual fonts that are going to be used 
are inside the 'serif', "sans", and  "mono" families. Try this instead:

> pdfFonts()$serif$metrics
[1] "Times-Roman.afm"  "Times-Bold.afm"   "Times-Italic.afm"
[4] "Times-BoldItalic.afm" "Symbol.afm"  

> pdfFonts()$mono$metrics
[1] "Courier.afm" "Courier-Bold.afm"   
[3] "Courier-Oblique.afm" "Courier-BoldOblique.afm"
[5] "Symbol.afm" 

Notice that the fifth item in both is Symbol.

That may not be very useful either: for reasons that I have never been able to 
fathom, the fonts sometimes get messed up on a Mac. The way to detect and 
correct the problem is to use Font Book.app, which I think you will find in 
either /Applications or /Applications/Utilities. The symptom: you find a font 
in Font Book that has duplicate entries. Delete the corrupted one and you may 
find your symbols reappear.

(This is documented in ?quartz.)


> Another possibility is that you are using the wrong encoding.
> Unfortunately, this is beyond my ability to help you with, but perhaps
> reading the Help on the encoding argument and related links might get
> you the necessary info.
> 
> Cheers,
> Bert
> 
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> 
> On Sat, Jan 30, 2016 at 1:43 PM, Jorge Fernández García
>  wrote:
>> Hi,
>> 
>> 
>> I have a problem displaying Greek symbols (and, in general, any special
>> character).
>> 
>> 
>> I know I am using the right command, as the same script works on Fedora 20 but 
>> not on Mac OS X Yosemite.
>> 
>> 
>> ylab=expression(delta) displays a square instead of the right symbol when I 
>> view the resulting pdf file with preview or any other tool to display pdf.

A full test would be:

pdf(); plot(1,1, main=expression(delta)); dev.off()
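
The checks discussed above can be put together in one small script. This is only a sketch: it inspects the default Helvetica family and writes the test plot to a temporary file, whose path is arbitrary. Whether the symbol actually renders still has to be judged by opening the file in a viewer.

```r
# Which font metric files does the default PDF family use?
# The fifth entry is expected to be the Symbol font.
fam <- grDevices::pdfFonts()$Helvetica
print(fam$metrics)

# Write a one-page test PDF to a temporary file.
out <- tempfile(fileext = ".pdf")
grDevices::pdf(out)
plot(1, 1, main = expression(delta), ylab = expression(delta))
invisible(grDevices::dev.off())

# Print the location so the file can be opened in Preview or another viewer.
print(out)
```

If the delta shows up in one viewer but not another, that points at the fonts installed on the viewing machine rather than at the R code.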

-- 
David.

>> 
>> 
>> Any idea of what's going on?
>> 
>> 
>> Thanks in advance
>> 
>> 
>> 
> 

David Winsemius
Alameda, CA, USA


[R-es] Metodo lars y na

2016-01-30 Thread Gianluca Roncalli
Hello everyone,
I am trying to use the lasso method with the lars function on a dataset
with several NAs. When I run the lars function I get: Error in if
(any(nosignal)) { : missing value where TRUE/FALSE needed.
Do you know how this function handles NAs?
I hope you can help me.
Thanks,
Gianluca



Re: [R] Efficient way to create new column based on comparison with another dataframe

2016-01-30 Thread Gaius Augustus
I'll look into the Intervals idea.  The data.table code posted might not
work (because I don't believe it would put the rows in the correct order if
the chromosomes are interspersed), however, it did make me think about
possibly assigning based on values...

Something like:
mapfile <- data.table(Name = c("S1", "S2", "S3"), Chr = 1, Position =
c(3000, 6000, 1000), key = "Chr")
Chr.Arms <- data.table(Chr = 1, Arm = c("p", "q"), Start = c(0, 5001), End
= c(5000, 1), key = "Chr")

for(i in 1:nrow(Chr.Arms)){
  cur.row <- Chr.Arms[i, ]
  mapfile[ Chr == cur.row$Chr & Position >= cur.row$Start & Position <=
cur.row$End] <- Chr.Arms$Arm
}

This might take out the need for the intermediate table/vector.  Not sure
yet if it'll work, but we'll see.  I'm interested to know if anyone else
has any ideas, too.

Thanks,
Gaius

On Fri, Jan 29, 2016 at 11:34 PM, Ulrik Stervbo 
wrote:

> Hi Gaius,
>
> Could you use data.table and loop over the small Chr.arms?
>
> library(data.table)
> mapfile <- data.table(Name = c("S1", "S2", "S3"), Chr = 1, Position =
> c(3000, 6000, 1000), key = "Chr")
> Chr.Arms <- data.table(Chr = 1, Arm = c("p", "q"), Start = c(0, 5001), End
> = c(5000, 1), key = "Chr")
>
> Arms <- data.table()
> for(i in 1:nrow(Chr.Arms)){
>   cur.row <- Chr.Arms[i, ]
>   Arm <- mapfile[ Position >= cur.row$Start & Position <= cur.row$End]
>   Arm <- Arm[ , Arm:=cur.row$Arm][]
>   Arms <- rbind(Arms, Arm)
> }
>
> # Or use plyr to loop over each possible arm
> library(plyr)
> Arms <- ddply(Chr.Arms, .variables = "Arm", function(cur.row, mapfile){
>   mapfile <- mapfile[ Position >= cur.row$Start & Position <= cur.row$End]
>   mapfile <- mapfile[ , Arm:=cur.row$Arm][]
>   return(mapfile)
> }, mapfile = mapfile)
>
> I have just started to use the data.table and I have the feeling the code
> above can be greatly improved - maybe the loop can be dropped entirely?
>
> Hope this helps
> Ulrik
>
> On Sat, 30 Jan 2016 at 03:29 Gaius Augustus 
> wrote:
>
>> I have two dataframes. One has chromosome arm information, and the other
>> has SNP position information. I am trying to assign each SNP an arm
>> identity.  I'd like to create this new column based on comparing it to the
>> reference file.
>>
>> *1) Mapfile (has millions of rows)*
>>
>> Name  Chr  Position
>> S1    1    3000
>> S2    1    6000
>> S3    1    1000
>>
>> *2) Chr.Arms file (has 39 rows)*
>>
>> Chr  Arm  Start  End
>> 1    p    0      5000
>> 1    q    5001   1
>>
>>
>> *R Script that works, but slow:*
>> Arms  <- c()
>> for (line in 1:nrow(Mapfile)){
>>   Arms[line] <- Chr.Arms$Arm[ Mapfile$Chr[line] == Chr.Arms$Chr &
>>  Mapfile$Position[line] > Chr.Arms$Start &  Mapfile$Position[line] <
>> Chr.Arms$End]
>> }
>> Mapfile$Arm <- Arms
>>
>>
>> *Output Table:*
>>
>> Name   Chr   Position   Arm
>> S1  1 3000  p
>> S2  1 6000  q
>> S3  1 1000  p
>>
>>
>> In words: I want each line to look up the location ( 1) find the right
>> Chr,
>> 2) find the line where the START < POSITION < END), then get the ARM
>> information and place it in a new column.
>>
>> This R script works, but surely there is a more time/processing efficient
>> way to do it.
>>
>> Thanks in advance for any help,
>> Gaius
>>
>>
>



Re: [R] Efficient way to create new column based on comparison with another dataframe

2016-01-30 Thread Gaius Augustus
I'll look into the Intervals idea.  The data.table code posted might not
work (because I don't believe it would put the rows in the correct order if
the chromosomes are interspersed), however, it did make me think about
possibly assigning based on values...

*SOLUTION*
mapfile <- data.frame(Name = c("S1", "S2", "S3"), Chr = 1,
                      Position = c(3000, 6000, 1000),
                      stringsAsFactors = FALSE)
Chr.Arms <- data.frame(Chr = 1, Arm = c("p", "q"), Start = c(0, 5001),
                       End = c(5000, 1), stringsAsFactors = FALSE)

# Loop over the few arm rows, not over the millions of SNP rows
for(i in 1:nrow(Chr.Arms)){
  cur.row <- Chr.Arms[i, ]
  # SNPs on this chromosome whose position falls inside the arm's range
  mapfile$Arm[ mapfile$Chr == cur.row$Chr & mapfile$Position >=
    cur.row$Start & mapfile$Position <= cur.row$End] <- cur.row$Arm
}

This took out the need for the intermediate table/vector.  This worked for
me, and was VERY fast.  Took <5 minutes on a dataframe with 35 million rows.
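
For readers who want to try the idea on interleaved chromosomes, here is a small sketch of the same loop. The second chromosome and all Start/End values are hypothetical, chosen only so that every SNP falls inside exactly one arm. The Arm column is initialised first so that an arm with no matching SNPs cannot cause trouble.

```r
# Toy SNP table with chromosomes deliberately interleaved.
mapfile <- data.frame(Name = c("S1", "S2", "S3", "S4"),
                      Chr = c(1, 1, 2, 1),
                      Position = c(3000, 6000, 2500, 1000),
                      stringsAsFactors = FALSE)

# Hypothetical arm boundaries (illustrative values only).
Chr.Arms <- data.frame(Chr = c(1, 1, 2, 2),
                       Arm = c("p", "q", "p", "q"),
                       Start = c(0, 5001, 0, 4001),
                       End = c(5000, 10000, 4000, 9000),
                       stringsAsFactors = FALSE)

# Start with an all-NA column, then fill it one arm at a time.
mapfile$Arm <- NA_character_
for (i in seq_len(nrow(Chr.Arms))) {
  hit <- mapfile$Chr == Chr.Arms$Chr[i] &
    mapfile$Position >= Chr.Arms$Start[i] &
    mapfile$Position <= Chr.Arms$End[i]
  mapfile$Arm[hit] <- Chr.Arms$Arm[i]
}

print(mapfile)
```

Each pass is a vectorised comparison over the whole SNP table, so the loop runs once per arm rather than once per SNP, and the rows are never reordered.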

Thanks for the help,
Gaius

