Re: [R] Plotting Dates Time Series Data

2016-01-31 Thread Jeff Newmiller
Try the xts or zoo packages (see ?xts or ?zoo) instead of ts.
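
A minimal sketch of that suggestion, assuming the 30 values are daily
observations starting on 2015-01-01 (the post does not state the spacing) and
that the zoo package is installed. The names `dates` and `myzoo` are
illustrative only.

```r
mydata <- c(575, 125, 950, 5020, 2515, 565, 135, 945, 5100, 2510,
            580, 140, 955, 5045, 2505, 570, 135, 1000, 5005, 2520,
            580, 130, 925, 5000, 2525, 585, 120, 960, 5025, 2520)

# Assumption: one observation per day, starting 2015-01-01
dates <- seq(as.Date("2015-01-01"), by = "day", length.out = length(mydata))

# zoo keeps the Date index, so the x-axis is labelled with dates
if (requireNamespace("zoo", quietly = TRUE)) {
  myzoo <- zoo::zoo(mydata, order.by = dates)
  plot(myzoo, xlab = "Date", ylab = "Value")
}
```

Unlike ts, a zoo object stores the index itself rather than a start value and a
frequency, which is why the dates survive.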
-- 
Sent from my phone. Please excuse my brevity.

On January 31, 2016 3:59:30 PM PST, Jeff Reichman wrote:
>R-Help Users
>
> 
>
>How do I plot "Dates" on the x-axis of a TS plot?
>
> 
>
>> mydata<-c(575,125,950,5020,2515,565,135,945,5100,2510,580,140,955,5045,2505,
>+ 570,135,1000,5005,2520,580,130,925,5000,2525,585,120,960,5025,2520)
>> myts<-ts(mydata,start=as.Date("2015-01-01"))
>> myts
>Time Series:
>Start = 16436
>End = 16465
>Frequency = 1
> [1]  575  125  950 5020 2515  565  135  945 5100 2510  580  140  955 5045 2505
>[16]  570  135 1000 5005 2520  580  130  925 5000 2525  585  120  960 5025 2520
>> plot.ts(myts)
>
> 
>
>I'm assuming I have to reformat my start date values.  I just assumed R
>would do that.
>
>
>   [[alternative HTML version deleted]]
>
>__
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.



[R] Plotting Dates Time Series Data

2016-01-31 Thread Jeff Reichman
R-Help Users

 

How do I plot "Dates" on the x-axis of a TS plot?

 

> mydata<-c(575,125,950,5020,2515,565,135,945,5100,2510,580,140,955,5045,2505,
+ 570,135,1000,5005,2520,580,130,925,5000,2525,585,120,960,5025,2520)
> myts<-ts(mydata,start=as.Date("2015-01-01"))
> myts
Time Series:
Start = 16436
End = 16465
Frequency = 1
 [1]  575  125  950 5020 2515  565  135  945 5100 2510  580  140  955 5045 2505
[16]  570  135 1000 5005 2520  580  130  925 5000 2525  585  120  960 5025 2520
> plot.ts(myts)

 

I'm assuming I have to reformat my start date values.  I just assumed R would
do that.
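
The "Start = 16436" in the output above is the Date coerced to a plain number
(days since 1970-01-01), because ts() keeps only a numeric start and a
frequency. A rough base-R sketch of recovering the dates for the axis, assuming
one observation per day; `start_num` and `dates` are illustrative names:

```r
mydata <- c(575, 125, 950, 5020, 2515, 565, 135, 945, 5100, 2510,
            580, 140, 955, 5045, 2505, 570, 135, 1000, 5005, 2520,
            580, 130, 925, 5000, 2525, 585, 120, 960, 5025, 2520)

# ts() only stores a number, so the Date turns into days since 1970-01-01
start_num <- as.numeric(as.Date("2015-01-01"))
myts <- ts(mydata, start = start_num)

# Turn the numeric time index back into Dates (assumes daily spacing)
dates <- as.Date(as.numeric(time(myts)), origin = "1970-01-01")

# Plot against the Date vector so the x-axis shows calendar dates
plot(dates, as.numeric(myts), type = "l", xlab = "Date", ylab = "Value")
```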




Re: [R] Efficient way to create new column based on comparison with another dataframe

2016-01-31 Thread Gaius Augustus
Thanks Denes,
I should have thought of foverlaps as an option.  I wonder how fast it is
compared to my solution!

My particular solution does not need data.table in order to work.  It just
loops through the ChrArms (Chromosome Arms, which always has 39 rows) and
assigns the proper arm to all rows within mapfile that lie within Start and
End on a particular Chr.  This is opposed to my first solution, where I was
trying to loop through mapfile (which could be millions of rows) and assign
each row one at a time.  That's why I used data.frame.
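
For readers following along, the loop over the arms described above might be
sketched like this in base R. The toy tables mirror the ones posted in this
thread, except that the End of the q arm (10000) is a made-up value for
illustration, and `hit` is a name introduced here.

```r
mapfile <- data.frame(Name = c("S1", "S2", "S3"), Chr = 1,
                      Position = c(3000, 6000, 1000),
                      stringsAsFactors = FALSE)
# End = 10000 for the q arm is a hypothetical value, not taken from the thread
Chr.Arms <- data.frame(Chr = 1, Arm = c("p", "q"),
                       Start = c(0, 5001), End = c(5000, 10000),
                       stringsAsFactors = FALSE)

mapfile$Arm <- NA_character_
# One vectorised assignment per arm instead of one lookup per SNP
for (i in seq_len(nrow(Chr.Arms))) {
  cur.row <- Chr.Arms[i, ]
  hit <- mapfile$Chr == cur.row$Chr &
         mapfile$Position >= cur.row$Start &
         mapfile$Position <= cur.row$End
  mapfile$Arm[hit] <- cur.row$Arm
}
```

The loop body runs once per arm (39 times on the real data), and each pass is a
vectorised comparison over all rows of mapfile, which is where the speed-up
over the row-by-row version comes from.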

For some reason, yesterday, data.table was acting funny on the computer I
remote to, so I need to figure out why that is once I can get on it.  Then
I want to time my solution and one with foverlaps to see if one is faster.

Thanks,
Gaius

On Sun, Jan 31, 2016 at 2:17 AM, Dénes Tóth  wrote:

> Hi,
>
> I have not followed this thread from the beginning, but have you tried the
> foverlaps() function from the data.table package?
>
> Something along the lines of:
>
> ---
> library(data.table)
>
> # create the tables (use as.data.table() or setDT() if you
> # start with a data.frame)
> mapfile <- data.table(Name = c("S1", "S2", "S3"), Chr = 1,
>                       Position = c(3000, 6000, 1000))
> Chr.Arms <- data.table(Chr = 1, Arm = c("p", "q"),
>                        Start = c(0, 5001), End = c(5000, 1))
>
> # add a dummy variable to be able to define Position as an interval
> mapfile[, Position2 := Position]
>
> # add keys
> setkey(mapfile, Chr, Position, Position2)
> setkey(Chr.Arms, Chr, Start, End)
>
> # use data.table::foverlaps (see ?foverlaps)
> mapfile <- foverlaps(mapfile, Chr.Arms, type = "within")
>
> # remove the dummy variable
> mapfile[, Position2 := NULL]
>
> # recreate original order
> setorder(mapfile, Chr, Name)
>
> ---
>
> BTW, there is a typo in your *SOLUTION*. I guess you wanted to write
> data.table(Name = c("S1", "S2", "S3"), Chr = 1, Position = c(3000, 6000,
> 1000), key = "Chr") instead of data.frame(Name = c("S1", "S2", "S3"), Chr =
> 1, Position = c(3000, 6000, 1000), key = "Chr").
>
> HTH,
>   Denes
>
>
>
> On 01/30/2016 07:48 PM, Gaius Augustus wrote:
>
>> I'll look into the Intervals idea.  The data.table code posted might not
>> work (because I don't believe it would put the rows in the correct order
>> if
>> the chromosomes are interspersed), however, it did make me think about
>> possibly assigning based on values...
>>
>> *SOLUTION*
>>
>> mapfile <- data.frame(Name = c("S1", "S2", "S3"), Chr = 1, Position =
>> c(3000, 6000, 1000), key = "Chr")
>> Chr.Arms <- data.frame(Chr = 1, Arm = c("p", "q"), Start = c(0, 5001), End
>> = c(5000, 1), key = "Chr")
>>
>> for(i in 1:nrow(Chr.Arms)){
>>    cur.row <- Chr.Arms[i, ]
>>    mapfile$Arm[ mapfile$Chr == cur.row$Chr & mapfile$Position >=
>> cur.row$Start & mapfile$Position <= cur.row$End] <- cur.row$Arm
>> }
>>
>> This took out the need for the intermediate table/vector.  This worked for
>> me, and was VERY fast.  Took <5 minutes on a dataframe with 35 million
>> rows.
>>
>> Thanks for the help,
>> Gaius
>>
>> On Sat, Jan 30, 2016 at 10:50 AM, Gaius Augustus <
>> gaiusjaugus...@gmail.com>
>> wrote:
>>
>> I'll look into the Intervals idea.  The data.table code posted might not
>>> work (because I don't believe it would put the rows in the correct order
>>> if
>>> the chromosomes are interspersed), however, it did make me think about
>>> possibly assigning based on values...
>>>
>>> Something like:
>>> mapfile <- data.table(Name = c("S1", "S2", "S3"), Chr = 1, Position =
>>> c(3000, 6000, 1000), key = "Chr")
>>> Chr.Arms <- data.table(Chr = 1, Arm = c("p", "q"), Start = c(0, 5001),
>>> End
>>> = c(5000, 1), key = "Chr")
>>>
>>> for(i in 1:nrow(Chr.Arms)){
>>>    cur.row <- Chr.Arms[i, ]
>>>    mapfile[ Chr == cur.row$Chr & Position >= cur.row$Start & Position <=
>>> cur.row$End] <- Chr.Arms$Arm
>>> }
>>>
>>> This might take out the need for the intermediate table/vector.  Not sure
>>> yet if it'll work, but we'll see.  I'm interested to know if anyone else
>>> has any ideas, too.
>>>
>>> Thanks,
>>> Gaius
>>>
>>> On Fri, Jan 29, 2016 at 11:34 PM, Ulrik Stervbo wrote:
>>>
>>> Hi Gaius,

 Could you use data.table and loop over the small Chr.arms?

 library(data.table)
 mapfile <- data.table(Name = c("S1", "S2", "S3"), Chr = 1, Position =
 c(3000, 6000, 1000), key = "Chr")
 Chr.Arms <- data.table(Chr = 1, Arm = c("p", "q"), Start = c(0, 5001),
 End = c(5000, 1), key = "Chr")

 Arms <- data.table()
 for(i in 1:nrow(Chr.Arms)){
    cur.row <- Chr.Arms[i, ]
    Arm <- mapfile[ Position >= cur.row$Start & Position <= cur.row$End]
    Arm <- Arm[ , Arm:=cur.row$Arm][]
    Arms <- rbind(Arms, Arm)
 }

 # Or use plyr to loop over each possible arm
 library(plyr)
 Arms <- ddply(Chr.Arms, .variables = "Arm", function(cur.row, mapfile){
    mapfile <- mapfile[ Position >= cur.row$Start & Position <=
 cur.row$End]

Re: [R] Time Series and Auto.arima

2016-01-31 Thread Lorenzo Isella

Partially the trouble is that the zoo time series is then translated
into a ts object by auto.arima.
In doing so, the series is placed on a regular time grid and some missing
data appear.
To fix this, I should replace each NA with the previous non-NA value.
This is easy enough and the series exhibits some clear cycles: roughly
every month there is a spike, followed by a decrease, then another
spike and so on.
I would like to forecast a couple of cycles (60 steps), but when I do
so with auto.arima, nothing like what I expect appears (the
seasonality is completely lost).
Any idea why?
I paste below the revised R code for reproducibility.

Lorenzo





library(forecast)
library(zoo)  # as.ts() method for zoo objects, and na.locf()

tt<-structure(c(1494.5, 1367.57, 1357.57, 1222.23, 1124.02, 1011.64,
4575.64, 3201.87, 3050.04, 2173.38, 1967.88, 1838.55, 1666.05,
1656.05, 1524.96, 835.96, 775.36, 592.36, 494.15, 4058.15, 2624.36,
2448.47, 1598.47, 1398.47, 1264.14, 1165.88, 1053.67, 941.36,
821.36, 471.36, 373.15, 259.91, 3808.91, 2262.26, 1940.39, 1011.39,
800.81, 790.81), index = structure(c(16563L, 16565L, 16570L,
16572L, 16577L, 16579L, 16584L, 16585L, 16586L, 16587L, 16588L,
16589L, 16590L, 16592L, 16593L, 16599L, 16606L, 16607L, 16608L,
16612L, 16613L, 16614L, 16617L, 16618L, 16619L, 16620L, 16621L,
16628L, 16633L, 16635L, 16638L, 16642L, 16647L, 16648L, 16649L,
16650L, 16651L, 16654L), class = "Date"), class = "zoo")

tt2<-as.ts(tt)
tt2<-na.locf(tt2)

mm<-auto.arima(tt2)


plot(forecast(mm, h=60))
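
The regular-grid and fill step described above can also be sketched in base R,
which makes it easier to see what the ts object ends up containing. Only the
first six observations are used to keep it short, and the names `grid`,
`last_obs`, `filled` and `tt3` are illustrative. One thing worth noting:
as.ts() on a daily zoo series gives frequency 1, and auto.arima only considers
seasonal terms when the frequency is greater than 1. The period of 30 below is
an assumption based on the "roughly every month" cycle, not a value from the
post.

```r
# First six observations of the series from the post (days since 1970-01-01)
idx  <- as.Date(c(16563, 16565, 16570, 16572, 16577, 16579),
                origin = "1970-01-01")
vals <- c(1494.5, 1367.57, 1357.57, 1222.23, 1124.02, 1011.64)

# Put the observations on a regular daily grid; unobserved days become NA
grid <- seq(min(idx), max(idx), by = "day")
x <- rep(NA_real_, length(grid))
x[match(idx, grid)] <- vals

# Last observation carried forward: position of the latest non-NA value
last_obs <- cummax(ifelse(is.na(x), 0L, seq_along(x)))
filled <- x[last_obs]

# Assumed period of 30 days so that seasonal models can be considered at all
tt3 <- ts(filled, frequency = 30)
```

With only about three cycles in the full series, a seasonal fit may still be
unstable even once the frequency is set, so this is a starting point rather
than a fix.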




On Fri, Jan 29, 2016 at 02:16:27PM -0800, David Winsemius wrote:



On Jan 29, 2016, at 12:59 PM, Lorenzo Isella  wrote:

Dear All,
I am puzzled and probably I am misunderstanding something.
Please consider the snippet at the end of the email.
We see a time series that has clearly some pattern (essentially, it is
an account where a salary is regularly paid followed by some
expenses).
However the output of auto.arima from the forecast package does
not seem to make any sense (at least to me).
I wonder if the problem is the fact that the time series is not
defined at regular intervals.
Any suggestions and alternative ways to fit it (e.g.: sarima from the astsa
package to account for the seasonality?) are really welcome.
Many thanks

Lorenzo



##
library(forecast)

tt<-structure(c(1494.5, 1367.57, 1357.57, 1222.23, 1124.02, 1011.64,
4575.64, 3201.87, 3050.04, 2173.38, 1967.88, 1838.55, 1666.05,
1656.05, 1524.96, 835.96, 775.36, 592.36, 494.15, 4058.15, 2624.36,
2448.47, 1598.47, 1398.47, 1264.14, 1165.88, 1053.67, 941.36,
821.36, 471.36, 373.15, 259.91, 3808.91, 2262.26, 1940.39, 1011.39,
800.81, 790.81), index = structure(c(16563L, 16565L, 16570L,
16572L, 16577L, 16579L, 16584L, 16585L, 16586L, 16587L, 16588L,
16589L, 16590L, 16592L, 16593L, 16599L, 16606L, 16607L, 16608L,
16612L, 16613L, 16614L, 16617L, 16618L, 16619L, 16620L, 16621L,
16628L, 16633L, 16635L, 16638L, 16642L, 16647L, 16648L, 16649L,
16650L, 16651L, 16654L), class = "Date"), class = "zoo")

plot(tt)



library(forecast)


fit<-auto.arima(tt)

###


If, after running plot(tt), you then run:

fitted(fit)

Time Series:
Start = 16563
End = 16654
Frequency = 1
 [1] 1448.8211        NA 1444.8612        NA        NA        NA        NA
 [8] 1398.7752        NA 1359.0350        NA        NA        NA        NA
[15] 1309.1398        NA 1219.7420        NA        NA        NA        NA
[22] 2302.8903 3708.1762 2713.0349 2603.0512 1968.0100 1819.1484 1725.4634
[29]        NA 1572.6179 1593.2628        NA        NA        NA        NA
[36]        NA 1258.3403        NA        NA        NA        NA        NA
[43]        NA 1184.9656  955.3023  822.7394        NA        NA        NA
[50] 1987.7634     .3131 2294.6941        NA        NA 1760.6351 1551.5526
[57] 1406.6751 1309.3682 1238.1899        NA        NA        NA        NA
[64]        NA        NA 1251.6898        NA        NA        NA        NA
[71] 1179.9970        NA  988.3885        NA        NA  888.4533        NA
[78]        NA        NA  889.4017        NA        NA        NA        NA
[85] 1970.0911 3152.7668 2032.3935 1799.2350 1126.2794        NA        NA
[92] 1088.1525


Using that vector:

lines(seq(16563 ,16654 ),fitted(fit), col="red", lwd=3)

You can see that the fitted values are capturing quite a bit of the variation.



I'm not a regular user of pkg:forecast, so there may be more refined methods of 
extracting information than using `fitted`.

--

David Winsemius
Alameda, CA, USA





[R] Modelling non-Negative Time Series

2016-01-31 Thread Lorenzo Isella

Dear All,
I am struggling to develop a model to forecast the daily expenses from
a bank account.
The daily time series consists (obviously) of non-negative numbers
which can be zero in the days when no money is taken from the bank
account.
To give you an idea of the kind of series I am dealing with, please
have a look at

myts<-structure(c(5.5, 0, 126.93, 0, 0, 0, 0, 10, 0, 135.34, 0, 0,
0, 0, 98.21, 0, 112.38, 0, 0, 0, 0, 0, 1373.77, 151.83, 26.66,
205.5, 129.33, 172.5, 0, 10, 131.09, 0, 0, 0, 0, 0, 689, 0, 0,
0, 0, 0, 0, 60.6, 183, 98.21, 0, 0, 0, 0, 1433.79, 175.89, 0,
0, 0, 200, 134.33, 98.26, 112.21, 0, 0, 0, 0, 0, 0, 112.31, 0,
0, 0, 0, 120, 0, 350, 0, 0, 98.21, 0, 0, 0, 113.24, 0, 0, 0,
0, 15, 696.65, 321.87, 929, 210.58, 0, 0, 10), .Tsp = c(16563,
16654, 1), class = "ts")

(the time origin is a bit funny, but what matters is that I have daily
data).

Do you know any R package to handle this kind of series? I think I am
outside the domain of the ARIMA approach.
I experimented with acp and tscount (to see if I could treat the
series as an autoregressive Poisson series), but I did not get very
far.
Any suggestion is appreciated.
Cheers

Lorenzo



Re: [R] R help

2016-01-31 Thread കുഞ്ഞായി kunjaai
Hi Anukriti Gupta,

When sending mail to the mailing list, please change the subject from "R
Help" to something more specific (e.g. "R Regression error"), so that we can
all find your mail and its solution in future by checking the subject line.




On Sat, Jan 30, 2016 at 9:55 PM, Boris Steipe wrote:

> I think the error message is pretty clear. Your calculations are
> attempting to allocate more memory than you have available. As to what is
> causing your code to do this, only someone familiar with your code could
> possibly tell.
>
> B.
> (Read the posting guide, please - and don't post in HTML :-)
>
>
> On Jan 30, 2016, at 1:44 AM, Anukriti Gupta wrote:
>
> > Hi
> >
> > I am running an ordinal logistic regression, but it is giving me an
> > error like
> >
> > Error: cannot allocate vector of size 58.8 Gb
> > In addition: Warning messages:
> > 1: In rep.int(c(1, numeric(n)), n - 1L) :
> >   Reached total allocation of 8057Mb: see help(memory.size)
> > 2: In rep.int(c(1, numeric(n)), n - 1L) :
> >   Reached total allocation of 8057Mb: see help(memory.size)
> > 3: In rep.int(c(1, numeric(n)), n - 1L) :
> >   Reached total allocation of 8057Mb: see help(memory.size)
> > 4: In rep.int(c(1, numeric(n)), n - 1L) :
> >   Reached total allocation of 8057Mb: see help(memory.size)
> >
> >
> > I am using a 64 bit laptop. I am not sure what is causing this kind of
> > issue
> >
> > Regards
> >
> > Anukriti Gupta
> > Analyst (Financial Crime Compliance), HSBC
> > M: +91 88820 45065
>
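
A back-of-the-envelope check on the numbers in the quoted error, offered as a
guess since the model code was not posted. The call in the warning,
rep.int(c(1, numeric(n)), n - 1L), builds a vector of (n + 1)(n - 1) = n^2 - 1
doubles, so the requested size implies a value for n. The variable names below
are illustrative.

```r
# Size from the error message, assuming "Gb" means 1024^3 bytes
bytes <- 58.8 * 1024^3

# A double takes 8 bytes
n_elements <- bytes / 8

# rep.int(c(1, numeric(n)), n - 1L) has n^2 - 1 elements, so solve for n
n <- sqrt(n_elements + 1)
round(n)
```

An n in the tens of thousands would be consistent with an n-by-n matrix being
built for something with that many distinct values, for example a predictor
with one level per row being treated as a factor. That is only one plausible
reading; checking the number of levels of each predictor would confirm or rule
it out.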



-- 
DILEEPKUMAR. R
J R F, IIT DELHI



Re: [R] R Studio error while installing twitteR package

2016-01-31 Thread Archit Soni
Yes Duncan, but I searched a bit more and found the solution: remap the
working directory and, if the issue still persists, change the CRAN
mirror. It is working fine now.
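
For anyone landing on this thread later, a rough sketch of what those two steps
could look like from the R side. This is my reading of the fix, not a record of
what was actually run; the mirror URL is just one public CRAN mirror, and `wd`
and `wd_writable` are illustrative names.

```r
# Check the current working directory and whether it is writable
wd <- getwd()
wd_writable <- file.access(wd, mode = 2) == 0

# If it is not, point R at a directory you can write to, e.g.
# setwd("path/to/a/writable/directory")   # hypothetical path

# Choose a different CRAN mirror for this session
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Then retry the install (left commented out here, as it needs network access)
# install.packages("twitteR")
```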

Thanks,
Archit

On Sun, Jan 31, 2016 at 5:40 PM, Duncan Murdoch wrote:

> On 31/01/2016 5:56 AM, Archit Soni wrote:
>
>> Hi All,
>>
>> I am getting the error below while installing the package "twitteR", but it
>> installs successfully via the R console. Any ideas?
>>
>> Error:
>>
>> install.packages("twitteR", lib="C:/Program Files/TIBCO/terrde40/site-library")
>> Trying to download URL
>> 'https://cran.rstudio.com/bin/windows/contrib/3.2/twitteR_1.1.9.zip' to file
>> 'C:/Users/Archit/AppData/Local/Temp/TERR_1ae800291/downloaded_packages/twitteR_1.1.9.zip'
>> Downloaded 446573 bytes
>> * installing *binary* package twitteR from
>> "C:\\Users\\Archit\\AppData\\Local\\Temp\\TERR_1ae800291\\downloaded_packages\\twitteR_1.1.9.zip"
>> to "C:/Program Files/TIBCO/terrde40/site-library"
>> * checking MD5 checksums
>> Package "twitteR" at directory "C:/Program Files/TIBCO/terrde40/site-library/twitteR"
>> does not have an MD5 file, so integrity check was not done
>> COULD NOT CHECK MD5 CHECKSUMS
>>
>>
> I think you'll need to contact either RStudio or Tibco support for this.
>
> Duncan Murdoch
>



-- 
Regards
Archit


Re: [R] R Studio error while installing twitteR package

2016-01-31 Thread Duncan Murdoch

On 31/01/2016 5:56 AM, Archit Soni wrote:

Hi All,

I am getting the error below while installing the package "twitteR", but it
installs successfully via the R console. Any ideas?

Error:

install.packages("twitteR", lib="C:/Program Files/TIBCO/terrde40/site-library")
Trying to download URL
'https://cran.rstudio.com/bin/windows/contrib/3.2/twitteR_1.1.9.zip' to file
'C:/Users/Archit/AppData/Local/Temp/TERR_1ae800291/downloaded_packages/twitteR_1.1.9.zip'
Downloaded 446573 bytes
* installing *binary* package twitteR from
"C:\\Users\\Archit\\AppData\\Local\\Temp\\TERR_1ae800291\\downloaded_packages\\twitteR_1.1.9.zip"
to "C:/Program Files/TIBCO/terrde40/site-library"
* checking MD5 checksums
Package "twitteR" at directory "C:/Program Files/TIBCO/terrde40/site-library/twitteR"
does not have an MD5 file, so integrity check was not done
COULD NOT CHECK MD5 CHECKSUMS



I think you'll need to contact either RStudio or Tibco support for this.

Duncan Murdoch



[R] R Studio error while installing twitteR package

2016-01-31 Thread Archit Soni
Hi All,

I am getting the error below while installing the package "twitteR", but it
installs successfully via the R console. Any ideas?

Error:

install.packages("twitteR", lib="C:/Program Files/TIBCO/terrde40/site-library")
Trying to download URL
'https://cran.rstudio.com/bin/windows/contrib/3.2/twitteR_1.1.9.zip' to file
'C:/Users/Archit/AppData/Local/Temp/TERR_1ae800291/downloaded_packages/twitteR_1.1.9.zip'
Downloaded 446573 bytes
* installing *binary* package twitteR from
"C:\\Users\\Archit\\AppData\\Local\\Temp\\TERR_1ae800291\\downloaded_packages\\twitteR_1.1.9.zip"
to "C:/Program Files/TIBCO/terrde40/site-library"
* checking MD5 checksums
Package "twitteR" at directory "C:/Program Files/TIBCO/terrde40/site-library/twitteR"
does not have an MD5 file, so integrity check was not done
COULD NOT CHECK MD5 CHECKSUMS

-- 
Regards
Archit



Re: [R] Efficient way to create new column based on comparison with another dataframe

2016-01-31 Thread Dénes Tóth

Hi,

I have not followed this thread from the beginning, but have you tried 
the foverlaps() function from the data.table package?


Something along the lines of:

---
library(data.table)

# create the tables (use as.data.table() or setDT() if you
# start with a data.frame)
mapfile <- data.table(Name = c("S1", "S2", "S3"), Chr = 1,
                      Position = c(3000, 6000, 1000))
Chr.Arms <- data.table(Chr = 1, Arm = c("p", "q"),
                       Start = c(0, 5001), End = c(5000, 1))

# add a dummy variable to be able to define Position as an interval
mapfile[, Position2 := Position]

# add keys
setkey(mapfile, Chr, Position, Position2)
setkey(Chr.Arms, Chr, Start, End)

# use data.table::foverlaps (see ?foverlaps)
mapfile <- foverlaps(mapfile, Chr.Arms, type = "within")

# remove the dummy variable
mapfile[, Position2 := NULL]

# recreate original order
setorder(mapfile, Chr, Name)

---

BTW, there is a typo in your *SOLUTION*. I guess you wanted to write 
data.table(Name = c("S1", "S2", "S3"), Chr = 1, Position = c(3000, 6000, 
1000), key = "Chr") instead of data.frame(Name = c("S1", "S2", "S3"), 
Chr = 1, Position = c(3000, 6000, 1000), key = "Chr").


HTH,
  Denes



On 01/30/2016 07:48 PM, Gaius Augustus wrote:

I'll look into the Intervals idea.  The data.table code posted might not
work (because I don't believe it would put the rows in the correct order if
the chromosomes are interspersed), however, it did make me think about
possibly assigning based on values...

*SOLUTION*
mapfile <- data.frame(Name = c("S1", "S2", "S3"), Chr = 1, Position =
c(3000, 6000, 1000), key = "Chr")
Chr.Arms <- data.frame(Chr = 1, Arm = c("p", "q"), Start = c(0, 5001), End
= c(5000, 1), key = "Chr")

for(i in 1:nrow(Chr.Arms)){
   cur.row <- Chr.Arms[i, ]
   mapfile$Arm[ mapfile$Chr == cur.row$Chr & mapfile$Position >=
cur.row$Start & mapfile$Position <= cur.row$End] <- cur.row$Arm
}

This took out the need for the intermediate table/vector.  This worked for
me, and was VERY fast.  Took <5 minutes on a dataframe with 35 million rows.

Thanks for the help,
Gaius

On Sat, Jan 30, 2016 at 10:50 AM, Gaius Augustus wrote:


I'll look into the Intervals idea.  The data.table code posted might not
work (because I don't believe it would put the rows in the correct order if
the chromosomes are interspersed), however, it did make me think about
possibly assigning based on values...

Something like:
mapfile <- data.table(Name = c("S1", "S2", "S3"), Chr = 1, Position =
c(3000, 6000, 1000), key = "Chr")
Chr.Arms <- data.table(Chr = 1, Arm = c("p", "q"), Start = c(0, 5001), End
= c(5000, 1), key = "Chr")

for(i in 1:nrow(Chr.Arms)){
   cur.row <- Chr.Arms[i, ]
   mapfile[ Chr == cur.row$Chr & Position >= cur.row$Start & Position <=
cur.row$End] <- Chr.Arms$Arm
}

This might take out the need for the intermediate table/vector.  Not sure
yet if it'll work, but we'll see.  I'm interested to know if anyone else
has any ideas, too.

Thanks,
Gaius

On Fri, Jan 29, 2016 at 11:34 PM, Ulrik Stervbo wrote:


Hi Gaius,

Could you use data.table and loop over the small Chr.arms?

library(data.table)
mapfile <- data.table(Name = c("S1", "S2", "S3"), Chr = 1, Position =
c(3000, 6000, 1000), key = "Chr")
Chr.Arms <- data.table(Chr = 1, Arm = c("p", "q"), Start = c(0, 5001),
End = c(5000, 1), key = "Chr")

Arms <- data.table()
for(i in 1:nrow(Chr.Arms)){
   cur.row <- Chr.Arms[i, ]
   Arm <- mapfile[ Position >= cur.row$Start & Position <= cur.row$End]
   Arm <- Arm[ , Arm:=cur.row$Arm][]
   Arms <- rbind(Arms, Arm)
}

# Or use plyr to loop over each possible arm
library(plyr)
Arms <- ddply(Chr.Arms, .variables = "Arm", function(cur.row, mapfile){
   mapfile <- mapfile[ Position >= cur.row$Start & Position <= cur.row$End]
   mapfile <- mapfile[ , Arm:=cur.row$Arm][]
   return(mapfile)
}, mapfile = mapfile)

I have just started to use the data.table and I have the feeling the code
above can be greatly improved - maybe the loop can be dropped entirely?

Hope this helps
Ulrik

On Sat, 30 Jan 2016 at 03:29 Gaius Augustus wrote:


I have two dataframes. One has chromosome arm information, and the other
has SNP position information. I am trying to assign each SNP an arm
identity.  I'd like to create this new column based on comparing it to
the
reference file.

*1) Mapfile (has millions of rows)*

Name   Chr   Position
S1     1     3000
S2     1     6000
S3     1     1000

*2) Chr.Arms file (has 39 rows)*

Chr   Arm   Start   End
1     p     0       5000
1     q     5001    1


*R Script that works, but slow:*
Arms  <- c()
for (line in 1:nrow(Mapfile)){
   Arms[line] <- Chr.Arms$Arm[ Mapfile$Chr[line] == Chr.Arms$Chr &
  Mapfile$Position[line] > Chr.Arms$Start &  Mapfile$Position[line] <
Chr.Arms$End]
}
Mapfile$Arm <- Arms


*Output Table:*

Name   Chr   Position   Arm
S1  1 3000  p
S2  1 6000  q
S3  1 1000  p


In words: I want each line to look up the location ( 1) find the right
Chr,
2)