The user wrote in their first post:
I have a lot of observations in my dataset
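Here's one way to do it with a data.table:
library(data.table)
a$dt = as.Date(a$var4, "%d/%m/%Y")   # assumed: Date column built from var4 as in Bert's step 1
a = data.table(a)
ans = a[ , list(dt=dt[dt-min(dt)<7]) , by="var1,var2,var3"]
class(ans$dt) = "Date"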
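With a current data.table release, the same per-group filter can be sketched in today's syntax. This is an illustration, not Matthew's code. It assumes the Date column `dt` comes from `as.Date(var4, "%d/%m/%Y")` and that the intended test is "fewer than 7 days after the group's earliest date". It also uses `.SD` so every non-grouping column is kept. Here `by = .(var1, var2, var3)` replaces the older comma-separated `by` string. Because the `Date` class survives the subset, no separate `class()` fix-up is needed.

```r
library(data.table)

# Example data from the original post (var4 holds dd/mm/yyyy strings)
a <- data.table(
  var1 = c("s", "c", "c", "n", "n", "n"),
  var2 = c(rep(1, 3), rep(2, 3)),
  var3 = c(rep(2, 3), rep(1, 3)),
  var4 = c("01/01/1999", "10/02/2000", "13/02/2000",
           "11/02/2000", "15/02/2000", "23/02/2000")
)

# Assumed: 'dt' is var4 parsed as a Date, added by reference
a[, dt := as.Date(var4, "%d/%m/%Y")]

# Within each var1/var2/var3 group, keep rows less than 7 days after the
# group's earliest date; .SD carries every non-grouping column (var4, dt)
ans <- a[, .SD[as.numeric(dt - min(dt)) < 7], by = .(var1, var2, var3)]
print(ans)
```

On the example data this drops only the 23/02/2000 row, which is 12 days after its group's first date.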
Timings comparing the 3 methods are below. In this
Note that in the documentation ?"[.data.table", where I say that 'by' is slow,
I mean relative to how fast it could be. In this specific example anyway,
and with the code posted so far, it seems to be significantly faster
than sqldf and plyr.
Of course the best of both worlds would be to use
Sounds like a good idea. Would it be possible to give an example of how to
combine plyr with data.table, and why that is better than a data.table-only
solution?
hadley wickham h.wick...@gmail.com wrote in message
news:f8e6ff051001200624r2175e38xf558dc8fa3fb6...@mail.gmail.com...
On Wed, Jan 20, 2010 at 8:43 AM, Matthew Dowle mdo...@mdowle.plus.com wrote:
> Sounds like a good idea. Would it be possible to give an example of how to
> combine plyr with data.table, and why that is better than a data.table-only
> solution?
Well, ideally, you'd do:
adt <- data.table(a)
ans2 <-
I see now, thanks for explaining that. Would it be up to you to add data.table
methods to ddply, then, for this to happen? Or does a ddply method need to
be added to data.table?
hadley wickham h.wick...@gmail.com wrote in message
On Mon, Jan 18, 2010 at 1:54 PM, Bert Gunter gunter.ber...@gene.com wrote:
One way to do it:
1. Convert your date column to the Date class using the as.Date() function.
This allows you to do the necessary arithmetic on the dates below.
dt <- as.Date(a[,4], "%d/%m/%Y")
2. Create a factor out of
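Bert's steps can be carried through to a complete base-R sketch. Step 2 is cut off in the thread, so the grouping factor here is an assumption: it uses `interaction()` over `var1`, `var2` and `var3`. `ave()` then copies each group's earliest date back onto every row of that group. That turns the final filter into a plain vector comparison, again assuming the cutoff is "fewer than 7 days".

```r
# Example data from the original post
a <- data.frame(
  var1 = c("s", "c", "c", "n", "n", "n"),
  var2 = c(rep(1, 3), rep(2, 3)),
  var3 = c(rep(2, 3), rep(1, 3)),
  var4 = c("01/01/1999", "10/02/2000", "13/02/2000",
           "11/02/2000", "15/02/2000", "23/02/2000"),
  stringsAsFactors = FALSE
)

# Step 1: parse the dd/mm/yyyy strings into Date objects
dt <- as.Date(a[, 4], "%d/%m/%Y")

# Step 2 (assumed): one factor level per observed var1/var2/var3 combination
grp <- interaction(a$var1, a$var2, a$var3, drop = TRUE)

# Each row's group minimum, in days since 1970-01-01
mindt <- ave(as.numeric(dt), grp, FUN = min)

# Keep rows less than 7 days after their group's earliest date
result <- a[as.numeric(dt) - mindt < 7, ]
print(result)
```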
Using data frame a from the post below, this is how it would be done
in SQL using sqldf. We join the original table, a, with a
table of minimums (computed by the nested select) and then choose only
the rows where dt - mindt < 7 (in the where clause).
library(sqldf)
sqldf("select var1,
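The SQL join described above has a direct base-R analogue. This is a sketch, not Gabor's query, which is cut off here. `aggregate()` plays the role of the nested select that builds the table of minimums, `merge()` performs the join on the three key columns, and the final logical subset stands in for the where clause. For simplicity, dates are converted to numeric day counts first, so the minimum and the difference are plain numbers.

```r
# Example data from the original post
a <- data.frame(
  var1 = c("s", "c", "c", "n", "n", "n"),
  var2 = c(rep(1, 3), rep(2, 3)),
  var3 = c(rep(2, 3), rep(1, 3)),
  var4 = c("01/01/1999", "10/02/2000", "13/02/2000",
           "11/02/2000", "15/02/2000", "23/02/2000"),
  stringsAsFactors = FALSE
)
a$days <- as.numeric(as.Date(a$var4, "%d/%m/%Y"))  # days since 1970-01-01

# Table of minimums: one row per group (the nested select)
mins <- aggregate(days ~ var1 + var2 + var3, data = a, FUN = min)
names(mins)[names(mins) == "days"] <- "mindays"

# Join the minimums back onto the original rows via the key columns
joined <- merge(a, mins, by = c("var1", "var2", "var3"))

# The where clause: keep rows less than 7 days after the group minimum
result <- joined[joined$days - joined$mindays < 7, ]
print(result)
```

`merge()` sorts its output by the key columns by default, so the rows come back grouped by var1/var2/var3 rather than in their original order.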
Hello,
See my problem below.
a<-data.frame(c("s","c","c","n","n","n"),c(rep(1,3),rep(2,3)),c(rep(2,3),rep(1,3)),c("01/01/1999","10/02/2000","13/02/2000","11/02/2000","15/02/2000","23/02/2000"))
colnames(a)<-c("var1","var2","var3","var4")
a
  var1 var2 var3       var4
1    s    1    2 01/01/1999
2    c    1    2 10/02/2000
Sent: Monday, January 18, 2010 10:40 AM
To: r-help@r-project.org
Subject: [R] problem of data manipulation
-Original Message-
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of Bert Gunter
Sent: Monday, January 18, 2010 11:54 AM
To: 'rusers.sh'; r-help@r-project.org
Subject: Re: [R] problem of data manipulation
-Original Message-
From: Bert Gunter [mailto:gunter.ber...@gene.com]
Sent: Monday, January 18, 2010 12:32 PM
To: William Dunlap; 'rusers.sh'; r-help@r-project.org
Subject: RE: [R] problem of data manipulation
Absolutely... so long as you assume the dates are in order
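Bert's caveat applies to any approach that compares each date with the *first* date in its group instead of the group minimum. The thread does not show the exact code he was answering, so this is only an illustration. The rows below are deliberately shuffled (a hypothetical reordering), so an unsorted group would pick the wrong reference date. Sorting by the keys and then by date first makes `x[1]` equal to the group minimum.

```r
# Example data from the original post
a <- data.frame(
  var1 = c("s", "c", "c", "n", "n", "n"),
  var2 = c(rep(1, 3), rep(2, 3)),
  var3 = c(rep(2, 3), rep(1, 3)),
  var4 = c("01/01/1999", "10/02/2000", "13/02/2000",
           "11/02/2000", "15/02/2000", "23/02/2000"),
  stringsAsFactors = FALSE
)

# Hypothetical shuffle: group "n" now starts with its latest date (23/02/2000)
a <- a[c(6, 3, 1, 5, 2, 4), ]
a$days <- as.numeric(as.Date(a$var4, "%d/%m/%Y"))

# Sort by the grouping keys, then by date, so each group's first row is its earliest
a <- a[order(a$var1, a$var2, a$var3, a$days), ]

# Only valid after the sort: the first date in each group is the group minimum
first_day <- ave(a$days, a$var1, a$var2, a$var3, FUN = function(x) x[1])
result <- a[a$days - first_day < 7, ]
print(result)
```

Without the `order()` step, group "n" would use 23/02/2000 as its reference. Every difference would then be zero or negative, and all three "n" rows would be kept.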
Sent: Monday, January 18, 2010 12:15 PM
To: Bert Gunter; rusers.sh; r-help@r-project.org
Subject: Re: [R] problem of data manipulation
Gunter; r-help@r-project.org
Subject: Re: [R] problem of data manipulation
I just remembered that in my actual dataset var2 and var3
are numerical data, e.g. 12.34, not factors. The example data above is
misleading.
Suppose var2 and var3 are numerical variables, not factors. How should we
do
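For the follow-up question about numeric var2 and var3, a minimal sketch suggests that numeric keys need no special handling, at least for the base-R `ave()` approach. `ave()` groups through `interaction()`, which converts each numeric column to a factor of its distinct values. The var2 and var3 values below (such as 12.34) are made up for illustration.

```r
# Hypothetical data: same layout as the original post, but var2 and var3
# are arbitrary numeric values rather than small integer codes
a <- data.frame(
  var1 = c("s", "c", "c", "n", "n", "n"),
  var2 = c(12.34, 12.34, 12.34, 5.5, 5.5, 5.5),
  var3 = c(0.1, 0.1, 0.1, 7.25, 7.25, 7.25),
  var4 = c("01/01/1999", "10/02/2000", "13/02/2000",
           "11/02/2000", "15/02/2000", "23/02/2000"),
  stringsAsFactors = FALSE
)
days <- as.numeric(as.Date(a$var4, "%d/%m/%Y"))

# Numeric grouping columns are passed straight to ave(); interaction()
# turns each into a factor internally, so no manual conversion is needed
mindays <- ave(days, a$var1, a$var2, a$var3, FUN = min)

# Keep rows less than 7 days after their group's earliest date
result <- a[days - mindays < 7, ]
print(result)
```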