Re: [R] difftimes; histogram; memory problems

2010-02-15 Thread Gabor Grothendieck
Just one further point. If you do run out of memory using #2, then try
this, which is the same as #2 but adds a dbname argument to force the
computation to be done on disk rather than in memory.

sqldf("select d1.x - d2.x, count(*) from d1, d2 group by d1.x - d2.x",
      dbname = tempfile())

On Mon, Feb 15, 2010 at 10:45 PM, Gabor Grothendieck wrote:
> Here are two approaches to try:
>
>> # test data
>> d1 <- data.frame(x = Sys.Date() + 1:3)
>> d2 <- data.frame(x = Sys.Date() - 1:3)
>
>> # 1. you might not have enough memory for this, but it's short
>> table(outer(1:3, -(1:3), "-"))
>
> 2 3 4 5 6
> 1 2 3 2 1
>
>> # 2. this one performs all the operations outside of R, only bringing the
>> #    result back in, so it won't need as much memory
>>
>> library(sqldf)
>> sqldf("select d1.x - d2.x, count(*) from d1, d2 group by d1.x - d2.x")
>  d1.x - d2.x count(*)
> 1           2        1
> 2           3        2
> 3           4        3
> 4           5        2
> 5           6        1

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] difftimes; histogram; memory problems

2010-02-15 Thread Moshe Olshansky
Hi Jonathan,

If minDay = min(Condition1) - max(Condition2) and maxDay = max(Condition1) -
min(Condition2), then all your differences will lie between minDay and maxDay,
and hopefully this is not a very big range (unless you are going many thousands of
years into the past or the future). So basically, for each number of days in this
range you just need to count how many times it appears. To speed up the
calculations you can do this with just one loop (and one vectorized operation
inside it); I cannot see how to avoid the loop entirely if we want to limit the
memory use. Let me know if you need the actual code.

Regards,
Moshe.
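Moshe's description can be sketched roughly as follows. This is an illustrative reading of his suggestion, not his code: the helper name `count_date_diffs` is hypothetical, and the date columns are assumed to be convertible with `as.Date()`. The single loop runs over one date vector; each pass does one vectorized subtraction against the other vector and adds the tallies into a running count vector indexed by the difference in days.

```r
# Hypothetical helper (not code from the thread): counts every pairwise
# difference x[i] - y[j], in days, with one loop over x, so memory use
# grows with length(y) and the range of differences, not length(x) * length(y).
count_date_diffs <- function(x, y) {
  x <- as.numeric(as.Date(x))  # days since 1970-01-01
  y <- as.numeric(as.Date(y))
  min_day <- min(x) - max(y)   # smallest possible difference
  max_day <- max(x) - min(y)   # largest possible difference
  counts <- numeric(max_day - min_day + 1)
  for (xi in x) {
    # One vectorized step per element of x: shift the differences so that
    # min_day falls in bin 1, then add this element's tallies.
    counts <- counts + tabulate(xi - y - min_day + 1, nbins = length(counts))
  }
  names(counts) <- min_day:max_day
  counts
}

# The example data from the original post (the loop runs 12,000 times).
Condition1 <- data.frame(dates = rep(c("2001-02-10", "1998-03-14"), 6000))
Condition2 <- data.frame(dates = rep(c("2003-07-06", "2007-03-11"), 8000))
counts <- count_date_diffs(Condition1$dates, Condition2$dates)
counts[counts > 0]
```

Because `counts` is already the histogram in tabulated form, something like `plot(as.numeric(names(counts)), counts, type = "h")` can stand in for `hist()`. Looping over the shorter of the two vectors keeps the number of iterations down.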

--- On Tue, 16/2/10, Jonathan  wrote:

> From: Jonathan 
> Subject: Re: [R] difftimes; histogram; memory problems
> To: "r-help" 
> Received: Tuesday, 16 February, 2010, 1:17 PM



Re: [R] difftimes; histogram; memory problems

2010-02-15 Thread Gabor Grothendieck
Here are two approaches to try:

> # test data
> d1 <- data.frame(x = Sys.Date() + 1:3)
> d2 <- data.frame(x = Sys.Date() - 1:3)

> # 1. you might not have enough memory for this, but it's short
> table(outer(1:3, -(1:3), "-"))

2 3 4 5 6
1 2 3 2 1

> # 2. this one performs all the operations outside of R, only bringing the
> #    result back in, so it won't need as much memory
>
> library(sqldf)
> sqldf("select d1.x - d2.x, count(*) from d1, d2 group by d1.x - d2.x")
  d1.x - d2.x count(*)
1           2        1
2           3        2
3           4        3
4           5        2
5           6        1
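For reference, a sketch of approach #1 written against the actual Date columns rather than the stand-in offsets `1:3` and `-(1:3)`. Converting with `as.numeric()` is an assumption made here so that `outer()` works on plain day counts; the full nrow(d1) by nrow(d2) matrix is still built, so the memory caveat in the comment still applies.

```r
# Test data as in the post: d1 dates lie after today, d2 dates before it.
d1 <- data.frame(x = Sys.Date() + 1:3)
d2 <- data.frame(x = Sys.Date() - 1:3)

# outer() builds the full nrow(d1) x nrow(d2) matrix of day differences;
# as.numeric() turns each Date into a day count first.
diffs <- outer(as.numeric(d1$x), as.numeric(d2$x), "-")

# table() then counts how often each difference occurs.
counts <- table(diffs)
counts
```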


On Mon, Feb 15, 2010 at 9:17 PM, Jonathan  wrote:



Re: [R] difftimes; histogram; memory problems

2010-02-15 Thread Jonathan
Let me fix a couple of typos in that email:

Hi All:

Let's say I have two data frames (Condition1 and Condition2) with about
12,000 and 16,000 rows respectively and one column each.  The entries
contain dates.

I'd like to calculate, for each possible pair of dates (that is,
Condition1[1:12,000] and Condition2[1:16,000]), the number of days
between the dates in the pair.  The result should be a 12,000 by 16,000
matrix, which I'll call M.  The purpose of building such a matrix M is
to create a histogram of all the values contained within it.

Example:
Condition1 <- data.frame('dates' = rep(c('2001-02-10','1998-03-14'),6000))
Condition2 <- data.frame('dates' = rep(c('2003-07-06','2007-03-11'),8000))
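Before any subtraction, the example columns need to become `Date` objects: as written they hold date strings, which `data.frame()` turned into factors by default before R 4.0.0. A minimal sketch of the conversion, with `difftime()` giving the difference for one pair in days (the names `d1`, `d2` and `one_pair` are illustrative only):

```r
# The example data from the post.
Condition1 <- data.frame(dates = rep(c("2001-02-10", "1998-03-14"), 6000))
Condition2 <- data.frame(dates = rep(c("2003-07-06", "2007-03-11"), 8000))

# as.character() guards against factor columns (the default before R 4.0.0);
# as.Date() then parses the ISO "yyyy-mm-dd" strings.
d1 <- as.Date(as.character(Condition1$dates))
d2 <- as.Date(as.character(Condition2$dates))

# Difference for a single pair, in days: 2001-02-10 is 876 days
# before 2003-07-06, so this is -876.
one_pair <- difftime(d1[1], d2[1], units = "days")
print(one_pair)
as.numeric(one_pair)
```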

First, my instinct is to try and vectorize the operation.  I tried
this by expanding each vector into a matrix of repeated vectors (I'd
then just subtract the two resultant matrices to get matrix M).  I got
the following error:

> expandedCondition1 <- matrix(rep(Condition1[[1]], nrow(Condition2)), 
> byrow=TRUE, ncol=nrow(Condition1))
Error: cannot allocate vector of size 732.4 Mb
> expandedCondition2 <- matrix(rep(Condition2[[1]], nrow(Condition1)), 
> byrow=FALSE, nrow=nrow(Condition2))
Error: cannot allocate vector of size 732.4 Mb
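The 732.4 Mb in the error is the size of a single 12,000 by 16,000 matrix. A quick check, assuming the dates were read in as a factor (the `data.frame()` default at the time), whose integer codes take 4 bytes each, and noting that R reports sizes in units of 1024^2 bytes:

```r
# One cell per pair of dates.
cells <- 12000 * 16000

# rep() on a factor column keeps 4-byte integer codes; R reports sizes
# in units of 1024^2 bytes ("Mb").
int_mb <- cells * 4 / 1024^2

# Day differences stored as doubles (8 bytes each) would need twice as much.
dbl_mb <- cells * 8 / 1024^2

round(int_mb, 1)
round(dbl_mb, 1)
```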

Since these matrices seem to be too large, I'm wondering whether
there's a better way to call a hist command without actually building
that matrix.

I'd greatly appreciate any ideas!

Best,
Jonathan

On Mon, Feb 15, 2010 at 8:19 PM, Jonathan  wrote:



[R] difftimes; histogram; memory problems

2010-02-15 Thread Jonathan
Hi All:

Let's say I have two data frames (Condition1 and Condition2) with about
12,000 and 16,000 rows respectively and one column each.  The entries
contain dates.

I'd like to calculate, for each possible pair of dates (that is,
Condition1[1:12,000] and Condition2[1:16,000]), the number of days
between the dates in the pair.  The result should be a 12,000 by 16,000
matrix.  Really, what I need is a histogram of all the values in this
matrix.

Example:
Condition1 <- data.frame('dates' = rep(c('2001-02-10','1998-03-14'),6000))
Condition2 <- data.frame('dates' = rep(c('2003-07-06','2007-03-11'),8000))

First, my instinct is to try and vectorize the operation.  I tried
this by expanding each vector into a matrix of repeated vectors (I'd
then just subtract the two).  I got the following error:

> expandedCondition1 <- matrix(rep(Condition1[[1]], nrow(Condition2)), 
> byrow=TRUE, ncol=nrow(Condition1))
Error: cannot allocate vector of size 732.4 Mb
> expandedCondition2 <- matrix(rep(Condition2[[1]], nrow(Condition1)), 
> byrow=FALSE, nrow=nrow(Condition2))
Error: cannot allocate vector of size 732.4 Mb

Since these matrices seem to be too large, I'm wondering whether
there's a better way to call a hist command without actually building
that matrix.

I'd greatly appreciate any ideas!

Best,
Jonathan
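Since the example has only two distinct dates in each column, another option worth sketching is to tabulate each column first and combine the distinct dates weighted by their counts, so the work grows with the number of distinct dates rather than with all 12,000 by 16,000 pairs. This is a swapped-in technique, not one proposed in the thread; `diff_table` is a hypothetical name and the columns are assumed to be convertible with `as.Date()`.

```r
# Hypothetical helper (not from the thread): counts of x[i] - y[j], in days,
# over all pairs, computed from the distinct dates and their multiplicities.
diff_table <- function(x, y) {
  tx <- table(as.numeric(as.Date(x)))  # each distinct day in x and its count
  ty <- table(as.numeric(as.Date(y)))
  ux <- as.numeric(names(tx))
  uy <- as.numeric(names(ty))
  # These matrices are (distinct x) by (distinct y), not length(x) by length(y).
  d <- outer(ux, uy, "-")                   # difference for each distinct pair
  w <- outer(as.vector(tx), as.vector(ty))  # how many original pairs it stands for
  tapply(as.vector(w), as.vector(d), sum)   # add up weights of equal differences
}

Condition1 <- data.frame(dates = rep(c("2001-02-10", "1998-03-14"), 6000))
Condition2 <- data.frame(dates = rep(c("2003-07-06", "2007-03-11"), 8000))
counts <- diff_table(Condition1$dates, Condition2$dates)
counts  # four distinct differences, each covering 6000 * 8000 pairs
```

The result has the same form as `table()` applied to all pairwise differences, so it can be plotted the same way. With many distinct dates the `outer()` step grows again, and a single loop like the one Moshe describes becomes the safer choice.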
