Re: [R] Intersecting two matrices

2013-07-31 Thread Jeff Newmiller
I would appreciate it if you would follow the Posting Guide and give a 
reproducible example and post all messages using plain text.

Try

m1 <- matrix(sample(0:999,2*1057837,TRUE),ncol=2)
m2 <- matrix(sample(0:999,2*951980,TRUE),ncol=2)
df1 <- as.data.frame(m1)
df2 <- as.data.frame(m2)
library(sqldf)
system.time(df3 <- sqldf("SELECT DISTINCT df1.V1, df1.V2 FROM df1 INNER JOIN 
df2 ON df1.V1=df2.V1 AND df1.V2=df2.V2") )

The speed seems heavily dependent on how many rows are duplicated within the 
input data frames... so if the range of values is small then the query runs 
slower. Note also that moving the data from R to the database and back takes 
time... you may be able to import the data directly from your source data to 
the database and save some time. Read ?sqldf and ?read.csv.sql examples for 
more info.
---
Jeff NewmillerThe .   .  Go Live...
DCN:Basics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

c char  wrote:
>I am not familiar with R's sort and sql libs. appreciate if you can
>post a
>code snippet when you got time. Thanks a lot!
>
>
>On Tue, Jul 30, 2013 at 10:36 AM, Jeff Newmiller
>wrote:
>
>> In that case, you should be looking at a relational inner join,
>perhaps
>> with SQLite (see package sqldf).
>>
>---
>> Jeff NewmillerThe .   .  Go
>Live...
>> DCN:Basics: ##.#.   ##.#.  Live
>> Go...
>>   Live:   OO#.. Dead: OO#.. 
>Playing
>> Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
>> /Software/Embedded Controllers)   .OO#.   .OO#. 
>rocks...1k
>>
>---
>> Sent from my phone. Please excuse my brevity.
>>
>> c char  wrote:
>> >Thanks a lot.
>> >Still looking for some super fast and memory efficient solution, as
>the
>> >matrix I have in real world has billions of rows.
>> >
>> >
>> >On Mon, Jul 29, 2013 at 6:24 PM, William Dunlap 
>> >wrote:
>> >
>> >> I haven't looked at the size-time relationship, but im2 (below) is
>> >faster
>> >> than your
>> >> function on at least one example:
>> >>
>> >> intersectMat <- function(mat1, mat2)
>> >> {
>> >> #mat1 and mat2 are both deduplicated
>> >> nr1 <- nrow(mat1)
>> >> nr2 <- nrow(mat2)
>> >> mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ,
>> >> drop=FALSE]
>> >> }
>> >>
>> >> im2 <- function(mat1, mat2)
>> >> {
>> >> stopifnot(ncol(mat1)==2, ncol(mat1)==ncol(mat2))
>> >> toChar <- function(twoColMat) paste(sep="\1", twoColMat[,1],
>> >> twoColMat[,2])
>> >> mat1[match(toChar(mat2), toChar(mat1), nomatch=0), ,
>drop=FALSE]
>> >> }
>> >>
>> >> > m1 <- cbind(1:1e7, rep(1:10, len=1e7))
>> >> > m2 <- cbind(1:1e7, rep(1:20, len=1e7))
>> >> > system.time(r1 <- intersectMat(m1,m2))
>> >>user  system elapsed
>> >>  430.371.96  433.98
>> >> > system.time(r2 <- im2(m1,m2))
>> >>user  system elapsed
>> >>   27.890.20   28.13
>> >> > identical(r1, r2)
>> >> [1] TRUE
>> >> > dim(r1)
>> >> [1] 500   2
>> >>
>> >> Bill Dunlap
>> >> Spotfire, TIBCO Software
>> >> wdunlap tibco.com
>> >>
>> >>
>> >> > -Original Message-
>> >> > From: r-help-boun...@r-project.org
>> >[mailto:r-help-boun...@r-project.org]
>> >> On Behalf
>> >> > Of c char
>> >> > Sent: Monday, July 29, 2013 4:04 PM
>> >> > To: r-help@r-project.org
>> >> > Subject: [R] Intersecting two matrices
>> >> >
>> >> > Dear all,
>> >> >
>> >> > I am interested to know a faster matrix intersection package for
&

Re: [R] Intersecting two matrices

2013-07-30 Thread c char
I am not familiar with R's sort and sql libs. appreciate if you can post a
code snippet when you got time. Thanks a lot!


On Tue, Jul 30, 2013 at 10:36 AM, Jeff Newmiller
wrote:

> In that case, you should be looking at a relational inner join, perhaps
> with SQLite (see package sqldf).
> ---
> Jeff NewmillerThe .   .  Go Live...
> DCN:Basics: ##.#.   ##.#.  Live
> Go...
>   Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
> /Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
> ---
> Sent from my phone. Please excuse my brevity.
>
> c char  wrote:
> >Thanks a lot.
> >Still looking for some super fast and memory efficient solution, as the
> >matrix I have in real world has billions of rows.
> >
> >
> >On Mon, Jul 29, 2013 at 6:24 PM, William Dunlap 
> >wrote:
> >
> >> I haven't looked at the size-time relationship, but im2 (below) is
> >faster
> >> than your
> >> function on at least one example:
> >>
> >> intersectMat <- function(mat1, mat2)
> >> {
> >> #mat1 and mat2 are both deduplicated
> >> nr1 <- nrow(mat1)
> >> nr2 <- nrow(mat2)
> >> mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ,
> >> drop=FALSE]
> >> }
> >>
> >> im2 <- function(mat1, mat2)
> >> {
> >> stopifnot(ncol(mat1)==2, ncol(mat1)==ncol(mat2))
> >> toChar <- function(twoColMat) paste(sep="\1", twoColMat[,1],
> >> twoColMat[,2])
> >> mat1[match(toChar(mat2), toChar(mat1), nomatch=0), , drop=FALSE]
> >> }
> >>
> >> > m1 <- cbind(1:1e7, rep(1:10, len=1e7))
> >> > m2 <- cbind(1:1e7, rep(1:20, len=1e7))
> >> > system.time(r1 <- intersectMat(m1,m2))
> >>user  system elapsed
> >>  430.371.96  433.98
> >> > system.time(r2 <- im2(m1,m2))
> >>user  system elapsed
> >>   27.890.20   28.13
> >> > identical(r1, r2)
> >> [1] TRUE
> >> > dim(r1)
> >> [1] 500   2
> >>
> >> Bill Dunlap
> >> Spotfire, TIBCO Software
> >> wdunlap tibco.com
> >>
> >>
> >> > -Original Message-
> >> > From: r-help-boun...@r-project.org
> >[mailto:r-help-boun...@r-project.org]
> >> On Behalf
> >> > Of c char
> >> > Sent: Monday, July 29, 2013 4:04 PM
> >> > To: r-help@r-project.org
> >> > Subject: [R] Intersecting two matrices
> >> >
> >> > Dear all,
> >> >
> >> > I am interested to know a faster matrix intersection package for R
> >> handles
> >> > intersection of two integer matrices with ncol=2. Currently I am
> >using my
> >> > homemade code adapted from a previous thread:
> >> >
> >> >
> >> > intersectMat <- function(mat1, mat2){#mat1 and mat2 are both
> >> > deduplicated  nr1 <- nrow(mat1)  nr2 <- nrow(mat2)
> >> > mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ]}
> >> >
> >> >
> >> > which handles:
> >> > size A= 10578373
> >> > size B= 9519807
> >> > expected intersecting time= 251.2272
> >> > intersecting for corssing MPRs took 409.602 seconds.
> >> >
> >> > scale a little bit worse than linearly but atomic operation is not
> >good.
> >> > Wonder if a super fast C/C++ extension exists for this task. Your
> >ideas
> >> are
> >> > appreciated.
> >> >
> >> > Thanks!
> >> >
> >> >   [[alternative HTML version deleted]]
> >> >
> >> > __
> >> > R-help@r-project.org mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> > PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> > and provide commented, minimal, self-contained, reproducible code.
> >>
> >
> >   [[alternative HTML version deleted]]
> >
> >__
> >R-help@r-project.org mailing list
> >https://stat.ethz.ch/mailman/listinfo/r-help
> >PLEASE do read the posting guide
> >http://www.R-project.org/posting-guide.html
> >and provide commented, minimal, self-contained, reproducible code.
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Intersecting two matrices

2013-07-30 Thread Jeff Newmiller
In that case, you should be looking at a relational inner join, perhaps with 
SQLite (see package sqldf).
---
Jeff NewmillerThe .   .  Go Live...
DCN:Basics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

c char  wrote:
>Thanks a lot.
>Still looking for some super fast and memory efficient solution, as the
>matrix I have in real world has billions of rows.
>
>
>On Mon, Jul 29, 2013 at 6:24 PM, William Dunlap 
>wrote:
>
>> I haven't looked at the size-time relationship, but im2 (below) is
>faster
>> than your
>> function on at least one example:
>>
>> intersectMat <- function(mat1, mat2)
>> {
>> #mat1 and mat2 are both deduplicated
>> nr1 <- nrow(mat1)
>> nr2 <- nrow(mat2)
>> mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ,
>> drop=FALSE]
>> }
>>
>> im2 <- function(mat1, mat2)
>> {
>> stopifnot(ncol(mat1)==2, ncol(mat1)==ncol(mat2))
>> toChar <- function(twoColMat) paste(sep="\1", twoColMat[,1],
>> twoColMat[,2])
>> mat1[match(toChar(mat2), toChar(mat1), nomatch=0), , drop=FALSE]
>> }
>>
>> > m1 <- cbind(1:1e7, rep(1:10, len=1e7))
>> > m2 <- cbind(1:1e7, rep(1:20, len=1e7))
>> > system.time(r1 <- intersectMat(m1,m2))
>>user  system elapsed
>>  430.371.96  433.98
>> > system.time(r2 <- im2(m1,m2))
>>user  system elapsed
>>   27.890.20   28.13
>> > identical(r1, r2)
>> [1] TRUE
>> > dim(r1)
>> [1] 500   2
>>
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
>>
>>
>> > -Original Message-
>> > From: r-help-boun...@r-project.org
>[mailto:r-help-boun...@r-project.org]
>> On Behalf
>> > Of c char
>> > Sent: Monday, July 29, 2013 4:04 PM
>> > To: r-help@r-project.org
>> > Subject: [R] Intersecting two matrices
>> >
>> > Dear all,
>> >
>> > I am interested to know a faster matrix intersection package for R
>> handles
>> > intersection of two integer matrices with ncol=2. Currently I am
>using my
>> > homemade code adapted from a previous thread:
>> >
>> >
>> > intersectMat <- function(mat1, mat2){#mat1 and mat2 are both
>> > deduplicated  nr1 <- nrow(mat1)  nr2 <- nrow(mat2)
>> > mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ]}
>> >
>> >
>> > which handles:
>> > size A= 10578373
>> > size B= 9519807
>> > expected intersecting time= 251.2272
>> > intersecting for corssing MPRs took 409.602 seconds.
>> >
>> > scale a little bit worse than linearly but atomic operation is not
>good.
>> > Wonder if a super fast C/C++ extension exists for this task. Your
>ideas
>> are
>> > appreciated.
>> >
>> > Thanks!
>> >
>> >   [[alternative HTML version deleted]]
>> >
>> > __
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>
>   [[alternative HTML version deleted]]
>
>__
>R-help@r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Intersecting two matrices

2013-07-30 Thread c char
Thanks a lot.
Still looking for some super fast and memory efficient solution, as the
matrix I have in real world has billions of rows.


On Mon, Jul 29, 2013 at 6:24 PM, William Dunlap  wrote:

> I haven't looked at the size-time relationship, but im2 (below) is faster
> than your
> function on at least one example:
>
> intersectMat <- function(mat1, mat2)
> {
> #mat1 and mat2 are both deduplicated
> nr1 <- nrow(mat1)
> nr2 <- nrow(mat2)
> mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ,
> drop=FALSE]
> }
>
> im2 <- function(mat1, mat2)
> {
> stopifnot(ncol(mat1)==2, ncol(mat1)==ncol(mat2))
> toChar <- function(twoColMat) paste(sep="\1", twoColMat[,1],
> twoColMat[,2])
> mat1[match(toChar(mat2), toChar(mat1), nomatch=0), , drop=FALSE]
> }
>
> > m1 <- cbind(1:1e7, rep(1:10, len=1e7))
> > m2 <- cbind(1:1e7, rep(1:20, len=1e7))
> > system.time(r1 <- intersectMat(m1,m2))
>user  system elapsed
>  430.371.96  433.98
> > system.time(r2 <- im2(m1,m2))
>user  system elapsed
>   27.890.20   28.13
> > identical(r1, r2)
> [1] TRUE
> > dim(r1)
> [1] 500   2
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
> > -Original Message-
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf
> > Of c char
> > Sent: Monday, July 29, 2013 4:04 PM
> > To: r-help@r-project.org
> > Subject: [R] Intersecting two matrices
> >
> > Dear all,
> >
> > I am interested to know a faster matrix intersection package for R
> handles
> > intersection of two integer matrices with ncol=2. Currently I am using my
> > homemade code adapted from a previous thread:
> >
> >
> > intersectMat <- function(mat1, mat2){#mat1 and mat2 are both
> > deduplicated  nr1 <- nrow(mat1)  nr2 <- nrow(mat2)
> > mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ]}
> >
> >
> > which handles:
> > size A= 10578373
> > size B= 9519807
> > expected intersecting time= 251.2272
> > intersecting for corssing MPRs took 409.602 seconds.
> >
> > scale a little bit worse than linearly but atomic operation is not good.
> > Wonder if a super fast C/C++ extension exists for this task. Your ideas
> are
> > appreciated.
> >
> > Thanks!
> >
> >   [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Intersecting two matrices

2013-07-29 Thread William Dunlap
I haven't looked at the size-time relationship, but im2 (below) is faster than 
your
function on at least one example:

intersectMat <- function(mat1, mat2)
{
#mat1 and mat2 are both deduplicated
nr1 <- nrow(mat1)
nr2 <- nrow(mat2)
mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], , drop=FALSE]
}

im2 <- function(mat1, mat2)
{
stopifnot(ncol(mat1)==2, ncol(mat1)==ncol(mat2))
toChar <- function(twoColMat) paste(sep="\1", twoColMat[,1], twoColMat[,2])
mat1[match(toChar(mat2), toChar(mat1), nomatch=0), , drop=FALSE]
}

> m1 <- cbind(1:1e7, rep(1:10, len=1e7))
> m2 <- cbind(1:1e7, rep(1:20, len=1e7))
> system.time(r1 <- intersectMat(m1,m2))
   user  system elapsed 
 430.371.96  433.98 
> system.time(r2 <- im2(m1,m2))
   user  system elapsed 
  27.890.20   28.13 
> identical(r1, r2)
[1] TRUE
> dim(r1)
[1] 500   2

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf
> Of c char
> Sent: Monday, July 29, 2013 4:04 PM
> To: r-help@r-project.org
> Subject: [R] Intersecting two matrices
> 
> Dear all,
> 
> I am interested to know a faster matrix intersection package for R handles
> intersection of two integer matrices with ncol=2. Currently I am using my
> homemade code adapted from a previous thread:
> 
> 
> intersectMat <- function(mat1, mat2){#mat1 and mat2 are both
> deduplicated  nr1 <- nrow(mat1)  nr2 <- nrow(mat2)
> mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ]}
> 
> 
> which handles:
> size A= 10578373
> size B= 9519807
> expected intersecting time= 251.2272
> intersecting for corssing MPRs took 409.602 seconds.
> 
> scale a little bit worse than linearly but atomic operation is not good.
> Wonder if a super fast C/C++ extension exists for this task. Your ideas are
> appreciated.
> 
> Thanks!
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Intersecting two matrices

2013-07-29 Thread c char
Dear all,

I am interested to know a faster matrix intersection package for R handles
intersection of two integer matrices with ncol=2. Currently I am using my
homemade code adapted from a previous thread:


intersectMat <- function(mat1, mat2){#mat1 and mat2 are both
deduplicated  nr1 <- nrow(mat1)  nr2 <- nrow(mat2)
mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ]}


which handles:
size A= 10578373
size B= 9519807
expected intersecting time= 251.2272
intersecting for corssing MPRs took 409.602 seconds.

scale a little bit worse than linearly but atomic operation is not good.
Wonder if a super fast C/C++ extension exists for this task. Your ideas are
appreciated.

Thanks!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.