Re: [R] Intersecting two matrices
I would appreciate it if you would follow the Posting Guide, give a reproducible example, and post all messages in plain text.

Try

    m1 <- matrix(sample(0:999, 2*1057837, TRUE), ncol=2)
    m2 <- matrix(sample(0:999, 2*951980, TRUE), ncol=2)
    df1 <- as.data.frame(m1)
    df2 <- as.data.frame(m2)
    library(sqldf)
    system.time(df3 <- sqldf("SELECT DISTINCT df1.V1, df1.V2 FROM df1 INNER JOIN df2 ON df1.V1=df2.V1 AND df1.V2=df2.V2"))

The speed seems heavily dependent on how many rows are duplicated within the input data frames, so if the range of values is small the query runs slower. Note also that moving the data from R to the database and back takes time. You may be able to import the data directly from your source into the database and save some time. Read the examples in ?sqldf and ?read.csv.sql for more info.

Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
Sent from my phone. Please excuse my brevity.

c char wrote:
> I am not familiar with R's sort and sql libs. appreciate if you can post a
> code snippet when you got time. Thanks a lot!
Re: [R] Intersecting two matrices
I am not familiar with R's sort and SQL libraries. I would appreciate it if you could post a code snippet when you have time. Thanks a lot!

On Tue, Jul 30, 2013 at 10:36 AM, Jeff Newmiller wrote:
> In that case, you should be looking at a relational inner join, perhaps
> with SQLite (see package sqldf).

R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Intersecting two matrices
In that case, you should be looking at a relational inner join, perhaps with SQLite (see package sqldf).

Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
Sent from my phone. Please excuse my brevity.

c char wrote:
> Thanks a lot.
> Still looking for some super fast and memory efficient solution, as the
> matrix I have in real world has billions of rows.
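For readers unfamiliar with the term, a relational inner join on both columns, followed by de-duplication, can be sketched in base R with merge(). This is only an illustration on made-up data, not code from the thread. The column names V1 and V2 are assumed to match what as.data.frame() gives an unnamed matrix, and no claim is made here about its speed relative to sqldf.

```r
# Small made-up data frames (hypothetical values, for illustration only).
df1 <- data.frame(V1 = c(1L, 2L, 2L, 3L), V2 = c(10L, 20L, 20L, 30L))
df2 <- data.frame(V1 = c(2L, 3L, 4L),     V2 = c(20L, 30L, 40L))

# merge() defaults to an inner join: only (V1, V2) pairs present in
# both inputs survive. Repeated rows in an input are repeated in the output.
joined <- merge(df1, df2, by = c("V1", "V2"))

# unique() plays the role of SELECT DISTINCT.
result <- unique(joined)
print(result)
```

Because df1 contains the row (2, 20) twice, the join output repeats it before unique() collapses it, which is one way to see why many repeated rows in the inputs mean more work for the join.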
Re: [R] Intersecting two matrices
Thanks a lot.
I am still looking for a super fast and memory-efficient solution, as the matrix I have in the real world has billions of rows.

On Mon, Jul 29, 2013 at 6:24 PM, William Dunlap wrote:
> I haven't looked at the size-time relationship, but im2 (below) is faster
> than your function on at least one example:
> [...]
Re: [R] Intersecting two matrices
I haven't looked at the size-time relationship, but im2 (below) is faster than your function on at least one example:

    intersectMat <- function(mat1, mat2)
    {
        # mat1 and mat2 are both deduplicated
        nr1 <- nrow(mat1)
        nr2 <- nrow(mat2)
        mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], , drop=FALSE]
    }

    im2 <- function(mat1, mat2)
    {
        stopifnot(ncol(mat1)==2, ncol(mat1)==ncol(mat2))
        # collapse each row to one string key, using "\1" as a separator
        toChar <- function(twoColMat) paste(sep="\1", twoColMat[,1], twoColMat[,2])
        mat1[match(toChar(mat2), toChar(mat1), nomatch=0), , drop=FALSE]
    }

    > m1 <- cbind(1:1e7, rep(1:10, len=1e7))
    > m2 <- cbind(1:1e7, rep(1:20, len=1e7))
    > system.time(r1 <- intersectMat(m1,m2))
       user  system elapsed
     430.37    1.96  433.98
    > system.time(r2 <- im2(m1,m2))
       user  system elapsed
      27.89    0.20   28.13
    > identical(r1, r2)
    [1] TRUE
    > dim(r1)
    [1] 5000000       2

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of c char
> Sent: Monday, July 29, 2013 4:04 PM
> To: r-help@r-project.org
> Subject: [R] Intersecting two matrices
>
> [...]
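im2 works by collapsing each row into a single key and then matching keys. One possible variation, not proposed in the thread, is to build a numeric key instead of a character one. The sketch below assumes non-negative integer entries small enough that the combined key stays below 2^53, so it is exact in a double. The names intersectByKey and radix are hypothetical, and it returns the matching rows of mat2 via %in% rather than indexing mat1 with match(). It has not been benchmarked against im2.

```r
# Hypothetical helper (not from the thread): intersect the rows of two
# two-column integer matrices using a numeric key instead of paste().
intersectByKey <- function(mat1, mat2) {
  stopifnot(ncol(mat1) == 2, ncol(mat2) == 2)
  stopifnot(min(mat1, mat2) >= 0)  # the encoding assumes non-negative values

  # Radix strictly larger than any second-column value, so that
  # first * radix + second is unique for each (first, second) pair.
  radix <- max(mat1[, 2], mat2[, 2]) + 1

  # Doubles avoid integer overflow; keys are exact only below 2^53.
  stopifnot(as.numeric(max(mat1[, 1], mat2[, 1])) * radix + radix < 2^53)

  key1 <- as.numeric(mat1[, 1]) * radix + mat1[, 2]
  key2 <- as.numeric(mat2[, 1]) * radix + mat2[, 2]

  # Keep the rows of mat2 whose key also occurs in mat1.
  mat2[key2 %in% key1, , drop = FALSE]
}

# Small version of the benchmark inputs used above.
m1 <- cbind(1:6, rep(1:2, length.out = 6))
m2 <- cbind(1:6, rep(1:3, length.out = 6))
res <- intersectByKey(m1, m2)
print(res)
```

On these small inputs only the rows (1, 1) and (2, 2) appear in both matrices. Whether a numeric key actually beats the character key at the sizes discussed here would need to be measured.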
[R] Intersecting two matrices
Dear all,

I am interested to know whether there is a faster matrix intersection package for R that handles the intersection of two integer matrices with ncol=2. Currently I am using my homemade code, adapted from a previous thread:

    intersectMat <- function(mat1, mat2){
        # mat1 and mat2 are both deduplicated
        nr1 <- nrow(mat1)
        nr2 <- nrow(mat2)
        mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ]
    }

which handles:

    size A= 10578373
    size B= 9519807
    expected intersecting time= 251.2272
    intersecting for corssing MPRs took 409.602 seconds.

It scales a little worse than linearly, but the atomic operation is not good. I wonder if a super fast C/C++ extension exists for this task. Your ideas are appreciated.

Thanks!
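The way intersectMat finds common rows can be seen step by step on a tiny example. The matrices below are made up for illustration, and the variable names (matA, matB, stacked, flags, inBoth) are not from the original post. As in the function's comment, each input is assumed to contain no repeated rows of its own.

```r
# Tiny, made-up inputs; neither matrix repeats a row internally.
matA <- rbind(c(1L, 2L), c(3L, 4L), c(5L, 6L))
matB <- rbind(c(3L, 4L), c(7L, 8L), c(1L, 2L))

# Stack matA on top of matB.
stacked <- rbind(matA, matB)

# duplicated() flags a row as TRUE when an identical row appeared earlier.
# Since neither input repeats a row, a TRUE in the matB part can only
# mean that the row also occurs in matA.
flags <- duplicated(stacked)
inBoth <- flags[(nrow(matA) + 1):nrow(stacked)]

# drop = FALSE keeps the result a matrix even if only one row matches.
common <- matB[inBoth, , drop = FALSE]
print(common)
```

Without drop = FALSE, a single matching row would come back as a plain vector rather than a one-row matrix, which is worth keeping in mind if the result is passed on to code that expects a matrix.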