[R] merging pre-sorted data frames

Mike Miller Tue, 13 Jan 2015 16:57:07 -0800

I have many pairs of data frames. Each data frame has about 15 million records, and the two members of a pair have about 10 million records in common. They are sorted by two of their fields and will be merged by those same fields.

The fact that the data are sorted could be used to greatly speed up a merge, but I have the impression that merge() cannot "know" in advance that the fields are already sorted.

I'm sure that I can use merge(), but I suspect that it is doing a lot of unnecessary work and that it will take much more time than the job really should require. Is that correct? Can anything be done about it?


The inspiration for my question comes partly from the way GNU comm works.
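
For anyone who has not used comm, here is a small sketch of the behavior I have in mind. The file names and their contents are made up for illustration and are not my real data. As far as I understand it, comm relies on both inputs already being sorted, so it can walk through the two files side by side in one pass instead of sorting or searching:

```shell
# Two tiny two-field key files, already sorted (hypothetical example data).
tmp=$(mktemp -d)
printf '%s\n' 'a 1' 'a 3' 'b 2' 'c 1' > "$tmp/a.txt"
printf '%s\n' 'a 3' 'b 1' 'b 2' 'd 4' > "$tmp/b.txt"

# comm reads both files in parallel and does no sorting of its own.
# -12 suppresses columns 1 and 2 (lines unique to either file),
# leaving only the lines present in both files.
# LC_ALL=C makes the expected sort order plain byte order.
common=$(LC_ALL=C comm -12 "$tmp/a.txt" "$tmp/b.txt")
printf '%s\n' "$common"

rm -r "$tmp"
```

What I am hoping for is something in R that treats the two sorted key fields the same way comm treats sorted lines.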

If you have any ideas about this, I'd love to hear them.

Thanks in advance.

Mike

--
Michael B. Miller, Ph.D.
University of Minnesota
http://scholar.google.com/citations?user=EV_phq4AAAAJ

