Merging 2 HUGE documents

S. Dale Morrey Thu, 02 Jan 2014 09:15:31 -0800

I have a 15 GB file and a 2 GB file.  Both are sourced from somewhat
similar data, so there could be quite a lot of overlap.  Both are in the
same format: plaintext CSV, one entry per line.

I'd like to read the 2 GB file and add any entries that are present in it
but missing from the 15 GB file, essentially merging the two files.  Space
is at a premium, so I'd prefer an in-place merge onto the 15 GB file.  Even
if a temp file is made, it would still be best if the end result were a
single combined file (at most 17 GB, less any overlap).

I don't want to reinvent the wheel.  The time the operation takes is
irrelevant, but I'd strongly prefer that there be no duplicates.  Is there a
bash command that could do this?
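If the entries can be reordered, a sort-based merge avoids holding either file in memory. The sketch below (the name merge_sorted_unique is hypothetical) assumes both inputs are already sorted in plain byte order, for example with `LC_ALL=C sort`, and streams them through Python's heapq.merge, writing each distinct line once. Unlike the append-only approach it needs a separate output file, so peak disk use is roughly the two inputs plus the result.

```python
import heapq
from typing import BinaryIO, Iterator


def _lines(f: BinaryIO) -> Iterator[bytes]:
    """Yield each line of a binary file with any trailing CR/LF stripped."""
    for line in f:
        yield line.rstrip(b"\r\n")


def merge_sorted_unique(path_a: str, path_b: str, out_path: str) -> int:
    """Merge two byte-order-sorted files into out_path, dropping duplicates.

    Assumption: both inputs are already sorted byte-wise (e.g. LC_ALL=C sort).
    Returns the number of lines written.
    """
    written = 0
    previous = None
    with open(path_a, "rb") as a, open(path_b, "rb") as b, \
            open(out_path, "wb") as out:
        # heapq.merge lazily interleaves the two sorted streams, so
        # identical entries from either file arrive next to each other.
        for line in heapq.merge(_lines(a), _lines(b)):
            if line != previous:
                out.write(line + b"\n")
                previous = line
                written += 1
    return written
```

From the shell, GNU coreutils sort should do the same job directly: `LC_ALL=C sort -m -u -o merged.csv big.csv small.csv` for pre-sorted inputs, or drop `-m` to let sort do the sorting as well (it spills to temporary files, whose directory `-T` selects). The file names are placeholders; afterwards the merged file can be moved over the original.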

/*
PLUG: http://plug.org, #utah on irc.freenode.net
Unsubscribe: http://plug.org/mailman/options/plug
Don't fear the penguin.
*/
