On Tue, 16 Aug 2011, Hal Pomeranz wrote:

>       sort -u -k1,4 inputfile >inputfile.de-duped

   Wow! I have learned so much today about built-in tools that solve major
headaches in data cleaning.

   Running the results of uniq through sort (as above, but specifying -k2,4)
reduced the file from 8605 rows to 5540, and that is down from 12,500+
originally.
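
   For anyone finding this in the archives, here is a rough sketch of the
two passes described above. The sample rows and the file names step1 and
step2 are made up for illustration (the real data layout is not shown in
this thread), and it assumes whitespace-separated fields and that the uniq
pass was fed sorted input.

```shell
# Hypothetical sample data: an ID followed by three fields that may repeat.
tmpdir=$(mktemp -d)
cat >"$tmpdir/inputfile" <<'EOF'
101 alder creek 2011-08-01
102 alder creek 2011-08-01
103 birch pond 2011-08-02
103 birch pond 2011-08-02
EOF

# Pass 1: drop lines that are identical in full.
# uniq only collapses adjacent duplicates, so sort first.
sort "$tmpdir/inputfile" | uniq >"$tmpdir/step1"

# Pass 2: keep one line for each distinct value of fields 2 through 4,
# ignoring field 1 when deciding what counts as a duplicate.
sort -u -k2,4 "$tmpdir/step1" >"$tmpdir/step2"

# Row counts after each pass.
wc -l <"$tmpdir/step1"
wc -l <"$tmpdir/step2"
```

   Which of the rows sharing a key survives the second pass is up to sort,
so it is worth checking that the dropped rows really are redundant before
discarding the intermediate file.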

   My grateful thanks to all of you. Not only have I learned valuable uses of
common tools but you've saved me days of work.

Rich
_______________________________________________
PLUG mailing list
PLUG@lists.pdxlinux.org
http://lists.pdxlinux.org/mailman/listinfo/plug
