My brain's stuck trying to figure out a way to automate removing
duplicate (and triplicate) rows from a database table that's been exported
as a .sql file.

   Here's the situation: The file has 12.5K rows of duplicate and triplicate
data. Duplicated rows share the same values in the loc_name, sample_date, and
param columns, but the quant value may differ; the table's other columns are
not relevant here.

   I thought that I had a Python solution, but it's not working. There must be
an awk solution for this, but I've not yet found it. Of course, all of you
are welcome to submit solutions regardless of tool or language because I
don't want to spend my life going through all these rows by hand. Here are
samples:

\N      CIL     2007-08-29      Manganese       0.00499999989   mg/L    \N      \N      \N      \N
\N      CIL     2007-08-29      Manganese       0.00499999989   mg/L    \N      \N      \N      \N
\N      CIL     2007-08-29      Mercury 0.846000016     mg/L    \N      \N      \N      \N
\N      CIL     2007-08-29      Mercury 0.846000016     mg/L    \N      \N      \N      \N
\N      CIL     2007-08-29      Molybdenum      1.88999999      mg/L    \N      \N      \N      \N
\N      CIL     2007-08-29      Molybdenum      1.88999999      mg/L    \N      \N      \N      \N
\N      CIL     2007-08-29      Nickel  0.140000001     mg/L    \N      \N      \N      \N
\N      CIL     2007-08-29      Nickel  0.140000001     mg/L    \N      \N      \N      \N

   Where values differ, we'll use the first one since there's no way to
determine which one is actually correct (if either). After all, there should
be only one value for a measured parameter at a specific location on a given
date.
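
   For illustration, here is a minimal Python sketch of the "keep the first
row per key" rule described above. It is not a tested solution for the real
file and rests on several assumptions: each record occupies a single
tab-separated line in the export, loc_name, sample_date, and param are the
second, third, and fourth fields (as the samples suggest), and any line with
fewer fields is not data and should pass through untouched. The names
dedupe_lines, dedupe_file, and KEY_COLUMNS are made up for this example.

```python
# Assumed 0-based positions of loc_name, sample_date, and param,
# inferred from the sample rows; adjust to match the real table.
KEY_COLUMNS = (1, 2, 3)


def dedupe_lines(lines, key_columns=KEY_COLUMNS, sep="\t"):
    """Yield lines in order, skipping data rows whose key was already seen."""
    seen = set()
    min_fields = max(key_columns) + 1
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) < min_fields:
            # Too few fields to be a data row (SQL statement, blank
            # line, end-of-data marker): keep it unchanged.
            yield line
            continue
        key = tuple(fields[i] for i in key_columns)
        if key in seen:
            # Later duplicate or triplicate: drop it, even if quant differs.
            continue
        seen.add(key)
        yield line


def dedupe_file(src_path, dst_path):
    """Copy src_path to dst_path, keeping only the first row for each key."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(dedupe_lines(src))
```

   Calling dedupe_file("export.sql", "export_deduped.sql") (both file names
are placeholders) would write a copy with only the first occurrence of each
loc_name/sample_date/param combination. Because the key ignores quant, rows
whose values differ collapse to the first one seen, and the original row
order is preserved. If the export separates fields with something other than
tabs, the sep argument would need to change.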

TIA,

Rich




_______________________________________________
PLUG mailing list
PLUG@lists.pdxlinux.org
http://lists.pdxlinux.org/mailman/listinfo/plug
