My brain's stuck trying to figure out a way to automate removing duplicate (and triplicate) rows from a database table that's been exported as a .sql file.
Here's the situation: the file has 12.5K rows of duplicate and triplicate data. These rows have the same loc_name, sample_date, and param columns, but the quant column may differ, and the other columns from the table are not applicable. I thought I had a Python solution, but it's not working. There must be an awk solution for this, but I've not yet found it. Of course, all of you are welcome to submit solutions regardless of tool or language, because I don't want to spend my life going through all these rows by hand.

Here are samples:

\N  CIL  2007-08-29  Manganese   0.00499999989  mg/L  \N  \N  \N  \N
\N  CIL  2007-08-29  Manganese   0.00499999989  mg/L  \N  \N  \N  \N
\N  CIL  2007-08-29  Mercury     0.846000016    mg/L  \N  \N  \N  \N
\N  CIL  2007-08-29  Mercury     0.846000016    mg/L  \N  \N  \N  \N
\N  CIL  2007-08-29  Molybdenum  1.88999999     mg/L  \N  \N  \N  \N
\N  CIL  2007-08-29  Molybdenum  1.88999999     mg/L  \N  \N  \N  \N
\N  CIL  2007-08-29  Nickel      0.140000001    mg/L  \N  \N  \N  \N
\N  CIL  2007-08-29  Nickel      0.140000001    mg/L  \N  \N  \N  \N

Where values differ, we'll use the first one, since there's no way to determine which one is actually correct (if either). After all, there should be only one value for a measured parameter at a specific location on a given date.

TIA,

Rich

_______________________________________________
PLUG mailing list
PLUG@lists.pdxlinux.org
http://lists.pdxlinux.org/mailman/listinfo/plug
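[Editor's note: one awk approach that fits the "keep the first row per key" requirement, sketched under two assumptions not confirmed by the post: the export is tab-delimited, and loc_name, sample_date, and param land in fields 2, 3, and 4 (field 1 being the leading \N). The file name sample.tsv is a stand-in for the real export.]

```shell
# Build a small tab-delimited sample standing in for the real 12.5K-row file.
cat > sample.tsv <<'EOF'
\N	CIL	2007-08-29	Mercury	0.846000016	mg/L	\N	\N	\N	\N
\N	CIL	2007-08-29	Mercury	0.847000000	mg/L	\N	\N	\N	\N
\N	CIL	2007-08-29	Nickel	0.140000001	mg/L	\N	\N	\N	\N
EOF

# seen[] counts each (loc_name, sample_date, param) key; the pattern is true
# only the first time a key appears, so awk prints the first row per key and
# the first quant value wins, as the post requests.
awk -F'\t' '!seen[$2 FS $3 FS $4]++' sample.tsv > deduped.tsv
```

Run against the real file, this would be `awk -F'\t' '!seen[$2 FS $3 FS $4]++' export.sql > deduped.sql` — assuming the dump really is one tab-separated row per line rather than INSERT statements, which is worth checking first.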