Re: console input

Shawn Green Fri, 15 Jun 2012 05:49:02 -0700

On 6/14/2012 5:57 PM, Gary Aitken wrote:

Hi all,


I've looked high and low for what I hope is a trivial answer.

I was trying to load a table using LOAD DATA INFILE.  Unfortunately, it craps 
out because there are some duplicate primary keys.  Not surprising as the 
source did not enforce uniqueness.  My problem is the load data simply dies 
without indicating which line of the input file was in error; the error message 
refers to line 3, which is not even the SQL statement for the LOAD DATA INTO 
statement:

I can get the table loaded by specifying REPLACE INTO TABLE, but that still 
leaves me with not knowing where the duplicate records are.

So...  I wanted to read the data line at a time and use a plain INSERT 
statement.  That way I could check for duplicate keys and discover where the 
duplicate records are.  However, I can't find a way to read input from the 
console or a file.  What am I missing?  I know I could write a java or C++ 
program to do this, but it seems like overkill for what should be a trivial 
task.

Thanks for any pointers,

Gary

The trivial thing I do to solve this problem is to create a copy of thedestination table without any PRIMARY KEY or UNIQUE constraints on it.This gives you an empty space to which you can bulk import your rawdata. I am not sure if there is an official term for this but I call ita 'landing table'. This is the first step of the import process.

Once you can get your data off of disk and onto the landing table (it'swhere the raw import lands inside the database) you can check it forduplicates very easily.


1) create a normal index for the PRIMARY KEY column
2) create another table that has a list of duplicateslike this

CREATE TABLE dup_list ENGINE=MYISAM SELECT pkcol, count(*) freq FROMlanding GROUP BY pkcol HAVING freq >1;


notes:

* use a MyISAM table for this preparation work even if the destinationtable is using the InnoDB storage engine, you really do not need toprotect this data with a transaction yet.

* MyISAM indexes are also very fast for count(*) queries.

You have clearly identified all duplicate rows in the incoming data. Youcan also compare those rows with your existing rows to see if any ofthem duplicate each other (hint: INNER JOIN is your friend) or if anyexist in one table but not the other (hint: LEFT JOIN).

From here you should be able to cleanse the incoming data (removeduplicates, adjust any weird fields) and merge it with your existingdata to maintain the relational and logical integrity of your tables.


Regards,
--
Shawn Green
MySQL Principal Technical Support Engineer
Oracle USA, Inc. - Hardware and Software, Engineered to Work Together.
Office: Blountville, TN



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql

Re: console input

Reply via email to