I am importing records from a text file into my DB.  Each record in the text
file corresponds to six tables in the DB due to normalization.

So, for example, an inventory record containing a part, vendor,
contactPerson, mpn, etc. on ONE line of the text file requires 6 inserts
into the DB.

Furthermore, before each insert I need to check for redundancy.  So, if an
inventory line in the text file has vendor 'Scott', BEFORE I add 'Scott' to
the vendor table, I check the vendor table for whether 'Scott' already
exists.  If 'Scott' does exist, I just pull the ID and use that in the
inventory insert; if 'Scott' DOESN'T exist yet, I insert 'Scott' into the
Vendor table, get the last_insert_id() and use that in the inventory table.
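
In rough PHP, that per-vendor step looks something like the sketch below
(illustrative only: the Vendor table, its vendorID/vendorName columns and
the getVendorId() helper name are placeholders, not my real schema):

// Minimal sketch of the lookup-or-insert step.  Table, column and helper
// names are illustrative only.
function getVendorId(mysqli $conn, $vendorName)
{
    // Does this vendor already exist?
    $stmt = $conn->prepare('SELECT vendorID FROM Vendor WHERE vendorName = ?');
    $stmt->bind_param('s', $vendorName);
    $stmt->execute();
    $stmt->bind_result($vendorId);
    if ($stmt->fetch()) {
        $stmt->close();
        return $vendorId;          // reuse the existing ID
    }
    $stmt->close();

    // Not found yet: insert it and take the generated key.
    $stmt = $conn->prepare('INSERT INTO Vendor (vendorName) VALUES (?)');
    $stmt->bind_param('s', $vendorName);
    $stmt->execute();
    $stmt->close();
    return $conn->insert_id;       // same value as last_insert_id()
}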

Each LINE in the text file can result in more than 20 SELECT/INSERT
statements against the underlying DB before I can insert the record with
all the properly allocated foreign keys.

Considering many of these text files have thousands of lines of inventory,
as you can imagine, I have a massive performance problem.  Each complete
line of the text file requires about 1 full second to validate and insert
into the underlying schema.

I'm using InnoDB tables, so a lot of the insert optimization techniques I
found for MySQL don't seem to apply very well.  I'm hoping for some
experienced feedback on alternative techniques for performing this sort of
import.

Some further info:

In PHP 5, I get my connection object, and then in a loop I'm calling
$connection->query() for each line of the text file.  I'm assuming this
always uses the same connection and that each query doesn't require a
re-connect to the DB...if it does, that's a major bottleneck that could be
avoided (with persistent connections?)

i.e.
$conn = mysqli_init(); // empty connection object
$conn->real_connect('host', 'user', 'pass', 'db') or die('connection failed');
foreach ($file as $line)
{
  $sql = buildQuery($line);       // build the statement for this line
  $result = $conn->query($sql);
}

I hope each time I call $conn->query($sql) it's using the SAME connection
resource and not having to reconnect to the DB.

Secondly, $sql involves a call to a stored procedure, which in turn ends up
calling other stored procedures to facilitate all the transactions,
validations and inserts...I assume that's not too much more inefficient
than using SQL INSERT statements directly.
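
So each line's $sql ends up being something like the following (the
procedure name and parameter list are just placeholders here, not my real
schema):

// Illustrative only: procedure name and arguments are placeholders.
$sql = sprintf(
    "CALL import_inventory_line('%s', '%s', '%s', '%s')",
    $conn->real_escape_string($part),
    $conn->real_escape_string($vendor),
    $conn->real_escape_string($contactPerson),
    $conn->real_escape_string($mpn)
);
$result = $conn->query($sql);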

Any help is appreciated.
Scott.
