Hello everyone,
I want someone to share his expertise on the following:
Suppose we have a table with some data that periodically must be synced 
with an external source, which provides only partial information (e.g. 
it might provide all the fields for a new record, but might provide 
only a single field difference for an already existing record). This 
obviously will involve a series of UPDATE and INSERT statements, mostly 
in random order. As most RDBMS can hold only one prepared statement at a 
time (including MySQL) I have 2 ways to do this that look somewhat 
efficient:

1. Sweep the pre-existing table into a hash, DELETE all the records from 
it and INSERT the new data, using the hash as a source for missing 
fields.

2. Create 3 connections to the database, have the first one hold a 
SELECT statement that will replace the hash above, have the second 
connection be an INSERT for nonexisting records and the third connection 
be an UPDATE for existing ones

>From reading on DBI I understood that moving lots of data with do()
statements is pretty slow (e.g. constantly re-preparing the same 
statement), so I think scenario 2 with a single connection is even 
worse.

The reason I am elaborating on this is that I am afraid the dataset will 
stop fitting in memory at some point of time if I go with scenario 1 
(which is easier and cleaner to implement). Also I have no idea how 
resource intensive scenario 2 would be, although I have proper 
indexing and stuff so the UPDATES should be fast enough...

Thanks

Peter

-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>


Reply via email to