I'm considering the best strategy for updating large tables that I'm indexing with Lucene. I'm indexing in chunks of 5000 items at a time, which I select with LIMIT statements and an incremental multiplier (SELECT blah FROM foo LIMIT 0, 5000; SELECT blah FROM foo LIMIT 5000, 5000; etc.).

After a group is added to the Lucene index, I want to update a column that is a flag to show that the record is done (TINYINT(1) UNSIGNED NOT NULL). This is to support incremental indexing (and so that if an indexing process fails or is interrupted, it will resume with minimal redundancy).

The best approach I've imagined is to create a HEAP table and for each cycle, fill it with the next 5000 primary keys from the source table, SELECT using a join against the HEAP table, the UPDATE on the same join when the Lucene indexing is successful.

Anybody suggest anything else? I looked at using LOCK IN SHARE MODE, but I don't see how to be sure that a SELECT ... LIMIT... statement will select the same messages as UPDATE... LIMIT... unless I lock the whole table, which seems needless.

Nick


-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]



Reply via email to