Hi, I'm working on a rewrite of a batch process that operates on a large InnoDB database. In the past, it would process the entire database every night, but now the size of the data is making that impossible, and there is a desire for the process to operate in near real-time, so I'm rewriting it to work incrementally. The idea is to run it from cron every 5 minutes and process the data from the last 5 minutes.
The problem I'm having is identifying which rows have changed. I'm not concerned about deletes in this application, so I thought it would be enough to just use automatic timestamps and keep track of the last run time of the program, but then I realized that an uncommitted transaction could come in later with a timestamp from earlier. I haven't seen any way to make the timestamp reflect the actual commit time. I have a few ideas of how to handle this: 1) Put in a 5-minute delay and hope this is long enough for all transactions to commit. This is simple, but the delay is not desirable and there's no guarantee that transactions will all be finished in this time window (although if they didn't it would certainly indicate a problem for this system). 2) Use a boolean flag on every row in every table to indicate if it has been seen yet. This seems like a bad idea, since it would require the batch process to do tons of updates to the source data as it processes the rows. 3) Triggers to fill in a logging table. This seems to be a common approach. I'm not sure what the best way to structure the logging table is, since some of these tables have multi-column primary keys. Suggestions here would be welcome. This one is a lot of work, but sounds pretty safe. Can anyone offer advice or anecdotes about how you have handled similar situations? - Perrin -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]