In a number of cases, I don't do any insert-time transactions at all and rely on periodic consistency checks. I can tolerate stale indexes for short periods without a problem, and would rather not pay the upfront cost.

As for updating 1000 data rows and then applying the 1000 index updates, you'd have to do some testing and think about your particular case. Are these index updates always done in batches? If so, I think it makes sense to do the 1000 data rows, then the 1000 index rows, as two bulk HTable.put(List<Put>) calls.
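
Something like this (an untested sketch; it assumes a "data" table and an "index" table with the column families shown, which you'd swap for your own schema):

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchedIndexUpdate {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable dataTable = new HTable(conf, "data");
    HTable indexTable = new HTable(conf, "index");

    List<Put> dataPuts = new ArrayList<Put>();
    List<Put> indexPuts = new ArrayList<Put>();

    for (int i = 0; i < 1000; i++) {
      String rowKey = "row-" + i;
      String value = "value-" + i;

      // data row: keyed by the primary key
      Put dataPut = new Put(Bytes.toBytes(rowKey));
      dataPut.add(Bytes.toBytes("content"), Bytes.toBytes("val"),
          Bytes.toBytes(value));
      dataPuts.add(dataPut);

      // index row: keyed by the indexed value, pointing back at the data row
      Put indexPut = new Put(Bytes.toBytes(value));
      indexPut.add(Bytes.toBytes("info"), Bytes.toBytes("dataRow"),
          Bytes.toBytes(rowKey));
      indexPuts.add(indexPut);
    }

    // two bulk calls: all data rows first, then all index rows
    dataTable.put(dataPuts);
    indexTable.put(indexPuts);
  }
}

Writing the data rows first means a failure partway through leaves you with rows that are merely unindexed, rather than index entries pointing at nothing, which is the easier inconsistency to live with.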

I keep an audit log so that if the client doing the index updates hits any kind of issue, it makes a note of it... Generally it's silent, but it's there so I can see how things are going. In any case, a synchronization job comes around to make sure things are in sync when necessary.
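
The sync job can be as simple as the sketch below (again untested, and it assumes each index row stores its data row key in an "info:dataRow" column, per my earlier example):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class IndexSyncJob {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable dataTable = new HTable(conf, "data");
    HTable indexTable = new HTable(conf, "index");

    byte[] family = Bytes.toBytes("info");
    byte[] qualifier = Bytes.toBytes("dataRow");

    Scan scan = new Scan();
    scan.addColumn(family, qualifier);
    ResultScanner scanner = indexTable.getScanner(scan);
    try {
      for (Result indexRow : scanner) {
        byte[] dataRowKey = indexRow.getValue(family, qualifier);
        // does the data row this index entry points at still exist?
        Result dataRow = dataTable.get(new Get(dataRowKey));
        if (dataRow.isEmpty()) {
          // dangling index entry: note it, then clean it up
          System.err.println("stale index row: "
              + Bytes.toString(indexRow.getRow()));
          indexTable.delete(new Delete(indexRow.getRow()));
        }
      }
    } finally {
      scanner.close();
    }
  }
}

That only catches dangling index entries; a second pass the other way over the data table, checking that each row's index entry exists, catches the missing ones.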

Hope that helps.

JG

stack wrote:
Check out transactional hbase under contrib.  It includes an indexed hbase:
a generalized means of maintaining secondary indexes that uses transactional
hbase to keep the primary table and the index table in step; i.e. if the
insert into either the primary or the index fails, the insert is "rolled back".
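
Usage is roughly like the below (an untested sketch from memory of the
contrib transactional client under org.apache.hadoop.hbase.client.transactional,
so class and method names may be off; the table and column names are made up):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.transactional.TransactionManager;
import org.apache.hadoop.hbase.client.transactional.TransactionState;
import org.apache.hadoop.hbase.client.transactional.TransactionalTable;
import org.apache.hadoop.hbase.util.Bytes;

public class TransactionalInsert {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    TransactionManager txManager = new TransactionManager(conf);
    TransactionalTable dataTable = new TransactionalTable(conf, "data");
    TransactionalTable indexTable = new TransactionalTable(conf, "index");

    TransactionState tx = txManager.beginTransaction();
    try {
      // write the primary row inside the transaction
      Put dataPut = new Put(Bytes.toBytes("row-1"));
      dataPut.add(Bytes.toBytes("content"), Bytes.toBytes("val"),
          Bytes.toBytes("v"));
      dataTable.put(tx, dataPut);

      // write the index row inside the same transaction
      Put indexPut = new Put(Bytes.toBytes("v"));
      indexPut.add(Bytes.toBytes("info"), Bytes.toBytes("dataRow"),
          Bytes.toBytes("row-1"));
      indexTable.put(tx, indexPut);

      // commit both writes or neither
      txManager.tryCommit(tx);
    } catch (Exception e) {
      // roll back both writes if either one failed
      txManager.abort(tx);
      throw e;
    }
  }
}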

St.Ack

On Fri, Sep 11, 2009 at 7:09 AM, Matt Corgan <mcor...@hotpads.com> wrote:

Does anyone have any tips or strategies for keeping an index in sync with
its data?  I'd of course update the index immediately after the data, but
over time there will inevitably be inconsistencies.  Do people just run
periodic clean-up jobs?

On a related note, how important is batching updates from a performance
standpoint?  In MySQL it is significant, but the write path in HBase seems
so fast that it may not matter much except for network latency.  Would you
recommend updating 1000 data rows, then applying the 1000 index updates,
or interleaving the updates row-by-row?

Congrats on the new release!  Looks awesome.
Matt

