In a number of cases, I don't do any insert-time transactions at all and rely on periodic consistency checks. I can tolerate stale indexes for short periods without a problem, and would rather not pay the upfront cost.

As for updating 1000 data rows and then applying the 1000 index updates, you'd have to do some testing and think about your particular case. Are these index updates always done in batches? If so, I think it makes sense to do the 1000 data rows, then the 1000 index rows, as two bulk HTable.put(List<Put>) calls.
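
Something like this (an untested sketch; it assumes a "data" table and an "index" table with the column families shown, which you'd swap for your own schema):

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchedIndexUpdate {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable dataTable = new HTable(conf, "data");
    HTable indexTable = new HTable(conf, "index");

    List<Put> dataPuts = new ArrayList<Put>();
    List<Put> indexPuts = new ArrayList<Put>();

    for (int i = 0; i < 1000; i++) {
      String rowKey = "row-" + i;
      String value = "value-" + i;

      // data row: keyed by the primary key
      Put dataPut = new Put(Bytes.toBytes(rowKey));
      dataPut.add(Bytes.toBytes("content"), Bytes.toBytes("val"),
          Bytes.toBytes(value));
      dataPuts.add(dataPut);

      // index row: keyed by the indexed value, pointing back at the data row
      Put indexPut = new Put(Bytes.toBytes(value));
      indexPut.add(Bytes.toBytes("info"), Bytes.toBytes("dataRow"),
          Bytes.toBytes(rowKey));
      indexPuts.add(indexPut);
    }

    // two bulk calls: all data rows first, then all index rows
    dataTable.put(dataPuts);
    indexTable.put(indexPuts);
  }
}

Writing the data rows first means a failure partway through leaves you with rows that are merely unindexed, rather than index entries pointing at nothing, which is the easier inconsistency to live with.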

I keep an audit log so that if the client doing the index updates hits any kind of issue, it makes a note of it... Generally it's silent, but it's there so I can see how things are going. In any case, a synchronization job comes around to make sure things are in sync when necessary.
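
The sync job can be as simple as the sketch below (again untested, and it assumes each index row stores its data row key in an "info:dataRow" column, per my earlier example):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class IndexSyncJob {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable dataTable = new HTable(conf, "data");
    HTable indexTable = new HTable(conf, "index");

    byte[] family = Bytes.toBytes("info");
    byte[] qualifier = Bytes.toBytes("dataRow");

    Scan scan = new Scan();
    scan.addColumn(family, qualifier);
    ResultScanner scanner = indexTable.getScanner(scan);
    try {
      for (Result indexRow : scanner) {
        byte[] dataRowKey = indexRow.getValue(family, qualifier);
        // does the data row this index entry points at still exist?
        Result dataRow = dataTable.get(new Get(dataRowKey));
        if (dataRow.isEmpty()) {
          // dangling index entry: note it, then clean it up
          System.err.println("stale index row: "
              + Bytes.toString(indexRow.getRow()));
          indexTable.delete(new Delete(indexRow.getRow()));
        }
      }
    } finally {
      scanner.close();
    }
  }
}

That only catches dangling index entries; a second pass the other way over the data table, checking that each row's index entry exists, catches the missing ones.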

Hope that helps.

JG

stack wrote:
Check out transactional hbase under contrib.  It includes an indexed hbase:
a generalized means of maintaining secondary indexes that uses transactional
hbase to keep the primary table and the index table in step; i.e. if the
insert into either the primary or the index fails, the insert is "rolled back".
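
Usage is roughly like the below (an untested sketch from memory of the
contrib transactional client under org.apache.hadoop.hbase.client.transactional,
so class and method names may be off; the table and column names are made up):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.transactional.TransactionManager;
import org.apache.hadoop.hbase.client.transactional.TransactionState;
import org.apache.hadoop.hbase.client.transactional.TransactionalTable;
import org.apache.hadoop.hbase.util.Bytes;

public class TransactionalInsert {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    TransactionManager txManager = new TransactionManager(conf);
    TransactionalTable dataTable = new TransactionalTable(conf, "data");
    TransactionalTable indexTable = new TransactionalTable(conf, "index");

    TransactionState tx = txManager.beginTransaction();
    try {
      // write the primary row inside the transaction
      Put dataPut = new Put(Bytes.toBytes("row-1"));
      dataPut.add(Bytes.toBytes("content"), Bytes.toBytes("val"),
          Bytes.toBytes("v"));
      dataTable.put(tx, dataPut);

      // write the index row inside the same transaction
      Put indexPut = new Put(Bytes.toBytes("v"));
      indexPut.add(Bytes.toBytes("info"), Bytes.toBytes("dataRow"),
          Bytes.toBytes("row-1"));
      indexTable.put(tx, indexPut);

      // commit both writes or neither
      txManager.tryCommit(tx);
    } catch (Exception e) {
      // roll back both writes if either one failed
      txManager.abort(tx);
      throw e;
    }
  }
}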

St.Ack

On Fri, Sep 11, 2009 at 7:09 AM, Matt Corgan <mcor...@hotpads.com> wrote:

Does anyone have any tips or strategies for keeping an index in sync with
its data?  I'd of course update the index immediately after the data, but
over time there will inevitably be inconsistencies.  Do people just run
periodic clean-up jobs?

On a related note, how important is batching updates from a performance
standpoint?  In MySQL it is significant, but the write path in HBase seems
so fast that it may not matter much except for network latency.  Would you
recommend updating 1000 data rows, then applying the 1000 index updates,
or interleaving the updates row-by-row?

Congrats on the new release!  Looks awesome.
Matt

