stack wrote:
On Sun, Jan 3, 2010 at 10:46 AM, Mridul Muralidharan
<[email protected]> wrote:

 I was wondering about the atomicity guarantees when using secondary
indexes from within a transaction.


You are talking about indexed hbase from transactional hbase contrib?


Yes, exactly.



From what I could gather, updates to the index table go through their own
(set of) RPCs before the underlying transactional table is updated - and
these updates happen outside of the locks for the transactional table.


Yes.  But IIUC, the client is running a transaction that spans the update to
the two tables.  It'll take care of the undo should, say, the update to the
transaction table fail.



Isn't the update to the secondary index done implicitly? As in, does the client 'see' this update? My impression was that the secondary index update was done by the indexedregion and was not visible to the client, which is what manages the OCC transaction ...




Also, the index regions need not colocate with the table region.

So essentially wondering
a) if the index can go out of sync with the transactional table ?


It should not.  The client should run the undos if the insert does not go
into both tables successfully.
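
To make the "client runs the undos" idea concrete, here is a minimal sketch in Python. It is a toy model only: the two dicts stand in for the table and the index, and put_with_undo, TableWriteError and fail_table_write are hypothetical names, not part of the transactional or tableindexed contrib API. It assumes one index entry per value.

```python
class TableWriteError(Exception):
    """Raised by the toy table to simulate a failed update."""


def put_with_undo(table, index, row, value, fail_table_write=False):
    """Write the index entry first, then the table row; undo the index on failure.

    `table` maps row -> value and `index` maps value -> row (assumed layout).
    """
    old_value = table.get(row)
    # Remember what the index held for this value so the undo can restore it.
    had_entry = value in index
    previous_row = index.get(value)

    # Step 1: the index update (in the thread, this goes through its own RPC).
    index[value] = row
    try:
        # Step 2: the table update, which may fail.
        if fail_table_write:
            raise TableWriteError(row)
        table[row] = value
    except TableWriteError:
        # Undo: put the index back the way it was before step 1.
        if had_entry:
            index[value] = previous_row
        else:
            index.pop(value, None)
        return False

    # Success: drop the stale index entry for the previous value, if any.
    if old_value is not None and old_value != value:
        index.pop(old_value, None)
    return True
```

With a single writer this keeps the two structures consistent; it says nothing about concurrent writers on the same row.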



b) if there are errors with update to table, are the indexes rolled back ?


Yes.



c) Whether there can be issues if there are parallel updates invoked for
the same row - whether index changes end up being inconsistent with table
data (due to lock not being held while updating index).



This might be possible.  There is a lock held on a row.  I'm not sure if the
lock is held on transaction table row while the update is being done to the
index table.

This is the doc. as it stands on transactional hbase:
http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/client/transactional/package-summary.html#package_description

Here is the doc. on indexed-transactional hbase:
http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/client/tableindexed/package-summary.html#package_description

You've probably tripped over it already but just in case, it might help.


I did go through the package summaries, thanks; they are what increased my confusion.

My current understanding is :

a) Client 'simulates' the transaction - by inspecting the state of the rows on commit and rolls back in case of conflicting updates.

b) secondary index updates are transparent to client api and are directly done by the indexedregion as part of its implementation.


If this is correct, I am wondering if overlapping rollbacks can result in the secondary index going out of sync with the table, since (a) does not see the index updates (one update gets rolled back while another goes through, or variations of it).
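
The interleaving I have in mind can be sketched with a toy model in Python. This is only an illustration under my own assumptions, not the actual indexedregion code: the index is a plain value -> row dict, each writer replaces the index entry for the old value it read, and a rollback reverses that using the same old value. All names here (index_apply, index_undo, run_interleaving, in_sync) are made up for the example.

```python
def index_apply(index, row, old, new):
    """Replace the index entry for `old` with one for `new`."""
    if old is not None:
        index.pop(old, None)
    index[new] = row


def index_undo(index, row, old, new):
    """Reverse index_apply using the old value the writer saw when it started."""
    index.pop(new, None)
    if old is not None:
        index[old] = row


def in_sync(table, index):
    """True when the index holds exactly one entry per current table value."""
    return index == {value: row for row, value in table.items()}


def run_interleaving():
    """Two writers on the same row, with no lock held across index and table."""
    table = {"r": "x"}
    index = {"x": "r"}

    old_a = table["r"]                    # writer A reads the row ("x")
    index_apply(index, "r", old_a, "a")   # A updates the index first
    old_b = table["r"]                    # writer B also reads "x"
    index_apply(index, "r", old_b, "b")   # B updates the index
    table["r"] = "b"                      # B's table update goes through
    index_undo(index, "r", old_a, "a")    # A hits a conflict and rolls back
    return table, index
```

In this model the table ends up holding "b" while the index holds entries for both "b" and the stale "x", because A's undo restores a value that B has since replaced.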



Thanks,
Mridul


St.Ack



I guess they are all kind of related queries.


I was not able to get a clear picture from the archives, so RTFM/pointers
would be helpful if this is already answered.

Thanks,
Mridul

