Thanks a lot for clarifying !
That was very helpful.
Regards,
Mridul
Clint Morgan wrote:
Sorry I have been so slow in understanding. I now see what you mean. I was
trying to explain how I thought it *should* work, rather than what it
actually does now.
That method of aborting on an exception in the 2nd phase is incorrect for
the reason you mention: for a 3-region transaction, we could have committed
the first region, errored on the 2nd region, and then aborted the 3rd
region. So even disregarding indexes, we would lose our atomic property in
the base table.
Rather we can let just the 2nd region fail with the assumption that it has
all the information that it needs to get that transaction committed when it
recovers from the WAL. So when the 2nd region is finally recovered and ready
to serve again it will have the transaction committed.
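
To make the two behaviours concrete, here is a small toy model of the second phase for a 3-region transaction. It is only a sketch: the Region class, its methods and the state names are hypothetical and are not the HBase classes, and recovery is reduced to "replay the WAL if the global decision was commit".

```python
class Region:
    """Toy stand-in for one region in a transaction (hypothetical, not HBase)."""

    def __init__(self, name, fail_on_commit=False):
        self.name = name
        self.fail_on_commit = fail_on_commit
        self.wal = []            # transactional puts logged during phase 1
        self.state = "prepared"

    def prepare(self, puts):
        self.wal = list(puts)

    def commit(self):
        if self.fail_on_commit:
            self.state = "down"
            raise IOError(f"{self.name} failed during commit")
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

    def recover(self, commit_decided):
        # On restart, replay the logged puts if the global decision was commit.
        self.fail_on_commit = False
        self.state = "committed" if commit_decided and self.wal else "aborted"


def make_regions():
    """Three prepared regions; the 2nd one will fail when told to commit."""
    regions = [Region("r1"), Region("r2", fail_on_commit=True), Region("r3")]
    for region in regions:
        region.prepare([("row", "value")])
    return regions


def second_phase_abort_on_error(regions):
    """Abort the remaining regions after a commit error (loses atomicity)."""
    for i, region in enumerate(regions):
        try:
            region.commit()
        except IOError:
            for other in regions[i + 1:]:
                other.abort()
            break
    return [r.state for r in regions]


def second_phase_commit_must_finish(regions):
    """Keep committing; the failed region catches up when it recovers."""
    failed = []
    for region in regions:
        try:
            region.commit()
        except IOError:
            failed.append(region)
    for region in failed:
        region.recover(commit_decided=True)
    return [r.state for r in regions]


if __name__ == "__main__":
    print(second_phase_abort_on_error(make_regions()))
    print(second_phase_commit_must_finish(make_regions()))
```

With the first function the regions end up committed, down and aborted, which is the mixed outcome described above. With the second, all three are committed once the failed region has recovered.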
For your second point about not aborting in the case of failure in the
regionserver, you also raise a valid point. A failure of the filesystem will
cause an abort, and then initiate the WAL recovery properly. However other
exceptions could sneak through (maybe an OOME failure on the Indexed put
rpc), and cause an inconsistent index and/or some of the trx puts not being
applied.
Rather we should probably be more explicit about handling IOEs in the
transactional layer. The trx region server needs to guarantee that when it
is told to commit a transaction, the writes will eventually occur. It may be
as simple as handling an exception in the commit methods by aborting the
region server, but this seems too fragile.
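
A rough sketch of that "abort the server on any commit exception" idea, which the message above itself considers fragile. ToyRegionServer, owed and apply_puts are made-up names for illustration, and the dictionary of owed puts stands in for whatever the WAL would really hold.

```python
class ToyRegionServer:
    """Hypothetical model: any exception during commit aborts the server."""

    def __init__(self):
        self.aborted = False
        self.owed = {}   # trx id -> puts that must still be applied

    def commit(self, trx_id, puts, apply_puts):
        if self.aborted:
            raise RuntimeError("server is aborted")
        # Record the obligation before touching the table, like a WAL entry.
        self.owed[trx_id] = list(puts)
        try:
            apply_puts(puts)
        except Exception:
            # Python's MemoryError is an Exception; a Java OutOfMemoryError is
            # an Error, so a Java version would have to catch more broadly.
            self.aborted = True
            return False
        del self.owed[trx_id]
        return True

    def recover(self, apply_puts):
        # Finish every commit that was promised but not applied.
        for trx_id in list(self.owed):
            apply_puts(self.owed.pop(trx_id))
        self.aborted = False
```

The point of the sketch is the ordering: the obligation is recorded before the puts are applied, so a failure at any later step still leaves enough information to finish the commit during recovery.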
I've been delaying worrying too much about the details of transactional
failure recovery until we have append and a working write-ahead-log in core
hbase. But it's probably about time to revisit...
Thank you very much for digging in here, a second set of eyes is handy.
-clint
On Tue, Jan 19, 2010 at 1:37 AM, Mridul Muralidharan
<[email protected]> wrote:
Clint Morgan wrote:
After the 2PC process has determined that a commit should happen there is no
roll-back. The commit must be processed.
In org.apache.hadoop.hbase.client.transactional.TransactionManager,
doCommit(), which is the 2nd phase of the 2-phase commit, calls abort() when
an Exception is thrown, and abort() does the rollback.
And this abort specifically ignores the region which hit the error -
thereby making the index go out of sync.
I hope I am not missing something with this assertion, since I had
mentioned this earlier too (possibly got buried in my details ?).
Since abort is resulting in an rpc call, which results in some log
manipulation, I left it at that and did not dig deeper - do you mean it
actually does nothing ?
So in your example, a commit has been approved, and one of the regions is
told to go ahead and commit. The region triggers the index Put, but then
fails on its Puts (like out of disk space, out of memory, etc). This should
shut down the RegionServer. Then when the region's WAL is recovered from, the
trx puts from the partially-committed transaction will be there. We will
look in the global transaction log to see that the trx is to be committed,
and then apply the puts to the base table.
I relooked at the implementation just to make sure I got the basic issue
right.
I did not see the behavior you mention above, of an IOException resulting in
the shutdown of a region server, and quite a lot of methods could actually
result in IOExceptions being thrown when traversing the call graph from
indexedregion.Put's invocation (the filesystem going missing is just one case
where this happens, I think, but I did not see it as being the only case,
at least impl/doc wise).
Anyway, to make progress, if commit failure in an indexed regionserver does
a rollback of the txn, then the issue I mentioned can occur ?
Thanks for your patience and time !
Regards,
Mridul
-clint
On Fri, Jan 15, 2010 at 2:43 AM, Mridul Muralidharan
<[email protected]> wrote:
I think I might not have explained it well enough.
As part of executing a Put, the index update currently happens prior to
updating the underlying transactional table, and is done outside of the
locks.
If the underlying transactional table update results in an exception, what
is the state of the index ? From what I understand, a rollback is initiated,
and this results in rolling back all regions except for the one which threw
the exception, so the secondary index update which happened implicitly is
never reverted.
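
The scenario can be reproduced in a toy model. This is only a sketch of the ordering described in this message: IndexedRegion, commit_put and the shared index dictionary are hypothetical stand-ins, not the indexed region code.

```python
class IndexedRegion:
    """Toy region that writes the index before its own table (hypothetical)."""

    def __init__(self, index, fail_table_put=False):
        self.index = index       # shared dict standing in for the index table
        self.table = {}
        self.applied = []        # puts this region can undo on rollback
        self.fail_table_put = fail_table_put

    def commit_put(self, row, value):
        self.index[value] = row  # index update first, outside the locks
        if self.fail_table_put:
            raise IOError("table put failed")
        self.table[row] = value
        self.applied.append((row, value))

    def rollback(self):
        for row, value in self.applied:
            self.table.pop(row, None)
            self.index.pop(value, None)
        self.applied.clear()


def commit_with_rollback(puts):
    """Commit region by region; on failure roll back all but the failing one."""
    touched = []
    for region, row, value in puts:
        try:
            region.commit_put(row, value)
            touched.append(region)
        except IOError:
            for other in touched:    # the region that threw is skipped
                other.rollback()
            return False
    return True


def find_dangling_index_entries(index, regions):
    """Index entries whose base-table row is missing or holds another value."""
    rows = {}
    for region in regions:
        rows.update(region.table)
    return {key: row for key, row in index.items() if rows.get(row) != key}


if __name__ == "__main__":
    index = {}
    a = IndexedRegion(index)
    b = IndexedRegion(index, fail_table_put=True)
    commit_with_rollback([(a, "row1", "alice"), (b, "row2", "bob")])
    print(find_dangling_index_entries(index, [a, b]))
```

In this model the index keeps an entry for "bob" pointing at row2 even though no table holds that row, because the only region that wrote it is the one the rollback skipped.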
Or am I missing something here ?
To be clear, I am talking about the actual commit as part of the two phase
commit throwing an exception : not a conflict exception, but an IOException
or variant - which can result in the secondary index going out of sync.
I am contrasting it with the case of explicit indexes maintained by the
client - where the rollback by the client (when the commit fails for a
region) results in rollback on all the regions in the transaction - which
includes the secondary indexes 'visible' to the client.
Thanks,
Mridul
If the regionserver crashes during this commit process, then I *think* it
should still recover correctly. It will see the transactional operations in
the WAL, and then propagate the puts into the index. However this WAL
recovery stuff has been changing, and I'm not confident that it currently
works in all failure cases.
Does this normal case address your concerns?
-clint
On Sun, Jan 3, 2010 at 4:46 PM, Mridul Muralidharan
<[email protected]> wrote:
stack wrote:
On Sun, Jan 3, 2010 at 10:46 AM, Mridul Muralidharan
<[email protected]> wrote:
I was wondering about the atomicity guarantees when using secondary
indexes from within a transaction.
You are talking about indexed hbase from transactional hbase contrib?
Yes, exactly.
From what I could gather, updates to the index table go through their own
(set of) rpc before the underlying transactional table is updated - and
these updates happen outside of the locks for the transaction table.
Yes. But IIUC, the client is running a transaction that spans the update to
the two tables. It'll take care of the undo should, say, the update to the
transaction table fail.
Isn't the update to the secondary index implicitly done ? As in, does the
client 'see' this update ?
My impression was that the secondary index update was done by the
indexedregion - and was not visible to the client : which manages occ
transaction ...
Also, the index regions need not colocate with the table region.
So essentially wondering
a) if the index can go out of sync with the transactional table ?
It should not. The client should run the undos if the insert does not go
into both tables successfully.
b) if there are errors with update to table, are the indexes rolled back ?
Yes.
c) Whether there can be issues if there are parallel updates invoked for
the same row - whether index changes end up being inconsistent with table
data (due to lock not being held while updating index).
This might be possible. There is a lock held on a row. I'm not sure if the
lock is held on transaction table row while the update is being done to the
index table.
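
For what it is worth, the interleaving behind question (c) can be written out deterministically. This is a hypothetical schedule in a toy model, assuming the index write is not covered by the row lock; whether the real code permits this ordering is exactly the open question here.

```python
def apply_interleaving(steps):
    """Replay (kind, row, value) steps against a toy table and toy index."""
    table, index = {}, {}
    for kind, row, value in steps:
        if kind == "index":
            # Replace whatever index entry currently points at this row.
            for key in [k for k, r in index.items() if r == row]:
                del index[key]
            index[value] = row
        else:
            table[row] = value
    return table, index


# Writer 1 and writer 2 both update row "r". The index writes land in the
# order 1, 2, but the locked table writes land in the order 2, 1.
steps = [
    ("index", "r", "v1"),
    ("index", "r", "v2"),
    ("table", "r", "v2"),
    ("table", "r", "v1"),
]
table, index = apply_interleaving(steps)
```

After this schedule the table holds v1 for the row while the index only knows v2, so a lookup by the indexed value would disagree with the table.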
This is the doc. as it stands on transactional hbase:
http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/client/transactional/package-summary.html#package_description
Here is the doc. on indexed-transactional hbase:
http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/client/tableindexed/package-summary.html#package_description
You've probably tripped over it already but just in case, it might help.
I did go through the package summaries, thanks : which is what increased my
confusion.
My current understanding is :
a) Client 'simulates' the transaction - by inspecting the state of the rows
on commit and rolls back in case of conflicting updates.
b) secondary index updates are transparent to client api and are directly
done by the indexedregion as part of its implementation.
If this is correct, I am wondering if overlapping rollbacks can result in
secondary index going out of sync with the table since (a) does not see
those (one update gets rolled back while another goes through - or
variations of it).
Thanks,
Mridul
St.Ack
I guess they are all kind of related queries.
I was not able to get a clear picture from the archives, so RTFM/pointers
would be helpful if this is already answered.
Thanks,
Mridul