[
https://issues.apache.org/jira/browse/HBASE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15772011#comment-15772011
]
Yu Li commented on HBASE-17361:
-------------------------------
Thanks for the reference to the HTable javadoc, [~jerryhe].
It's awkward that BufferedMutator is thread safe but never gets the chance to
be accessed in parallel...
It's also true that if we share the same BufferedMutator per connection, the
lock contention will be extreme, as [~eclark] mentioned in HBASE-14687; see
{{BufferedMutatorImpl#flush}}, which is synchronized...
Personally I think we should try to resolve the contention inside
BufferedMutator and then make HTable thread safe. We should resolve the issue
one way or another rather than leaving it as it is...
I'm working on something interesting and have a pretty good reason for making
HTable thread safe; I will share it later when it's ready (smile). Until then,
let's just leave the JIRA open for a later check.
> HTable#getBufferedMutator is not thread safe and could cause data loss
> ----------------------------------------------------------------------
>
> Key: HBASE-17361
> URL: https://issues.apache.org/jira/browse/HBASE-17361
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.1.7, 1.2.4
> Reporter: Yu Li
> Assignee: Yu Li
> Priority: Critical
> Attachments: HBASE-17361.patch, HBASE-17361.patch
>
>
> Now we have {{HTable#getBufferedMutator}} like
> {code}
> BufferedMutator getBufferedMutator() throws IOException {
>   if (mutator == null) {
>     this.mutator = (BufferedMutatorImpl) connection.getBufferedMutator(
>         new BufferedMutatorParams(tableName)
>             .pool(pool)
>             .writeBufferSize(connConfiguration.getWriteBufferSize())
>             .maxKeyValueSize(connConfiguration.getMaxKeyValueSize())
>     );
>   }
>   mutator.setRpcTimeout(writeRpcTimeout);
>   mutator.setOperationTimeout(operationTimeout);
>   return mutator;
> }
> {code}
> And {{HTable#flushCommits}}:
> {code}
> void flushCommits() throws IOException {
>   if (mutator == null) {
>     // nothing to flush if there's no mutator; don't bother creating one.
>     return;
>   }
>   getBufferedMutator().flush();
> }
> {code}
> For {{HTable#put}}
> {code}
> public void put(final Put put) throws IOException {
>   getBufferedMutator().mutate(put);
>   flushCommits();
> }
> {code}
> If we launch multiple threads to put in parallel, the following sequence
> might happen because {{HTable#getBufferedMutator}} is not thread safe:
> {noformat}
> 1. ThreadA enters getBufferedMutator and finds mutator==null
> 2. ThreadB enters getBufferedMutator and finds mutator==null
> 3. ThreadA initializes mutator to instanceA, then calls mutator#mutate,
> adding one put (putA) into {{writeAsyncBuffer}}
> 4. ThreadB initializes mutator to instanceB
> 5. ThreadA reaches flushCommits; mutator is now instanceB, so it calls
> instanceB's flush method and putA is lost
> {noformat}
> Will add a UT to cover this case, and fix it in this JIRA.
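To make the five steps above concrete, here is a minimal sketch that does not use HBase at all. {{FakeMutator}}, {{SafeHolder}} and {{LostPutSketch}} are hypothetical stand-ins invented for illustration, and a plain list plays the role of the table. {{replayRace}} replays the interleaving in a single thread so the outcome is deterministic. {{SafeHolder}} shows one possible guard (synchronized lazy initialization), which is an assumption about how this could be fixed and not necessarily what the attached patch does.

```java
import java.util.ArrayList;
import java.util.List;

class LostPutSketch {
    /** Hypothetical stand-in for BufferedMutatorImpl: buffers puts until flush. */
    static class FakeMutator {
        private final List<String> buffer = new ArrayList<>();
        private final List<String> table; // plays the role of the HBase table

        FakeMutator(List<String> table) { this.table = table; }

        synchronized void mutate(String put) { buffer.add(put); }

        synchronized void flush() {
            table.addAll(buffer);
            buffer.clear();
        }
    }

    /** Replays steps 1-5 of the description in one thread (deterministic). */
    static List<String> replayRace() {
        List<String> table = new ArrayList<>();
        FakeMutator mutator = null;                      // the shared field
        boolean aSawNull = (mutator == null);            // 1. ThreadA sees null
        boolean bSawNull = (mutator == null);            // 2. ThreadB sees null
        if (aSawNull) mutator = new FakeMutator(table);  // 3. ThreadA sets instanceA
        mutator.mutate("putA");                          //    and buffers putA
        if (bSawNull) mutator = new FakeMutator(table);  // 4. ThreadB sets instanceB
        mutator.flush();                                 // 5. ThreadA flushes instanceB
        return table;                                    // putA is stuck in instanceA
    }

    /** One possible guard (an assumption, not the actual patch). */
    static class SafeHolder {
        private final List<String> table;
        private FakeMutator mutator;

        SafeHolder(List<String> table) { this.table = table; }

        // Check-then-create is atomic, so every caller sees the same instance.
        synchronized FakeMutator getBufferedMutator() {
            if (mutator == null) mutator = new FakeMutator(table);
            return mutator;
        }

        void put(String put) {
            FakeMutator m = getBufferedMutator();
            m.mutate(put);
            m.flush();
        }
    }

    /** Runs concurrent puts through SafeHolder and returns how many landed. */
    static int runSafe(int threads) {
        List<String> table = new ArrayList<>();
        SafeHolder holder = new SafeHolder(table);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final String put = "put" + i;
            workers[i] = new Thread(() -> holder.put(put));
            workers[i].start();
        }
        try {
            for (Thread t : workers) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return table.size();
    }

    public static void main(String[] args) {
        System.out.println("unsafe replay, table = " + replayRace());
        System.out.println("safe holder, puts landed = " + runSafe(8));
    }
}
```

In the replay the table stays empty because putA is still sitting in instanceA's buffer when instanceB is flushed. Note that a synchronized getter like this only removes the double initialization; the contention on the synchronized flush of a shared mutator is a separate problem.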
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)