[ https://issues.apache.org/jira/browse/HADOOP-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-17308:
------------------------------------
    Summary: WASB : PageBlobOutputStream succeeding hflush even when underlying flush to storage failed (was: WASB : PageBlobOutputStream succeeding flush even when underlying flush to storage failed )

> WASB : PageBlobOutputStream succeeding hflush even when underlying flush to storage failed
> -------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-17308
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17308
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.7.0
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>            Priority: Critical
>              Labels: HBASE, pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> In PageBlobOutputStream, the write() APIs fill a buffer, and an
> hflush/hsync/flush call flushes that buffer to the underlying storage. The
> Azure calls themselves are made on a separate thread:
> {code}
> private synchronized void flushIOBuffers()  {
>     ...
>     lastQueuedTask = new WriteRequest(outBuffer.toByteArray());
>     ioThreadPool.execute(lastQueuedTask);
>   ....
>  }
> private class WriteRequest implements Runnable {
>     private final byte[] dataPayload;
>     private final CountDownLatch doneSignal = new CountDownLatch(1);
>     public WriteRequest(byte[] dataPayload) {
>       this.dataPayload = dataPayload;
>     }
>     public void waitTillDone() throws InterruptedException {
>       doneSignal.await();
>     }
>     @Override
>     public void run() {
>       try {
>         LOG.debug("before runInternal()");
>         runInternal();
>         LOG.debug("after runInternal()");
>       } finally {
>         doneSignal.countDown();
>       }
>     }
>     private void runInternal() {
>       ......
>       writePayloadToServer(rawPayload);
>       ...........
>     }
>     private void writePayloadToServer(byte[] rawPayload) {
>       ......
>       try {
>         blob.uploadPages(wrapperStream, currentBlobOffset, rawPayload.length,
>             withMD5Checking(), PageBlobOutputStream.this.opContext);
>       } catch (IOException ex) {
>         lastError = ex;
>       } catch (StorageException ex) {
>         lastError = new IOException(ex);
>       }
>       if (lastError != null) {
>         LOG.debug("Caught error in PageBlobOutputStream#writePayloadToServer()");
>       }
>     }
>   }
> {code}
> The flushing thread waits for the other thread to complete the WriteRequest
> Runnable. That's fine. But when an exception is thrown from
> blob.uploadPages(), we only store it in the lastError state variable. This
> variable is checked by all subsequent operations such as write and flush,
> but not by the current flush call, which silently succeeds.
> In standard Azure-backed HBase clusters the WAL is on a page blob, so this
> bug causes data loss in HBase: HBase assumes a WAL write was hflushed and
> reports the row write as successful, when in fact the row never reached
> storage.
> Checking the lastError variable at the end of the flush operation solves the
> issue: flush() itself then throws the IOException.
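The proposed fix can be illustrated with a small, self-contained Java model. This is a sketch under stated assumptions, not the actual PageBlobOutputStream patch: FlushSketch, Uploader and hflush(byte[]) are hypothetical names, a single-thread executor stands in for ioThreadPool, and Uploader.upload() stands in for blob.uploadPages(). The key point is the second lastError check after the latch wait, which makes the current flush fail instead of silently succeeding.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Hypothetical model of the flush hand-off described in HADOOP-17308.
 * Names are illustrative only; this is not the real WASB code.
 */
public class FlushSketch implements AutoCloseable {

  /** Hypothetical stand-in for blob.uploadPages(). */
  public interface Uploader {
    void upload(byte[] payload) throws IOException;
  }

  private final ExecutorService ioThreadPool = Executors.newSingleThreadExecutor();
  private final Uploader uploader;
  // Written by the I/O thread, read by the calling thread.
  private volatile IOException lastError;

  public FlushSketch(Uploader uploader) {
    this.uploader = uploader;
  }

  /** Queues the payload on the I/O thread and waits, like WriteRequest. */
  public synchronized void hflush(byte[] payload) throws IOException {
    // Existing behaviour: fail fast if an earlier upload already failed.
    if (lastError != null) {
      throw new IOException("earlier upload failed", lastError);
    }
    CountDownLatch doneSignal = new CountDownLatch(1);
    ioThreadPool.execute(() -> {
      try {
        uploader.upload(payload);
      } catch (IOException ex) {
        lastError = ex; // only recorded, as in writePayloadToServer()
      } finally {
        doneSignal.countDown();
      }
    });
    try {
      doneSignal.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new InterruptedIOException("interrupted while waiting for upload");
    }
    // The fix: re-check lastError after the wait so this flush fails too.
    if (lastError != null) {
      throw new IOException("upload failed during hflush", lastError);
    }
  }

  @Override
  public void close() {
    ioThreadPool.shutdown();
  }
}
```

Without the final check, hflush() in this model would return normally after a failed upload, which is exactly the silent success the report describes; the error would only surface on the next write or flush.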



--
This message was sent by Atlassian Jira
(v8.3.4#803005)