[ https://issues.apache.org/jira/browse/HADOOP-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran updated HADOOP-17308:
------------------------------------
    Summary: WASB : PageBlobOutputStream succeeding hflush even when underlying flush to storage failed  (was: WASB : PageBlobOutputStream succeeding flush even when underlying flush to storage failed)

> WASB : PageBlobOutputStream succeeding hflush even when underlying flush to
> storage failed
> -------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-17308
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17308
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.7.0
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>            Priority: Critical
>              Labels: HBASE, pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> In PageBlobOutputStream, the write() APIs fill a buffer, and an hflush/hsync/flush call flushes that buffer to the underlying storage. The Azure calls are handled in another thread:
> {code}
> private synchronized void flushIOBuffers() {
>   ...
>   lastQueuedTask = new WriteRequest(outBuffer.toByteArray());
>   ioThreadPool.execute(lastQueuedTask);
>   ....
> }
>
> private class WriteRequest implements Runnable {
>   private final byte[] dataPayload;
>   private final CountDownLatch doneSignal = new CountDownLatch(1);
>
>   public WriteRequest(byte[] dataPayload) {
>     this.dataPayload = dataPayload;
>   }
>
>   public void waitTillDone() throws InterruptedException {
>     doneSignal.await();
>   }
>
>   @Override
>   public void run() {
>     try {
>       LOG.debug("before runInternal()");
>       runInternal();
>       LOG.debug("after runInternal()");
>     } finally {
>       doneSignal.countDown();
>     }
>   }
>
>   private void runInternal() {
>     ......
>     writePayloadToServer(rawPayload);
>     ...........
>   }
>
>   private void writePayloadToServer(byte[] rawPayload) {
>     ......
>     try {
>       blob.uploadPages(wrapperStream, currentBlobOffset, rawPayload.length,
>           withMD5Checking(), PageBlobOutputStream.this.opContext);
>     } catch (IOException ex) {
>       lastError = ex;
>     } catch (StorageException ex) {
>       lastError = new IOException(ex);
>     }
>     if (lastError != null) {
>       LOG.debug("Caught error in PageBlobOutputStream#writePayloadToServer()");
>     }
>   }
> }
> {code}
> The flushing thread waits for the other thread to complete the Runnable WriteRequest. That's fine. But when an exception happens in blob.uploadPages(), it is only recorded in the lastError state variable. That variable is checked at the start of all subsequent operations such as write and flush — but not by the current flush call, which silently succeeds!
> In standard Azure-backed HBase clusters the WAL is on a page blob. This causes a serious issue in HBase and can lead to data loss: HBase believes a WAL write was hflushed and reports the row write as successful, when in fact the row never reached storage.
> Checking the lastError variable at the end of the flush operation will solve the issue: the flush() call itself will then throw the IOE.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
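The proposed fix can be sketched as follows. This is a minimal standalone illustration, not the actual hadoop-azure code: the class name `FlushErrorSketch` and the `simulateUploadFailure` flag standing in for a failed `blob.uploadPages()` call are hypothetical. The point it demonstrates is the one from the report: after waiting for the background write to finish, flush() must re-check the shared `lastError` field and throw it, instead of returning success.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical, simplified stand-in for PageBlobOutputStream's flush path.
public class FlushErrorSketch {
    private final ExecutorService ioThreadPool = Executors.newSingleThreadExecutor();
    // Shared error state, written by the IO thread and read by flush().
    private volatile IOException lastError;
    private final boolean simulateUploadFailure;

    public FlushErrorSketch(boolean simulateUploadFailure) {
        this.simulateUploadFailure = simulateUploadFailure;
    }

    private class WriteRequest implements Runnable {
        private final CountDownLatch doneSignal = new CountDownLatch(1);

        void waitTillDone() throws InterruptedException {
            doneSignal.await();
        }

        @Override
        public void run() {
            try {
                // Stand-in for blob.uploadPages(); in the real code a
                // StorageException here is wrapped into an IOException
                // and stored in lastError.
                if (simulateUploadFailure) {
                    lastError = new IOException("upload to storage failed");
                }
            } finally {
                doneSignal.countDown();
            }
        }
    }

    public synchronized void flush() throws IOException {
        WriteRequest lastQueuedTask = new WriteRequest();
        ioThreadPool.execute(lastQueuedTask);
        try {
            lastQueuedTask.waitTillDone();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("flush interrupted", e);
        }
        // The fix: check lastError in *this* flush call, so the caller
        // (e.g. an HBase WAL hflush) sees the failure immediately rather
        // than on some later operation.
        if (lastError != null) {
            throw lastError;
        }
    }

    public void close() {
        ioThreadPool.shutdown();
    }
}
```

With this check in place, a caller that treats a successful hflush as durable (as the HBase WAL does) gets the IOException on the same call whose data was lost, instead of a silent success.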