[jira] [Created] (HADOOP-13134) WASB's file delete still throwing Blob not found exception
Lin Chan created HADOOP-13134:
---------------------------------

             Summary: WASB's file delete still throwing Blob not found exception
                 Key: HADOOP-13134
                 URL: https://issues.apache.org/jira/browse/HADOOP-13134
             Project: Hadoop Common
          Issue Type: Bug
          Components: azure
    Affects Versions: 2.7.1
            Reporter: Lin Chan
            Assignee: Dushyanth

WASB still throws a blob-not-found exception, as shown in the following stack trace. The WASB delete path needs to catch that exception and convert it into a boolean return code.

{code}
16/05/07 01:24:57 ERROR InsertIntoHadoopFsRelation: Aborting job.
org.apache.hadoop.fs.azure.AzureException: com.microsoft.azure.storage.StorageException: The specified blob does not exist.
	at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.updateFolderLastModifiedTime(AzureNativeFileSystemStore.java:2682)
	at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.updateFolderLastModifiedTime(AzureNativeFileSystemStore.java:2693)
	at org.apache.hadoop.fs.azure.NativeAzureFileSystem.updateParentFolderLastModifiedTime(NativeAzureFileSystem.java:2495)
	at org.apache.hadoop.fs.azure.NativeAzureFileSystem.delete(NativeAzureFileSystem.java:1860)
	at org.apache.hadoop.fs.azure.NativeAzureFileSystem.delete(NativeAzureFileSystem.java:1603)
	at org.apache.hadoop.fs.azure.NativeAzureFileSystem.delete(NativeAzureFileSystem.java:1836)
	at org.apache.hadoop.fs.azure.NativeAzureFileSystem.delete(NativeAzureFileSystem.java:1603)
	at org.apache.hadoop.fs.azure.NativeAzureFileSystem.delete(NativeAzureFileSystem.java:1836)
	at org.apache.hadoop.fs.azure.NativeAzureFileSystem.delete(NativeAzureFileSystem.java:1603)
	at org.apache.hadoop.fs.azure.NativeAzureFileSystem.delete(NativeAzureFileSystem.java:1836)
	at org.apache.hadoop.fs.azure.NativeAzureFileSystem.delete(NativeAzureFileSystem.java:1603)
	at org.apache.hadoop.fs.azure.NativeAzureFileSystem.delete(NativeAzureFileSystem.java:1836)
	at org.apache.hadoop.fs.azure.NativeAzureFileSystem.delete(NativeAzureFileSystem.java:1603)
	at org.apache.hadoop.fs.azure.NativeAzureFileSystem.delete(NativeAzureFileSystem.java:1836)
	at org.apache.hadoop.fs.azure.NativeAzureFileSystem.delete(NativeAzureFileSystem.java:1603)
	at org.apache.hadoop.fs.azure.NativeAzureFileSystem.delete(NativeAzureFileSystem.java:1836)
	at org.apache.hadoop.fs.azure.NativeAzureFileSystem.delete(NativeAzureFileSystem.java:1603)
	at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.cleanupJob(FileOutputCommitter.java:510)
	at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJobInternal(FileOutputCommitter.java:403)
	at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:364)
	at org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:46)
	at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:230)
	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:151)
	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)
	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:108)
{code}
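A minimal sketch of the direction the report suggests: catch the BlobNotFound error inside the WASB delete path and convert it into the boolean return code the FileSystem#delete contract expects. The class and method names below are illustrative, not the actual HADOOP-13134 patch:

{code}
import java.io.IOException;

import com.microsoft.azure.storage.StorageErrorCodeStrings;
import com.microsoft.azure.storage.StorageException;
import com.microsoft.azure.storage.blob.CloudBlockBlob;

public final class DeleteSketch {
  /**
   * Deletes a blob, returning false instead of throwing when the blob has
   * already disappeared (e.g. removed by a concurrent client between an
   * existence check and the delete call).
   */
  public static boolean deleteQuietly(CloudBlockBlob blob) throws IOException {
    try {
      blob.delete();
      return true;
    } catch (StorageException e) {
      // "The specified blob does not exist." -> treat as nothing to delete.
      if (StorageErrorCodeStrings.BLOB_NOT_FOUND.equals(e.getErrorCode())
          || e.getHttpStatusCode() == 404) {
        return false;
      }
      throw new IOException(e);
    }
  }
}
{code}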
[jira] [Commented] (HADOOP-11685) StorageException complaining " no lease ID" during HBase distributed log splitting
[ https://issues.apache.org/jira/browse/HADOOP-11685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967573#comment-14967573 ]

Lin Chan commented on HADOOP-11685:
-----------------------------------

Patch looks good to me.

> StorageException complaining "no lease ID" during HBase distributed log splitting
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-11685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11685
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: tools
>            Reporter: Duo Xu
>            Assignee: Duo Xu
>         Attachments: HADOOP-11685.01.patch, HADOOP-11685.02.patch, HADOOP-11685.03.patch
>
> This is similar to HADOOP-11523, but in a different place. During HBase distributed log splitting, multiple threads access the same folder, "recovered.edits". However, many places in the WASB code did not acquire a lease and simply passed null to Azure storage, which caused this issue.
> {code}
> 2015-02-26 03:21:28,871 WARN org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of WALs/workernode4.hbaseproddm2001.g6.internal.cloudapp.net,60020,1422071058425-splitting/workernode4.hbaseproddm2001.g6.internal.cloudapp.net%2C60020%2C1422071058425.1424914216773 failed, returning error
> java.io.IOException: org.apache.hadoop.fs.azure.AzureException: java.io.IOException
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.checkForErrors(HLogSplitter.java:633)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.access$000(HLogSplitter.java:121)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$OutputSink.finishWriting(HLogSplitter.java:964)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogRecoveredEditsOutputSink.finishWritingAndClose(HLogSplitter.java:1019)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:359)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:223)
> 	at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:142)
> 	at org.apache.hadoop.hbase.regionserver.handler.HLogSplitterHandler.process(HLogSplitterHandler.java:79)
> 	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.fs.azure.AzureException: java.io.IOException
> 	at org.apache.hadoop.fs.azurenative.AzureNativeFileSystemStore.storeEmptyFolder(AzureNativeFileSystemStore.java:1477)
> 	at org.apache.hadoop.fs.azurenative.NativeAzureFileSystem.mkdirs(NativeAzureFileSystem.java:1862)
> 	at org.apache.hadoop.fs.azurenative.NativeAzureFileSystem.mkdirs(NativeAzureFileSystem.java:1812)
> 	at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1815)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getRegionSplitEditsPath(HLogSplitter.java:502)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogRecoveredEditsOutputSink.createWAP(HLogSplitter.java:1211)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogRecoveredEditsOutputSink.getWriterAndPath(HLogSplitter.java:1200)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogRecoveredEditsOutputSink.append(HLogSplitter.java:1243)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.writeBuffer(HLogSplitter.java:851)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.doRun(HLogSplitter.java:843)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.run(HLogSplitter.java:813)
> Caused by: java.io.IOException
> 	at com.microsoft.windowsazure.storage.core.Utility.initIOException(Utility.java:493)
> 	at com.microsoft.windowsazure.storage.blob.BlobOutputStream.close(BlobOutputStream.java:282)
> 	at org.apache.hadoop.fs.azurenative.AzureNativeFileSystemStore.storeEmptyFolder(AzureNativeFileSystemStore.java:1472)
> 	... 10 more
> Caused by: com.microsoft.windowsazure.storage.StorageException: There is currently a lease on the blob and no lease ID was specified in the request.
> 	at com.microsoft.windowsazure.storage.StorageException.translateException(StorageException.java:163)
> 	at com.microsoft.windowsazure.storage.core.StorageRequest.materializeException(StorageRequest.java:306)
> 	at com.microsoft.windowsazure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:229)
> 	at com.microsoft.windowsazu
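The fix direction discussed here — acquire a lease and pass its ID with the request instead of null — might look roughly like the sketch below. It uses the current com.microsoft.azure.storage package names (the trace above is from the older com.microsoft.windowsazure.storage SDK), and the method name is hypothetical, not code from the attached patches:

{code}
import com.microsoft.azure.storage.AccessCondition;
import com.microsoft.azure.storage.StorageException;
import com.microsoft.azure.storage.blob.CloudBlockBlob;

public final class LeaseSketch {
  /** Writes a folder-marker blob under a lease instead of passing null. */
  static void storeEmptyFolderUnderLease(CloudBlockBlob folderBlob)
      throws StorageException, java.io.IOException {
    // Acquire a short-lived lease (15-60s) so concurrent writers to the same
    // "recovered.edits" folder serialize instead of failing with
    // "no lease ID was specified in the request".
    String leaseId = folderBlob.acquireLease(60, null);
    AccessCondition lease = AccessCondition.generateLeaseCondition(leaseId);
    try {
      // The lease ID travels with the request via the access condition.
      folderBlob.uploadText("", null, lease, null, null);
    } finally {
      folderBlob.releaseLease(lease);
    }
  }
}
{code}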
[jira] [Commented] (HADOOP-12334) Change Mode Of Copy Operation of HBase WAL Archiving to bypass Azure Storage Throttling after retries
[ https://issues.apache.org/jira/browse/HADOOP-12334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745798#comment-14745798 ]

Lin Chan commented on HADOOP-12334:
-----------------------------------

+1, looks good.

> Change Mode Of Copy Operation of HBase WAL Archiving to bypass Azure Storage Throttling after retries
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-12334
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12334
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools
>            Reporter: Gaurav Kanade
>            Assignee: Gaurav Kanade
>         Attachments: HADOOP-12334.01.patch, HADOOP-12334.02.patch, HADOOP-12334.03.patch, HADOOP-12334.04.patch, HADOOP-12334.05.patch
>
> HADOOP-11693 mitigated the problem of HMaster aborting a regionserver due to an Azure Storage throttling event during HBase WAL archival. It did so by applying an aggressive exponential retry when throttling occurred.
> As a second level of mitigation, we will change the mode of the copy operation if it fails even after all retries: we will do a client-side copy of the blob and then copy it back to the destination. This operation is not subject to throttling and hence provides stronger mitigation. However, it is more expensive, so we do it only when the operation fails after all retries.
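The two-level mitigation described above could be sketched as follows. This is a rough illustration under stated assumptions — HTTP 503 as the throttling signal, illustrative helper names, and a simplified retry loop — not the logic of the attached patches, which live inside the WASB driver:

{code}
import java.io.InputStream;

import com.microsoft.azure.storage.StorageException;
import com.microsoft.azure.storage.blob.CloudBlockBlob;

public final class CopyFallbackSketch {
  static void copyWithFallback(CloudBlockBlob src, CloudBlockBlob dst,
                               int maxRetries) throws Exception {
    for (int attempt = 0; attempt < maxRetries; attempt++) {
      try {
        // Server-side copy: cheap, but subject to storage throttling.
        // A real implementation would poll dst.getCopyState() until the
        // asynchronous copy completes.
        dst.startCopy(src);
        return;
      } catch (StorageException e) {
        if (e.getHttpStatusCode() != 503) {
          throw e; // not a "server busy" (throttling) response
        }
        Thread.sleep((1L << attempt) * 1000L); // exponential backoff
      }
    }
    // All retries were throttled: fall back to a client-side copy. The data
    // now moves through the client, which is more expensive but is not
    // subject to the server-side copy throttling.
    src.downloadAttributes();
    try (InputStream in = src.openInputStream()) {
      dst.upload(in, src.getProperties().getLength());
    }
  }
}
{code}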