[jira] [Commented] (HDFS-10943) rollEditLog expects empty EditsDoubleBuffer.bufCurrent which is not guaranteed
[ https://issues.apache.org/jira/browse/HDFS-10943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926355#comment-16926355 ]
wangcong commented on HDFS-10943:
-
[~daryn],[~kihwal],[~zhz],[~hexiaoqiao],[~yzhangal] Our log cluster has hit this problem several times. We use a dedicated cluster for writing YARN logs, but this log cluster has crashed several times. While reviewing the logs, we found the same error as in this issue. To diagnose the problem, we deployed HDFS-11306 and HDFS-11292. When the log cluster crashed again, the diagnostic log was as follows:
2019-09-10 03:50:16,403 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLog: LastWrittenTxId 5061382841 is expected to be the same as LastSyncedTxId 5061382840
2019-09-10 03:50:16,403 WARN org.apache.hadoop.hdfs.server.namenode.EditsDoubleBuffer: The edits buffer is 85 bytes long with 1 unflushed transactions. Below is the list of unflushed transactions
2019-09-10 03:50:16,408 WARN org.apache.hadoop.hdfs.server.namenode.EditsDoubleBuffer: Unflushed op [0]: CancelDelegationTokenOp [token=token for yarn: HDFS_DELEGATION_TOKEN owner=yarn/datanod...@domain.com, renewer=yarn, realUser=, issueDate=1567970236988, maxDate=1568575036988, sequenceNumber=621170591, masterKeyId=108, opcode=OP_CANCEL_DELEGATION_TOKEN, txid=5061382841]
2019-09-10 03:50:16,409 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: finalize log segment 5060982535, 5061382841 failed for required journal (JournalAndStream(mgr=QJM to [10.0.0.1:8001,10.0.0.2:8001,10.0.0.3:8001],stream=QuorumOutputStream starting at txid 5060982535)) java.io.IOException: FSEditsStream has 85 bytes still to be flushed and cannot be closed
Since deploying the patches, the NameNode has crashed twice; in both cases the op that caused the problem was a CancelDelegationTokenOp.
> rollEditLog expects empty EditsDoubleBuffer.bufCurrent which is not guaranteed > -- > > Key: HDFS-10943 > URL: https://issues.apache.org/jira/browse/HDFS-10943 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yongjun Zhang >Priority: Major > > Per the following trace stack: > {code} > FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: finalize log > segment 10562075963, 10562174157 failed for required journal > (JournalAndStream(mgr=QJM to [0.0.0.1:8485, 0.0.0.2:8485, 0.0.0.3:8485, > 0.0.0.4:8485, 0.0.0.5:8485], stream=QuorumOutputStream starting at txid > 10562075963)) > java.io.IOException: FSEditStream has 49708 bytes still to be flushed and > cannot be closed. > at > org.apache.hadoop.hdfs.server.namenode.EditsDoubleBuffer.close(EditsDoubleBuffer.java:66) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.close(QuorumOutputStream.java:65) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.closeStream(JournalSet.java:115) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet$4.apply(JournalSet.java:235) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet.finalizeLogSegment(JournalSet.java:231) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1243) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1172) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1243) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:6437) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1002) > at > org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:142) > at > 
org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12025) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080) > 2016-09-23 21:40:59,618 WARN > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting > QuorumOutputStream starting at txid 10562075963 >
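For readers unfamiliar with the buffer named in the exception: FSEditLog writes ops into one half of a double buffer while the other half is flushed to the journals, and closing a segment requires both halves to be empty. The sketch below illustrates that contract; the names mirror EditsDoubleBuffer, but this is a simplified illustration, not the HDFS source.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Simplified model of EditsDoubleBuffer: ops accumulate in bufCurrent,
// setReadyToFlush() swaps the halves, and close() refuses to proceed while
// either half still holds bytes -- the exact failure in the stack trace above.
class DoubleBufferSketch {
  private ByteArrayOutputStream bufCurrent = new ByteArrayOutputStream();
  private ByteArrayOutputStream bufReady = new ByteArrayOutputStream();

  void writeOp(byte[] op) {
    bufCurrent.write(op, 0, op.length);
  }

  void setReadyToFlush() {
    ByteArrayOutputStream tmp = bufReady;
    bufReady = bufCurrent;
    bufCurrent = tmp; // bufCurrent is now empty and ready for new ops
  }

  void flushTo(ByteArrayOutputStream journal) {
    byte[] data = bufReady.toByteArray();
    journal.write(data, 0, data.length);
    bufReady.reset();
  }

  void close() throws IOException {
    int pending = bufCurrent.size() + bufReady.size();
    if (pending > 0) {
      // An op written after the final sync but before the segment close
      // (e.g. the CancelDelegationTokenOp above) trips this check.
      throw new IOException(pending
          + " bytes still to be flushed and cannot be closed");
    }
  }
}
```

In the crashes reported above, one more op landed in bufCurrent after the last sync, so the close() check failed and the required journal aborted.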
[jira] [Comment Edited] (HDFS-10943) rollEditLog expects empty EditsDoubleBuffer.bufCurrent which is not guaranteed
[ https://issues.apache.org/jira/browse/HDFS-10943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926355#comment-16926355 ]
wangcong edited comment on HDFS-10943 at 9/10/19 6:15 AM:
--
[~daryn],[~kihwal],[~zhz],[~hexiaoqiao],[~yzhangal] Our log cluster has hit this problem several times. We use a dedicated cluster for writing YARN logs, but this log cluster has crashed several times. While reviewing the logs, we found the same error as in this issue. To diagnose the problem, we deployed HDFS-11306 and HDFS-11292. When the log cluster crashed again, the diagnostic log was as follows:
2019-09-10 03:50:16,403 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLog: LastWrittenTxId 5061382841 is expected to be the same as LastSyncedTxId 5061382840
2019-09-10 03:50:16,403 WARN org.apache.hadoop.hdfs.server.namenode.EditsDoubleBuffer: The edits buffer is 85 bytes long with 1 unflushed transactions. Below is the list of unflushed transactions
2019-09-10 03:50:16,408 WARN org.apache.hadoop.hdfs.server.namenode.EditsDoubleBuffer: Unflushed op [0]: CancelDelegationTokenOp [token=token for yarn: HDFS_DELEGATION_TOKEN owner=yarn/datanod...@domain.com, renewer=yarn, realUser=, issueDate=1567970236988, maxDate=1568575036988, sequenceNumber=621170591, masterKeyId=108, opcode=OP_CANCEL_DELEGATION_TOKEN, txid=5061382841]
2019-09-10 03:50:16,409 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: finalize log segment 5060982535, 5061382841 failed for required journal (JournalAndStream(mgr=QJM to [10.0.0.1:8001,10.0.0.2:8001,10.0.0.3:8001],stream=QuorumOutputStream starting at txid 5060982535)) java.io.IOException: FSEditsStream has 85 bytes still to be flushed and cannot be closed
Since deploying the patches, the NameNode has crashed twice; in both cases the op that caused the problem was a CancelDelegationTokenOp.
was (Author: swingcong):
[~daryn],[~kihwal],[~zhz],[~hexiaoqiao],[~yzhangal] Our log cluster has hit this problem several times. We use a dedicated cluster for writing YARN logs, but this log cluster has crashed several times. While reviewing the logs, we found the same error as in this issue. To diagnose the problem, we deployed HDFS-11306 and HDFS-11292. When the log cluster crashed again, the diagnostic log was as follows:
2019-09-10 03:50:16,403 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLog: LastWrittenTxId 5061382841 is expected to be the same as LastSyncedTxId 5061382840
2019-09-10 03:50:16,403 WARN org.apache.hadoop.hdfs.server.namenode.EditsDoubleBuffer: The edits buffer is 85 bytes long with 1 unflushed transactions. Below is the list of unflushed transactions
2019-09-10 03:50:16,408 WARN org.apache.hadoop.hdfs.server.namenode.EditsDoubleBuffer: Unflushed op [0]: CancelDelegationTokenOp [token=token for yarn: HDFS_DELEGATION_TOKEN owner=yarn/datanod...@domain.com, renewer=yarn, realUser=, issueDate=1567970236988, maxDate=1568575036988, sequenceNumber=621170591, masterKeyId=108, opcode=OP_CANCEL_DELEGATION_TOKEN, txid=5061382841]
2019-09-10 03:50:16,409 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: finalize log segment 5060982535, 5061382841 failed for required journal (JournalAndStream(mgr=QJM to [10.0.0.1:8001,10.0.0.2:8001,10.0.0.3:8001],stream=QuorumOutputStream starting at txid 5060982535)) java.io.IOException: FSEditsStream has 85 bytes still to be flushed and cannot be closed
Since deploying the patches, the NameNode has crashed twice; in both cases the op that caused the problem was a CancelDelegationTokenOp.
> rollEditLog expects empty EditsDoubleBuffer.bufCurrent which is not guaranteed > -- > > Key: HDFS-10943 > URL: https://issues.apache.org/jira/browse/HDFS-10943 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yongjun Zhang >Priority: Major > > Per the following trace stack: > {code} > FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: finalize log > segment 10562075963, 10562174157 failed for required journal > (JournalAndStream(mgr=QJM to [0.0.0.1:8485, 0.0.0.2:8485, 0.0.0.3:8485, > 0.0.0.4:8485, 0.0.0.5:8485], stream=QuorumOutputStream starting at txid > 10562075963)) > java.io.IOException: FSEditStream has 49708 bytes still to be flushed and > cannot be closed. > at > org.apache.hadoop.hdfs.server.namenode.EditsDoubleBuffer.close(EditsDoubleBuffer.java:66) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.close(QuorumOutputStream.java:65) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.closeStream(JournalSet.java:115) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet$4.apply(JournalSet.java:235) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393) >
[jira] [Updated] (HDFS-14795) Add Throttler for writing block
[ https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lisheng Sun updated HDFS-14795:
---
Attachment: HDFS-14795.006.patch
> Add Throttler for writing block
> ---
>
> Key: HDFS-14795
> URL: https://issues.apache.org/jira/browse/HDFS-14795
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Lisheng Sun
> Assignee: Lisheng Sun
> Priority: Minor
> Attachments: HDFS-14795.001.patch, HDFS-14795.002.patch,
> HDFS-14795.003.patch, HDFS-14795.004.patch, HDFS-14795.005.patch,
> HDFS-14795.006.patch
>
>
> DataXceiver#writeBlock
> {code:java}
> blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut,
>     mirrorAddr, null, targets, false);
> {code}
> As the code above shows, DataXceiver#writeBlock does not throttle.
> I think it is necessary to throttle block writes by adding a throttler
> in the PIPELINE_SETUP_APPEND_RECOVERY or
> PIPELINE_SETUP_STREAMING_RECOVERY stage.
> The default throttler value would still be null.
--
This message was sent by Atlassian Jira (v8.3.2#803003)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
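Hadoop already ships a DataTransferThrottler class used for balancing and image transfer, and the patch presumably wires something similar into the write pipeline. As a self-contained illustration of the underlying idea, here is a simplified token-bucket sketch; the class name and structure are illustrative, not the actual Hadoop implementation:

```java
// Simplified bandwidth throttler in the spirit of Hadoop's
// DataTransferThrottler: each period grants a byte budget, and callers that
// exceed it sleep until the next period begins.
class SimpleThrottler {
  static final long PERIOD_MILLIS = 500;
  final long bytesPerPeriod;
  long periodStart;
  long bytesLeft;

  SimpleThrottler(long bytesPerSecond) {
    this.bytesPerPeriod = bytesPerSecond * PERIOD_MILLIS / 1000;
    this.periodStart = System.currentTimeMillis();
    this.bytesLeft = bytesPerPeriod;
  }

  synchronized void throttle(long numBytes) {
    bytesLeft -= numBytes;
    while (bytesLeft <= 0) {
      long wait = periodStart + PERIOD_MILLIS - System.currentTimeMillis();
      if (wait > 0) {
        try {
          Thread.sleep(wait);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
      }
      // Start a new period and roll the (overdrawn) budget forward.
      periodStart = System.currentTimeMillis();
      bytesLeft += bytesPerPeriod;
    }
  }
}
```

A BlockReceiver-style loop would call throttle(packetLen) after each packet; with a null throttler (the proposed default) the call would simply be skipped, preserving current behavior.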
[jira] [Commented] (HDFS-14795) Add Throttler for writing block
[ https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926347#comment-16926347 ] Hadoop QA commented on HDFS-14795: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 8s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 46s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 3 new + 662 unchanged - 0 fixed = 665 total (was 662) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 52s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 99m 26s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 39s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}159m 22s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized | | | hadoop.hdfs.server.namenode.sps.TestStoragePolicySatisfierWithStripedFile | | | hadoop.hdfs.server.namenode.TestQuotaWithStripedBlocksWithRandomECPolicy | | | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA | | | hadoop.hdfs.server.namenode.TestNamenodeRetryCache | | | hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication | | | hadoop.hdfs.server.namenode.TestNameNodeXAttr | | | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks | | | hadoop.hdfs.server.namenode.TestEditLog | | | hadoop.hdfs.server.namenode.TestCacheDirectives | | | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport | | | hadoop.hdfs.server.balancer.TestBalancerRPCDelay | | | hadoop.hdfs.server.namenode.TestDecommissioningStatus | | | hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier | \\ \\ || Subsystem || Report/
[jira] [Commented] (HDFS-14074) DataNode runs async disk checks maybe throws NullPointerException, and DataNode failed to register to NameSpace.
[ https://issues.apache.org/jira/browse/HDFS-14074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926346#comment-16926346 ]
Zhankun Tang commented on HDFS-14074:
-
[~jojochuang], cool. Thanks!
> DataNode runs async disk checks maybe throws NullPointerException, and
> DataNode failed to register to NameSpace.
> --
>
> Key: HDFS-14074
> URL: https://issues.apache.org/jira/browse/HDFS-14074
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 2.8.0, 3.0.0
> Environment: hadoop-2.7.3, hadoop-2.8.0
> Reporter: guangyi lu
> Assignee: guangyi lu
> Priority: Major
> Labels: HDFS, HDFS-4
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-14074-latest.patch, HDFS-14074.patch,
> WechatIMG83.jpeg
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> In the ThrottledAsyncChecker class, the completedChecks member is a
> WeakHashMap, defined as follows:
> this.completedChecks = new WeakHashMap<>();
> One of its uses, in the schedule method, is:
> if (completedChecks.containsKey(target)) {
>   // garbage collection may happen here, and result may become null
>   final LastCheckResult result = completedChecks.get(target);
>   final long msSinceLastCheck = timer.monotonicNow() - result.completedAt;
> }
> After completedChecks.containsKey(target) returns true, garbage collection
> may occur, so result may be null.
> The proposed solution is:
> this.completedChecks = new ReferenceMap(1, 1);
> or
> this.completedChecks = new HashMap<>();
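The race described in the report — containsKey() returning true and the entry then being collected before get() — can also be closed without changing the map type, by doing a single get() and null-checking the result. A minimal sketch (the class and method names here are illustrative, not the ThrottledAsyncChecker source):

```java
import java.util.Map;
import java.util.WeakHashMap;

// Demonstrates the safe single-read pattern for a WeakHashMap: never pair
// containsKey() with a following get(), because the GC can clear the weakly
// referenced key between the two calls.
class WeakMapLookup {
  static final Map<Object, Long> completedChecks = new WeakHashMap<>();

  static Long lastCompletedAt(Object target) {
    // One read: either we get the value, or a consistent null that the
    // caller must handle explicitly.
    return completedChecks.get(target);
  }
}
```

The ReferenceMap / HashMap alternatives the reporter proposes avoid the race by not using weak keys at all, at the cost of keeping entries alive until explicitly removed.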
[jira] [Commented] (HDFS-14074) DataNode runs async disk checks maybe throws NullPointerException, and DataNode failed to register to NameSpace.
[ https://issues.apache.org/jira/browse/HDFS-14074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926344#comment-16926344 ]
Wei-Chiu Chuang commented on HDFS-14074:
Removed the incompatible change flag. This is a harmless fix. Thanks.
> DataNode runs async disk checks maybe throws NullPointerException, and
> DataNode failed to register to NameSpace.
> --
>
> Key: HDFS-14074
> URL: https://issues.apache.org/jira/browse/HDFS-14074
[jira] [Updated] (HDFS-14074) DataNode runs async disk checks maybe throws NullPointerException, and DataNode failed to register to NameSpace.
[ https://issues.apache.org/jira/browse/HDFS-14074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wei-Chiu Chuang updated HDFS-14074:
---
Hadoop Flags: Reviewed (was: Incompatible change,Reviewed)
> DataNode runs async disk checks maybe throws NullPointerException, and
> DataNode failed to register to NameSpace.
> --
>
> Key: HDFS-14074
> URL: https://issues.apache.org/jira/browse/HDFS-14074
[jira] [Commented] (HDFS-14074) DataNode runs async disk checks maybe throws NullPointerException, and DataNode failed to register to NameSpace.
[ https://issues.apache.org/jira/browse/HDFS-14074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926343#comment-16926343 ]
Zhankun Tang commented on HDFS-14074:
-
[~jojochuang], [~luguangyi], [~arp], could you please update the release note? This is a blocker for the 3.1.3 release too. Thanks a lot.
> DataNode runs async disk checks maybe throws NullPointerException, and
> DataNode failed to register to NameSpace.
> --
>
> Key: HDFS-14074
> URL: https://issues.apache.org/jira/browse/HDFS-14074
[jira] [Comment Edited] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete
[ https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926174#comment-16926174 ]
Siddharth Wagle edited comment on HDDS-1868 at 9/10/19 4:34 AM:
[~ljain] you are very correct, my UT did catch it. Could you please review version 02? Thanks. The UT does not check the negative scenario where no leader means no report, so I will change the name if you think the code changes look good.
was (Author: swagle):
[~ljain] you are very correct, my UT did catch it. Could you please review version 02? Thanks. The UT does not check the negative scenario where no leader means no report, so I will change the name if you think the change looks good.
> Ozone pipelines should be marked as ready only after the leader election is
> complete
>
>
> Key: HDDS-1868
> URL: https://issues.apache.org/jira/browse/HDDS-1868
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Datanode, SCM
> Affects Versions: 0.4.0
> Reporter: Mukul Kumar Singh
> Assignee: Siddharth Wagle
> Priority: Major
> Fix For: 0.5.0
>
> Attachments: HDDS-1868.01.patch, HDDS-1868.02.patch
>
> On restart, Ozone pipelines start in the ALLOCATED state and are moved into
> the OPEN state after all the pipeline members have reported. However, this
> can potentially lead to an issue where the pipeline is still not ready to
> accept any incoming IO operations.
> Pipelines should be marked as ready only after the leader election is
> complete and the leader is ready to accept incoming IO.
[jira] [Comment Edited] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete
[ https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926174#comment-16926174 ]
Siddharth Wagle edited comment on HDDS-1868 at 9/10/19 4:33 AM:
[~ljain] you are very correct, my UT did catch it. Could you please review version 02? Thanks. The UT does not check the negative scenario where no leader means no report, so I will change the name if you think the change looks good.
was (Author: swagle):
[~ljain] you are very correct, my UT did catch it. Could you please review version 02? Thanks.
> Ozone pipelines should be marked as ready only after the leader election is
> complete
>
> Key: HDDS-1868
> URL: https://issues.apache.org/jira/browse/HDDS-1868
[jira] [Commented] (HDFS-14836) FileIoProvider should not increase FileIoErrors metric in datanode volume metric
[ https://issues.apache.org/jira/browse/HDFS-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926308#comment-16926308 ]
Wei-Chiu Chuang commented on HDFS-14836:
Got it. Thanks. That makes sense to me.
> FileIoProvider should not increase FileIoErrors metric in datanode volume
> metric
>
>
> Key: HDFS-14836
> URL: https://issues.apache.org/jira/browse/HDFS-14836
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.9.1
> Reporter: Aiphago
> Assignee: Aiphago
> Priority: Minor
>
> I found that the FileIoErrors metric increases in BlockSender.sendPacket()
> when fileIoProvider.transferToSocketFully() is used. But in
> https://issues.apache.org/jira/browse/HDFS-2054 such exceptions, e.g.
> "Broken pipe" and "Connection reset", are deliberately ignored.
> So should a filter be applied before fileIoProvider increases the
> FileIoErrors count?
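A filter along the lines the issue suggests could check the exception message before bumping the metric, mirroring the client-disconnect strings that HDFS-2054 ignores. A sketch with hypothetical class and method names:

```java
import java.io.IOException;

// Counts an IOException toward a FileIoErrors-style metric only when it does
// not look like a client-side disconnect. The matched strings mirror the
// cases HDFS-2054 ignores; the names here are hypothetical, not HDFS source.
class FileIoErrorFilter {
  static int fileIoErrors = 0;

  static boolean isClientDisconnect(IOException e) {
    String msg = e.getMessage();
    return msg != null
        && (msg.contains("Broken pipe") || msg.contains("Connection reset"));
  }

  static void onFileIoFailure(IOException e) {
    if (!isClientDisconnect(e)) {
      fileIoErrors++; // only genuine disk/IO faults reach the metric
    }
  }
}
```

Message-string matching is fragile across JDKs and locales, which may be why the discussion frames this as a question rather than a settled design.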
[jira] [Work logged] (HDDS-2089) Add CLI createPipeline
[ https://issues.apache.org/jira/browse/HDDS-2089?focusedWorklogId=309512&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309512 ]
ASF GitHub Bot logged work on HDDS-2089:
Author: ASF GitHub Bot
Created on: 10/Sep/19 03:43
Start Date: 10/Sep/19 03:43
Worklog Time Spent: 10m
Work Description: timmylicheng commented on pull request #1418: HDDS-2089: Add createPipeline CLI.
URL: https://github.com/apache/hadoop/pull/1418
#HDDS-2089 Add createPipeline for ozone scmcli
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Issue Time Tracking
---
Worklog Id: (was: 309512)
Remaining Estimate: 0h
Time Spent: 10m
> Add CLI createPipeline
> --
>
> Key: HDDS-2089
> URL: https://issues.apache.org/jira/browse/HDDS-2089
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: Ozone CLI
> Affects Versions: 0.5.0
> Reporter: Li Cheng
> Assignee: Li Cheng
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Add a SCMCLI to create pipeline for ozone.
[jira] [Updated] (HDDS-2089) Add CLI createPipeline
[ https://issues.apache.org/jira/browse/HDDS-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HDDS-2089:
-
Labels: pull-request-available (was: )
> Add CLI createPipeline
> --
>
> Key: HDDS-2089
> URL: https://issues.apache.org/jira/browse/HDDS-2089
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: Ozone CLI
> Affects Versions: 0.5.0
> Reporter: Li Cheng
> Assignee: Li Cheng
> Priority: Major
> Labels: pull-request-available
>
> Add a SCMCLI to create pipeline for ozone.
[jira] [Comment Edited] (HDFS-14820) The default 8KB buffer of BlockReaderRemote#newBlockReader#BufferedOutputStream is too big
[ https://issues.apache.org/jira/browse/HDFS-14820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926304#comment-16926304 ]
Lisheng Sun edited comment on HDFS-14820 at 9/10/19 3:38 AM:
-
Hi [~elgoiri]
{quote}What is the current default value? 8KB?
{quote}
As the following code shows, the current default value is 8 KB.
{code:java}
final DataOutputStream out = new DataOutputStream(new BufferedOutputStream(
    peer.getOutputStream()));

public BufferedOutputStream(OutputStream out) {
    this(out, 8192);
}
{code}
I updated the buffer to 512 B and ran a lot of tests, and the results were OK. I can run a pressure test and use the new buffer in our production environment later. I agree with your suggestion: we can first make it configurable with the old value as the default, and adjust the buffer according to user needs.
was (Author: leosun08):
Hi [~elgoiri]
{quote}What is the current default value? 8KB?
{quote}
As the following code shows, the current default value is 8 KB.
{code:java}
final DataOutputStream out = new DataOutputStream(new BufferedOutputStream(
    peer.getOutputStream()));

public BufferedOutputStream(OutputStream out) {
    this(out, 8192);
}
{code}
I updated the buffer to 512 B and ran a test, and the result was OK. I agree with your suggestion to make it configurable with the old value as the default.
> The default 8KB buffer of
> BlockReaderRemote#newBlockReader#BufferedOutputStream is too big
> ---
>
> Key: HDFS-14820
> URL: https://issues.apache.org/jira/browse/HDFS-14820
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Lisheng Sun
> Assignee: Lisheng Sun
> Priority: Major
> Attachments: HDFS-14820.001.patch
>
>
> This issue is similar to HDFS-14535.
> {code:java}
> public static BlockReader newBlockReader(String file,
>     ExtendedBlock block,
>     Token blockToken,
>     long startOffset, long len,
>     boolean verifyChecksum,
>     String clientName,
>     Peer peer, DatanodeID datanodeID,
>     PeerCache peerCache,
>     CachingStrategy cachingStrategy,
>     int networkDistance) throws IOException {
>   // in and out will be closed when sock is closed (by the caller)
>   final DataOutputStream out = new DataOutputStream(new BufferedOutputStream(
>       peer.getOutputStream()));
>   new Sender(out).readBlock(block, blockToken, clientName, startOffset, len,
>       verifyChecksum, cachingStrategy);
> }

> public BufferedOutputStream(OutputStream out) {
>   this(out, 8192);
> }
> {code}
> The readBlock request (block, blockToken, clientName, startOffset, len,
> verifyChecksum, cachingStrategy) does not need such a big buffer,
> so I think the BufferedOutputStream buffer should be reduced.
[jira] [Commented] (HDFS-14820) The default 8KB buffer of BlockReaderRemote#newBlockReader#BufferedOutputStream is too big
[ https://issues.apache.org/jira/browse/HDFS-14820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926304#comment-16926304 ] Lisheng Sun commented on HDFS-14820: Hi [~elgoiri] {quote}What is the current default value? 8KB? {quote} As the following code shows, the current default value is 8KB. {code:java} final DataOutputStream out = new DataOutputStream(new BufferedOutputStream( peer.getOutputStream())); public BufferedOutputStream(OutputStream out) { this(out, 8192); } {code} I updated the buffer to 512B, ran a test, and the result is OK. I agree with your suggestion: make it configurable and keep the old value as the default. > The default 8KB buffer of > BlockReaderRemote#newBlockReader#BufferedOutputStream is too big > --- > > Key: HDFS-14820 > URL: https://issues.apache.org/jira/browse/HDFS-14820 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14820.001.patch > > > This issue is similar to HDFS-14535. > {code:java} > public static BlockReader newBlockReader(String file, > ExtendedBlock block, > Token blockToken, > long startOffset, long len, > boolean verifyChecksum, > String clientName, > Peer peer, DatanodeID datanodeID, > PeerCache peerCache, > CachingStrategy cachingStrategy, > int networkDistance) throws IOException { > // in and out will be closed when sock is closed (by the caller) > final DataOutputStream out = new DataOutputStream(new BufferedOutputStream( > peer.getOutputStream())); > new Sender(out).readBlock(block, blockToken, clientName, startOffset, len, > verifyChecksum, cachingStrategy); > } > public BufferedOutputStream(OutputStream out) { > this(out, 8192); > } > {code} > The Sender#readBlock parameters (block, blockToken, clientName, startOffset, len, > verifyChecksum, cachingStrategy) do not need such a big buffer, > so I think the BufferedOutputStream buffer should be reduced. 
[jira] [Commented] (HDFS-14837) Review of Block.java
[ https://issues.apache.org/jira/browse/HDFS-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926301#comment-16926301 ] stack commented on HDFS-14837: -- One question, is Long.hashCode same as (int)(blockId^(blockId>>>32)) (I've not looked..) > Review of Block.java > > > Key: HDFS-14837 > URL: https://issues.apache.org/jira/browse/HDFS-14837 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14837.1.patch > > > The {{Block}} class is such a core class in the project, I just wanted to > make sure it was super clean and documentation was correct. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
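To the question above: the Java 8 javadoc specifies Long.hashCode(long) as exactly (int)(value ^ (value >>> 32)), the same expression Block used for blockId, which a tiny check confirms:

```java
public class LongHashCheck {
  // Long.hashCode(long) is documented to return (int)(value ^ (value >>> 32)),
  // so replacing the hand-written expression with it is behavior-preserving.
  static boolean sameHash(long blockId) {
    return Long.hashCode(blockId) == (int) (blockId ^ (blockId >>> 32));
  }
}
```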
[jira] [Commented] (HDFS-14836) FileIoProvider should not increase FileIoErrors metric in datanode volume metric
[ https://issues.apache.org/jira/browse/HDFS-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926300#comment-16926300 ] Aiphago commented on HDFS-14836: Hi [~jojochuang], thanks for your attention. As in HDFS-2054, "Broken pipe" and "Connection reset" are caused by the client rather than the datanode, yet the datanode may increment the FileIoErrors counter many times because of these exceptions. So I think it's better to filter them out. > FileIoProvider should not increase FileIoErrors metric in datanode volume > metric > > > Key: HDFS-14836 > URL: https://issues.apache.org/jira/browse/HDFS-14836 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.9.1 >Reporter: Aiphago >Assignee: Aiphago >Priority: Minor > > I found that the FileIoErrors metric increases in > BlockSender.sendPacket() when fileIoProvider.transferToSocketFully() is used. But > in https://issues.apache.org/jira/browse/HDFS-2054 exceptions like "Broken pipe" and "Connection reset" have been > ignored. > So should we filter these out when fileIoProvider increases the FileIoErrors count? 
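A minimal sketch of the proposed filter, assuming (as the HDFS-2054 discussion suggests) that client-caused disconnects are recognized by exception message; the helper and call-site names here are made up for illustration:

```java
import java.io.IOException;

public class FileIoErrorFilter {
  // Heuristic used in this sketch: socket errors whose message indicates the
  // client dropped the connection should not count as a volume I/O error.
  static boolean isClientCaused(IOException e) {
    String msg = e.getMessage();
    if (msg == null) {
      return false;
    }
    return msg.contains("Broken pipe") || msg.contains("Connection reset");
  }

  // Hypothetical call site: bump the counter only for genuine disk errors.
  static int countError(int fileIoErrors, IOException e) {
    return isClientCaused(e) ? fileIoErrors : fileIoErrors + 1;
  }
}
```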
[jira] [Commented] (HDFS-14283) DFSInputStream to prefer cached replica
[ https://issues.apache.org/jira/browse/HDFS-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926296#comment-16926296 ] Lisheng Sun commented on HDFS-14283: [~smeng] I am working on this jira and will upload the patch later. Thank you. > DFSInputStream to prefer cached replica > --- > > Key: HDFS-14283 > URL: https://issues.apache.org/jira/browse/HDFS-14283 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.6.0 > Environment: HDFS Caching >Reporter: Wei-Chiu Chuang >Assignee: Lisheng Sun >Priority: Major > > HDFS Caching offers performance benefits. However, currently the NameNode does > not treat cached replicas with higher priority, so HDFS caching is only useful > when cache replication = 3, that is to say, all replicas are cached in > memory, so that a client doesn't randomly pick an uncached replica. > HDFS-6846 proposed to let the NameNode give higher priority to cached replicas. > Changing logic in the NameNode is always tricky, so that didn't get much > traction. Here I propose a different approach: let the client (DFSInputStream) > prefer cached replicas. > A {{LocatedBlock}} object already contains cached replica locations, so a > client has the needed information. I think we can change > {{DFSInputStream#getBestNodeDNAddrPair()}} for this purpose. 
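The idea in the description — have the client try a cached location first — can be sketched with plain strings standing in for the DatanodeInfo arrays carried by LocatedBlock; the method name below is illustrative, not the actual DFSInputStream API:

```java
import java.util.List;

public class CachedReplicaChooser {
  // Return the first location that also appears in the cached-replica list,
  // falling back to the first location overall, mirroring the scan that
  // getBestNodeDNAddrPair performs over the block's locations.
  static String chooseNode(List<String> locations, List<String> cached) {
    for (String node : locations) {
      if (cached.contains(node)) {
        return node;
      }
    }
    return locations.isEmpty() ? null : locations.get(0);
  }
}
```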
[jira] [Updated] (HDFS-14795) Add Throttler for writing block
[ https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lisheng Sun updated HDFS-14795: --- Attachment: HDFS-14795.005.patch > Add Throttler for writing block > --- > > Key: HDFS-14795 > URL: https://issues.apache.org/jira/browse/HDFS-14795 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Minor > Attachments: HDFS-14795.001.patch, HDFS-14795.002.patch, > HDFS-14795.003.patch, HDFS-14795.004.patch, HDFS-14795.005.patch > > > DataXceiver#writeBlock > {code:java} > blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut, > mirrorAddr, null, targets, false); > {code} > As above code, DataXceiver#writeBlock doesn't throttler. > I think it is necessary to throttle for writing block, while add throttler > in stage of PIPELINE_SETUP_APPEND_RECOVERY or > PIPELINE_SETUP_STREAMING_RECOVERY. > Default throttler value is still null. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
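Hadoop already ships a DataTransferThrottler (used, for example, by the balancer); the sketch below reimplements its core one-second-budget idea in plain Java to show what throttling block writes amounts to. The class and its wiring into receiveBlock are assumptions, not the patch itself:

```java
public class WriteThrottler {
  private final long bytesPerSec; // budget per one-second period
  private long bytesThisPeriod = 0;
  private long periodStartMs;

  WriteThrottler(long bytesPerSec, long nowMs) {
    this.bytesPerSec = bytesPerSec;
    this.periodStartMs = nowMs;
  }

  // Returns how long the writer should sleep before sending numBytes more.
  // (The real DataTransferThrottler.throttle() does the sleeping internally;
  // returning the delay keeps this sketch deterministic.)
  long throttleDelayMs(long numBytes, long nowMs) {
    if (nowMs - periodStartMs >= 1000) { // a new one-second period began
      periodStartMs = nowMs;
      bytesThisPeriod = 0;
    }
    bytesThisPeriod += numBytes;
    if (bytesThisPeriod <= bytesPerSec) {
      return 0; // still under budget, no delay needed
    }
    // Over budget: wait until the current period ends.
    return (periodStartMs + 1000) - nowMs;
  }
}
```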
[jira] [Commented] (HDFS-14568) setStoragePolicy should check quota and update consume on storage type quota.
[ https://issues.apache.org/jira/browse/HDFS-14568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926292#comment-16926292 ] Jinglun commented on HDFS-14568: Hi [~surendrasingh], sorry for my late response. Do you mean setting the SSD storage quota to 10 bytes on a directory with 10GB of DISK space consumed? I think we shouldn't allow this because setStoragePolicy() would cause the quota to be exceeded, and I think any RPC that causes a quota to be exceeded should end with a quota-exceeded exception. In patch-004 a RemoteException with QuotaByStorageTypeExceededException will be thrown. +1, the change would be incompatible because the method currently only throws IOException but now it will throw a QuotaExceededException. Maybe we could add a switch to enable the quota check & consume update? > setStoragePolicy should check quota and update consume on storage type quota. > - > > Key: HDFS-14568 > URL: https://issues.apache.org/jira/browse/HDFS-14568 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.1.0 >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > Attachments: HDFS-14568-001.patch, HDFS-14568-unit-test.patch, > HDFS-14568.002.patch, HDFS-14568.003.patch, HDFS-14568.004.patch > > > The quota and consume of the file's ancestors are not handled when the > storage policy of the file is changed. For example: > 1. Set a StorageType.SSD quota of fileSpace-1 on the parent dir; > 2. Create a file of size fileSpace with storage policy \{DISK,DISK,DISK} > under it; > 3. Change the storage policy of the file to ALLSSD_STORAGE_POLICY_NAME and > expect a QuotaByStorageTypeExceededException. > Because the quota and consume are not handled, the expected exception is not > thrown. > > There are 3 reasons why we should handle the consume and the quota. > 1. Replication uses the new storage policy. Consider a file with BlockType > CONTIGUOUS. Its replication factor is 3 and its storage policy is "HOT". > Now we change the policy to "ONE_SSD". 
If a DN goes down and the file needs > replication, the NN will choose storages in policy "ONE_SSD" and replicate > the block to an SSD storage. > 2. We actually have a cluster storing both HOT and COLD data. We have a > background process searching all the files to find those that have not been accessed > for a period of time. Then we set them to COLD and start a mover to move the > replicas. After moving, all the replicas are consistent with the storage > policy. > 3. The NameNode manages the global state of the cluster. If there is any > inconsistency, such as replicas that don't match the storage policy > of the file, we should take the NameNode as the standard and make the cluster > match the NameNode. Block replication is a good example of this rule. > When we count the consume of a file (CONTIGUOUS), we multiply the replication > factor by the file's length, no matter whether the file is under-replicated or > over-replicated. The same should hold for the storage type quota and consume. 
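The quota check under discussion reduces to per-storage-type arithmetic: the new consumption is the file length times the number of replicas placed on that type, charged against that type's quota. A simplified sketch (the method and numbers are illustrative, not the patch):

```java
public class StorageTypeQuotaCheck {
  // Returns true when changing the policy would push the storage type's
  // consumption over its quota, i.e. the case where
  // QuotaByStorageTypeExceededException should be thrown.
  static boolean exceedsQuota(long quota, long consumed,
                              long fileLen, int replicasOnType) {
    long delta = fileLen * replicasOnType; // every replica counts
    return consumed + delta > quota;
  }
}
```

With the example from the comment above — an SSD quota of 10 bytes and a 10GB file moved to a one-SSD policy — the check trips, so the setStoragePolicy() call should fail.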
[jira] [Commented] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete
[ https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926246#comment-16926246 ] Hadoop QA commented on HDDS-1868: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 33s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 1s{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 7m 55s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 42s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 26s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 7m 1s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 10m 29s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 19s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 27s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 46s{color} | {color:orange} hadoop-ozone: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 23s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 10m 32s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 5m 2s{color} | {color:green} hadoop-hdds in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 41m 31s{color} | {color:red} hadoop-ozone in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 53s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}152m 16s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.ozone.container.server.TestSecureContainerServer | | | hadoop.ozone.client.rpc.TestBlockOutputStream | | | hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion | | | hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient | | | hadoop.ozone.container.TestContainerReplication | | | hadoop.hdds.scm.safemode.TestSCMSafeModeWithPipelineRules | | | hadoop.ozone.container.ozoneimpl.TestOzoneContainer | |
[jira] [Commented] (HDFS-14802) The feature of protect directories should be used in RenameOp
[ https://issues.apache.org/jira/browse/HDFS-14802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926225#comment-16926225 ] Fei Hui commented on HDFS-14802: [~jojochuang] [~arp] Does it make sense? > The feature of protect directories should be used in RenameOp > - > > Key: HDFS-14802 > URL: https://issues.apache.org/jira/browse/HDFS-14802 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.4, 3.3.0, 3.2.1, 3.1.3 >Reporter: Fei Hui >Assignee: Fei Hui >Priority: Major > Attachments: HDFS-14802.001.patch, HDFS-14802.002.patch, > HDFS-14802.003.patch > > > Now we can set fs.protected.directories to prevent users from deleting > important directories. But users can get around the limitation and delete them: > 1. Rename the directories and then delete them. > 2. Move the directories to trash, and the namenode will delete them. > So I think the protected directories feature should also be applied in RenameOp. 
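A sketch of what the rename-side check could look like, modeled on how the delete path consults fs.protected.directories; the plain-string ancestor test below stands in for the real INode-based check:

```java
import java.util.Set;

public class ProtectedDirRenameCheck {
  // Reject a rename whose source is a protected directory or an ancestor of
  // one, mirroring what the delete path already enforces: renaming /data
  // away would implicitly move the protected /data/important.
  static boolean renameAllowed(Set<String> protectedDirs, String src) {
    for (String dir : protectedDirs) {
      if (dir.equals(src) || dir.startsWith(src + "/")) {
        return false;
      }
    }
    return true;
  }
}
```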
[jira] [Comment Edited] (HDFS-14831) Downgrade Failed from 3.2.0 to 2.7 because of incompatible stringtable
[ https://issues.apache.org/jira/browse/HDFS-14831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926223#comment-16926223 ] Fei Hui edited comment on HDFS-14831 at 9/10/19 1:08 AM: - [~jojochuang] Got it. Users can upgrade to 3.2.0 from 2.7.0~2.7.7 and downgrade from 3.2.0 to 2.7.8, so we can work around the incompatible stringtable problem that way — is that right? was (Author: ferhui): [~jojochuang] Get it Users can upgrade to 3.2.0 from 2.7.0~2.7.7 and downgrade to 2.7.8 from 3.2.0. We can work around in compatible stringtable problem, is it right? > Downgrade Failed from 3.2.0 to 2.7 because of incompatible stringtable > --- > > Key: HDFS-14831 > URL: https://issues.apache.org/jira/browse/HDFS-14831 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.2.0, 3.3.0, 3.1.3 >Reporter: Fei Hui >Assignee: Fei Hui >Priority: Major > > Mentioned on HDFS-13596 > Incompatible StringTable changes caused a downgrade from 3.2.0 to 2.7.2 to fail. > The commit message is as follows, but the issue was not found: > {quote} > commit 8a41edb089fbdedc5e7d9a2aeec63d126afea49f > Author: Vinayakumar B > Date: Mon Oct 15 15:48:26 2018 +0530 > Fix potential FSImage corruption. Contributed by Daryn Sharp. > {quote} 
[jira] [Comment Edited] (HDFS-14831) Downgrade Failed from 3.2.0 to 2.7 because of incompatible stringtable
[ https://issues.apache.org/jira/browse/HDFS-14831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926223#comment-16926223 ] Fei Hui edited comment on HDFS-14831 at 9/10/19 1:07 AM: - [~jojochuang] Get it Users can upgrade to 3.2.0 from 2.7.0~2.7.7 and downgrade to 2.7.8 from 3.2.0. We can work around in compatible stringtable problem, is it right? was (Author: ferhui): [~jojochuang] Get it Users can upgrade to 3.2.0 from 2.7.0~2.7.7 and downgrade to 2.7.8 from 3.2.0. We can work around in compatible stringtable problem, is it? > Downgrade Failed from 3.2.0 to 2.7 because of incompatible stringtable > --- > > Key: HDFS-14831 > URL: https://issues.apache.org/jira/browse/HDFS-14831 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.2.0, 3.3.0, 3.1.3 >Reporter: Fei Hui >Assignee: Fei Hui >Priority: Major > > Mentioned on HDFS-13596 > Incompatible StringTable changes cause downgrade from 3.2.0 to 2.7.2 failed > commit message as follow, but issue not found > {quote} > commit 8a41edb089fbdedc5e7d9a2aeec63d126afea49f > Author: Vinayakumar B > Date: Mon Oct 15 15:48:26 2018 +0530 > Fix potential FSImage corruption. Contributed by Daryn Sharp. > {quote} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14831) Downgrade Failed from 3.2.0 to 2.7 because of incompatible stringtable
[ https://issues.apache.org/jira/browse/HDFS-14831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926223#comment-16926223 ] Fei Hui commented on HDFS-14831: [~jojochuang] Get it Users can upgrade to 3.2.0 from 2.7.0~2.7.7 and downgrade to 2.7.8 from 3.2.0. We can work around in compatible stringtable problem, is it? > Downgrade Failed from 3.2.0 to 2.7 because of incompatible stringtable > --- > > Key: HDFS-14831 > URL: https://issues.apache.org/jira/browse/HDFS-14831 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.2.0, 3.3.0, 3.1.3 >Reporter: Fei Hui >Assignee: Fei Hui >Priority: Major > > Mentioned on HDFS-13596 > Incompatible StringTable changes cause downgrade from 3.2.0 to 2.7.2 failed > commit message as follow, but issue not found > {quote} > commit 8a41edb089fbdedc5e7d9a2aeec63d126afea49f > Author: Vinayakumar B > Date: Mon Oct 15 15:48:26 2018 +0530 > Fix potential FSImage corruption. Contributed by Daryn Sharp. > {quote} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14837) Review of Block.java
[ https://issues.apache.org/jira/browse/HDFS-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926218#comment-16926218 ] Hadoop QA commented on HDFS-14837: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 28s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 42s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs-client: The patch generated 0 new + 9 unchanged - 1 fixed = 9 total (was 10) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 22s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 51s{color} | {color:green} hadoop-hdfs-client in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 62m 15s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 | | JIRA Issue | HDFS-14837 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12979905/HDFS-14837.1.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 63d30f32bdb3 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 650c4ce | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/27826/testReport/ | | Max. process+thread count | 341 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs-client U: hadoop-hdfs-project/hadoop-hdfs-client | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/27826/console | | Powered by | Apache Yetus 0.8.0 http
[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926217#comment-16926217 ] Fei Hui commented on HDFS-14509: [~John Smith] During the rolling upgrade, the NN is 3.x and the DN is 2.x. What is your client version? > DN throws InvalidToken due to inequality of password when upgrade NN 2.x to > 3.x > --- > > Key: HDFS-14509 > URL: https://issues.apache.org/jira/browse/HDFS-14509 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yuxuan Wang >Priority: Blocker > Attachments: HDFS-14509-001.patch > > > According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we need > to upgrade the NN first. So there will be an intermediate state where the NN is 3.x and > the DN is 2.x. At that moment, if a client reads (or writes) a block, it will get > a block token from the NN and then deliver the token to the DN, which verifies the > token. But the verification in the code now is: > {code:title=BlockTokenSecretManager.java|borderStyle=solid} > public void checkAccess(...) > { > ... > id.readFields(new DataInputStream(new > ByteArrayInputStream(token.getIdentifier()))); > ... > if (!Arrays.equals(retrievePassword(id), token.getPassword())) { > throw new InvalidToken("Block token with " + id.toString() > + " doesn't have the correct token password"); > } > } > {code} > And {{retrievePassword(id)}} is: > {code} > public byte[] retrievePassword(BlockTokenIdentifier identifier) > { > ... > return createPassword(identifier.getBytes(), key.getKey()); > } > {code} > So, if the NN's identifier adds new fields, the DN will lose those fields and compute > a wrong password. 
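The mismatch described above can be illustrated with a toy password function: if the password is derived from the serialized identifier bytes, then re-serializing with fewer fields produces different bytes and therefore a different password. This is a deliberate simplification — the real createPassword is an HMAC over the identifier bytes, but the sensitivity to the input bytes is the same:

```java
import java.util.Arrays;

public class TokenPasswordSketch {
  // Toy stand-in for createPassword(identifier.getBytes(), key): any change
  // in the serialized identifier bytes changes the resulting "password",
  // so a 2.x DN re-serializing a truncated identifier computes a mismatch.
  static byte[] password(byte[] identifierBytes, byte key) {
    byte[] out = Arrays.copyOf(identifierBytes, identifierBytes.length);
    for (int i = 0; i < out.length; i++) {
      out[i] ^= key;
    }
    return out;
  }
}
```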
[jira] [Work logged] (HDDS-1786) Datanodes takeSnapshot should delete previously created snapshots
[ https://issues.apache.org/jira/browse/HDDS-1786?focusedWorklogId=309423&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309423 ] ASF GitHub Bot logged work on HDDS-1786: Author: ASF GitHub Bot Created on: 10/Sep/19 00:59 Start Date: 10/Sep/19 00:59 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on issue #1163: HDDS-1786 : Datanodes takeSnapshot should delete previously created s… URL: https://github.com/apache/hadoop/pull/1163#issuecomment-529723367 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Comment | |::|--:|:|:| | 0 | reexec | 82 | Docker mode activated. | ||| _ Prechecks _ | | +1 | dupname | 0 | No case conflicting files found. | | +1 | @author | 0 | The patch does not contain any @author tags. | | -1 | test4tests | 0 | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | ||| _ trunk Compile Tests _ | | +1 | mvninstall | 637 | trunk passed | | +1 | compile | 372 | trunk passed | | +1 | checkstyle | 76 | trunk passed | | +1 | mvnsite | 0 | trunk passed | | +1 | shadedclient | 954 | branch has no errors when building and testing our client artifacts. | | +1 | javadoc | 170 | trunk passed | | 0 | spotbugs | 436 | Used deprecated FindBugs config; considering switching to SpotBugs. | | +1 | findbugs | 638 | trunk passed | | -0 | patch | 479 | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. | ||| _ Patch Compile Tests _ | | +1 | mvninstall | 547 | the patch passed | | +1 | compile | 377 | the patch passed | | +1 | javac | 377 | the patch passed | | +1 | checkstyle | 79 | the patch passed | | +1 | mvnsite | 0 | the patch passed | | +1 | whitespace | 0 | The patch has no whitespace issues. 
| | +1 | shadedclient | 756 | patch has no errors when building and testing our client artifacts. | | +1 | javadoc | 191 | the patch passed | | +1 | findbugs | 657 | the patch passed | ||| _ Other Tests _ | | +1 | unit | 314 | hadoop-hdds in the patch passed. | | -1 | unit | 2269 | hadoop-ozone in the patch failed. | | +1 | asflicense | 46 | The patch does not generate ASF License warnings. | | | | 8333 | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.ozone.client.rpc.TestBlockOutputStream | | | hadoop.ozone.client.rpc.TestCommitWatcher | | | hadoop.ozone.container.TestContainerReplication | | | hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures | | | hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion | | | hadoop.ozone.TestSecureOzoneCluster | | | hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures | | Subsystem | Report/Notes | |--:|:-| | Docker | Client=19.03.2 Server=19.03.2 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/7/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/1163 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 06da123aa662 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / 650c4ce | | Default Java | 1.8.0_222 | | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/7/artifact/out/patch-unit-hadoop-ozone.txt | | Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/7/testReport/ | | Max. process+thread count | 4692 (vs. 
ulimit of 5500) | | modules | C: hadoop-hdds/container-service U: hadoop-hdds/container-service | | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/7/console | | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 | | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org | This message was automatically generated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 309423) Time Spent: 3h (was: 2h 50m) > Datanodes takeSnapshot should delete previously created snapshots > - > > K
[jira] [Commented] (HDFS-14837) Review of Block.java
[ https://issues.apache.org/jira/browse/HDFS-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926214#comment-16926214 ] stack commented on HDFS-14837: -- +1 nice cleanup > Review of Block.java > > > Key: HDFS-14837 > URL: https://issues.apache.org/jira/browse/HDFS-14837 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14837.1.patch > > > The {{Block}} class is such a core class in the project, I just wanted to > make sure it was super clean and documentation was correct. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1786) Datanodes takeSnapshot should delete previously created snapshots
[ https://issues.apache.org/jira/browse/HDDS-1786?focusedWorklogId=309419&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309419 ] ASF GitHub Bot logged work on HDDS-1786: Author: ASF GitHub Bot Created on: 10/Sep/19 00:49 Start Date: 10/Sep/19 00:49 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on issue #1163: HDDS-1786 : Datanodes takeSnapshot should delete previously created s… URL: https://github.com/apache/hadoop/pull/1163#issuecomment-529721393 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Comment | |::|--:|:|:| | 0 | reexec | 46 | Docker mode activated. | ||| _ Prechecks _ | | +1 | dupname | 0 | No case conflicting files found. | | +1 | @author | 0 | The patch does not contain any @author tags. | | -1 | test4tests | 0 | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | ||| _ trunk Compile Tests _ | | +1 | mvninstall | 623 | trunk passed | | +1 | compile | 392 | trunk passed | | +1 | checkstyle | 80 | trunk passed | | +1 | mvnsite | 0 | trunk passed | | +1 | shadedclient | 854 | branch has no errors when building and testing our client artifacts. | | +1 | javadoc | 174 | trunk passed | | 0 | spotbugs | 443 | Used deprecated FindBugs config; considering switching to SpotBugs. | | +1 | findbugs | 649 | trunk passed | | -0 | patch | 491 | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. | ||| _ Patch Compile Tests _ | | +1 | mvninstall | 552 | the patch passed | | +1 | compile | 396 | the patch passed | | +1 | javac | 396 | the patch passed | | +1 | checkstyle | 85 | the patch passed | | +1 | mvnsite | 0 | the patch passed | | +1 | whitespace | 0 | The patch has no whitespace issues. 
| | +1 | shadedclient | 654 | patch has no errors when building and testing our client artifacts. | | +1 | javadoc | 175 | the patch passed | | +1 | findbugs | 653 | the patch passed | ||| _ Other Tests _ | | +1 | unit | 295 | hadoop-hdds in the patch passed. | | -1 | unit | 1996 | hadoop-ozone in the patch failed. | | +1 | asflicense | 52 | The patch does not generate ASF License warnings. | | | | 7864 | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdds.scm.pipeline.TestRatisPipelineProvider | | | hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures | | | hadoop.ozone.TestSecureOzoneCluster | | | hadoop.ozone.client.rpc.TestContainerStateMachineFailures | | | hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient | | | hadoop.ozone.scm.TestContainerSmallFile | | | hadoop.ozone.container.TestContainerReplication | | Subsystem | Report/Notes | |--:|:-| | Docker | Client=19.03.1 Server=19.03.1 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/6/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/1163 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux d74699a82149 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / 650c4ce | | Default Java | 1.8.0_222 | | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/6/artifact/out/patch-unit-hadoop-ozone.txt | | Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/6/testReport/ | | Max. process+thread count | 5268 (vs. 
ulimit of 5500) | | modules | C: hadoop-hdds/container-service U: hadoop-hdds/container-service | | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/6/console | | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 | | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org | This message was automatically generated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 309419) Time Spent: 2h 50m (was: 2h 40m) > Datanodes takeSnapshot should delete previously created snapshots > - > > Key: HDDS-178
[jira] [Updated] (HDDS-2106) Avoid usage of hadoop projects as parent of hdds/ozone
[ https://issues.apache.org/jira/browse/HDDS-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elek, Marton updated HDDS-2106: --- Priority: Blocker (was: Major) > Avoid usage of hadoop projects as parent of hdds/ozone > -- > > Key: HDDS-2106 > URL: https://issues.apache.org/jira/browse/HDDS-2106 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Elek, Marton >Priority: Blocker > > Ozone uses hadoop as a dependency. The dependency is defined on multiple levels: > 1. the hadoop artifacts are defined in the sections > 2. both hadoop-ozone and hadoop-hdds projects use "hadoop-project" as the > parent > As we already have a slightly different assembly process, it could be more > resilient to use a dedicated parent project instead of the hadoop one. With > this approach it will be easier to upgrade the versions, as we don't need to > be careful about the pom contents, only about the used dependencies. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-2106) Avoid usage of hadoop projects as parent of hdds/ozone
[ https://issues.apache.org/jira/browse/HDDS-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elek, Marton reassigned HDDS-2106: -- Assignee: Elek, Marton > Avoid usage of hadoop projects as parent of hdds/ozone > -- > > Key: HDDS-2106 > URL: https://issues.apache.org/jira/browse/HDDS-2106 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Elek, Marton >Assignee: Elek, Marton >Priority: Blocker > > Ozone uses hadoop as a dependency. The dependency is defined on multiple levels: > 1. the hadoop artifacts are defined in the sections > 2. both hadoop-ozone and hadoop-hdds projects use "hadoop-project" as the > parent > As we already have a slightly different assembly process, it could be more > resilient to use a dedicated parent project instead of the hadoop one. With > this approach it will be easier to upgrade the versions, as we don't need to > be careful about the pom contents, only about the used dependencies. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14837) Review of Block.java
[ https://issues.apache.org/jira/browse/HDFS-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14837: -- Status: Patch Available (was: Open) > Review of Block.java > > > Key: HDFS-14837 > URL: https://issues.apache.org/jira/browse/HDFS-14837 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14837.1.patch > > > The {{Block}} class is such a core class in the project, I just wanted to > make sure it was super clean and documentation was correct. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14837) Review of Block.java
[ https://issues.apache.org/jira/browse/HDFS-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14837: -- Attachment: HDFS-14837.1.patch > Review of Block.java > > > Key: HDFS-14837 > URL: https://issues.apache.org/jira/browse/HDFS-14837 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14837.1.patch > > > The {{Block}} class is such a core class in the project, I just wanted to > make sure it was super clean and documentation was correct. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14837) Review of Block.java
David Mollitor created HDFS-14837: - Summary: Review of Block.java Key: HDFS-14837 URL: https://issues.apache.org/jira/browse/HDFS-14837 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 3.2.0 Reporter: David Mollitor Assignee: David Mollitor The {{Block}} class is such a core class in the project, I just wanted to make sure it was super clean and documentation was correct. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete
[ https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Wagle updated HDDS-1868: -- Attachment: HDDS-1868.02.patch > Ozone pipelines should be marked as ready only after the leader election is > complete > > > Key: HDDS-1868 > URL: https://issues.apache.org/jira/browse/HDDS-1868 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode, SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Siddharth Wagle >Priority: Major > Fix For: 0.5.0 > > Attachments: HDDS-1868.01.patch, HDDS-1868.02.patch > > > Ozone pipelines on restart start in the allocated state; they are moved into the open > state after all the pipeline members have reported to it. However, this can potentially > lead to an issue where the pipeline is still not ready to accept any > incoming IO operations. > The pipelines should be marked as ready only after the leader election is > complete and the leader is ready to accept incoming IO. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete
[ https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Wagle updated HDDS-1868: -- Status: Patch Available (was: In Progress) [~ljain] you are very correct, my UT did catch it. Could you please review version 02? Thanks. > Ozone pipelines should be marked as ready only after the leader election is > complete > > > Key: HDDS-1868 > URL: https://issues.apache.org/jira/browse/HDDS-1868 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode, SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Siddharth Wagle >Priority: Major > Fix For: 0.5.0 > > Attachments: HDDS-1868.01.patch, HDDS-1868.02.patch > > > Ozone pipelines on restart start in the allocated state; they are moved into the open > state after all the pipeline members have reported to it. However, this can potentially > lead to an issue where the pipeline is still not ready to accept any > incoming IO operations. > The pipelines should be marked as ready only after the leader election is > complete and the leader is ready to accept incoming IO. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
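The readiness rule proposed in HDDS-1868 can be sketched as a tiny state check. This is a hypothetical illustration only — `reported`, `expected`, and `leaderElected` stand in for SCM's actual pipeline bookkeeping and are not the real PipelineStateManager API:

```java
public class PipelineReadinessSketch {
    enum PipelineState { ALLOCATED, OPEN }

    // Hypothetical readiness check: a pipeline moves to OPEN only when every
    // member datanode has reported AND the Ratis leader election is complete.
    static PipelineState evaluate(int reported, int expected, boolean leaderElected) {
        // Reporting alone is not enough; without an elected leader the
        // pipeline cannot yet accept incoming IO.
        if (reported >= expected && leaderElected) {
            return PipelineState.OPEN;
        }
        return PipelineState.ALLOCATED;
    }

    public static void main(String[] args) {
        // All three datanodes reported, but no leader yet: stay ALLOCATED.
        System.out.println(evaluate(3, 3, false));
        // Leader elected as well: now safe to mark OPEN.
        System.out.println(evaluate(3, 3, true));
    }
}
```

The point of the issue is exactly the extra `leaderElected` conjunct: reporting alone was the old (insufficient) condition.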
[jira] [Resolved] (HDFS-14774) RBF: Improve RouterWebhdfsMethods#chooseDatanode() error handling
[ https://issues.apache.org/jira/browse/HDFS-14774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang resolved HDFS-14774. Resolution: Not A Problem Thanks CR. I'm resolving it. > RBF: Improve RouterWebhdfsMethods#chooseDatanode() error handling > - > > Key: HDFS-14774 > URL: https://issues.apache.org/jira/browse/HDFS-14774 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Wei-Chiu Chuang >Assignee: CR Hota >Priority: Minor > > HDFS-13972 added the following code: > {code} > try { > dns = rpcServer.getDatanodeReport(DatanodeReportType.LIVE); > } catch (IOException e) { > LOG.error("Cannot get the datanodes from the RPC server", e); > } finally { > // Reset ugi to remote user for remaining operations. > RouterRpcServer.resetCurrentUser(); > } > HashSet<DatanodeInfo> excludes = new HashSet<>(); > if (excludeDatanodes != null) { > Collection<String> collection = > getTrimmedStringCollection(excludeDatanodes); > for (DatanodeInfo dn : dns) { > if (collection.contains(dn.getName())) { > excludes.add(dn); > } > } > } > {code} > If {{rpcServer.getDatanodeReport()}} throws an exception, {{dns}} will become > null. This doesn't look like the best way to handle the exception. Should > the router retry upon exception? Does it perform retry automatically under the > hood? > [~crh] [~brahmareddy] -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
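One defensive option for the snippet quoted above can be sketched as follows. This is a simplified, hypothetical illustration: datanodes are modeled as name strings rather than `DatanodeInfo` objects, and a comma-split stands in for `getTrimmedStringCollection()`. The idea is simply that a failed datanode report should behave like an empty list, so the exclude-filtering loop cannot dereference null:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ExcludeFilterSketch {
    // Sketch of a null-guard for the pattern discussed in HDFS-14774.
    static Set<String> buildExcludes(List<String> dns, String excludeDatanodes) {
        if (dns == null) {
            // getDatanodeReport() failed: nothing to filter against.
            dns = new ArrayList<>();
        }
        Set<String> excludes = new HashSet<>();
        if (excludeDatanodes != null) {
            Set<String> requested = new HashSet<>(Arrays.asList(excludeDatanodes.split(",")));
            for (String dn : dns) {
                if (requested.contains(dn)) {
                    excludes.add(dn);
                }
            }
        }
        return excludes;
    }

    public static void main(String[] args) {
        // Report failed: no NullPointerException, just an empty exclude set.
        System.out.println(buildExcludes(null, "dn1"));
        // Normal case: only the requested datanode is excluded.
        System.out.println(buildExcludes(Arrays.asList("dn1", "dn2"), "dn1"));
    }
}
```

Whether the router should instead retry the report is the open question in the thread; this sketch only removes the crash path.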
[jira] [Commented] (HDDS-2053) Fix TestOzoneManagerRatisServer failure
[ https://issues.apache.org/jira/browse/HDDS-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926168#comment-16926168 ] Hudson commented on HDDS-2053: -- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17265 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17265/]) HDDS-2053. Fix TestOzoneManagerRatisServer failure. Contributed by (github: rev 650c4cead5d5465921a8bbd4d6294f515f958169) * (edit) hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/ratis/TestOzoneManagerRatisServer.java > Fix TestOzoneManagerRatisServer failure > --- > > Key: HDDS-2053 > URL: https://issues.apache.org/jira/browse/HDDS-2053 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Minor > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > {{TestOzoneManagerRatisServer}} is failing on trunk with the following error > {noformat} > [ERROR] > verifyRaftGroupIdGenerationWithCustomOmServiceId(org.apache.hadoop.ozone.om.ratis.TestOzoneManagerRatisServer) > Time elapsed: 0.418 s <<< ERROR! > org.apache.hadoop.metrics2.MetricsException: Metrics source > OzoneManagerDoubleBufferMetrics already exists! 
> at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152) > at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125) > at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229) > at > org.apache.hadoop.ozone.om.ratis.metrics.OzoneManagerDoubleBufferMetrics.create(OzoneManagerDoubleBufferMetrics.java:50) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.<init>(OzoneManagerDoubleBuffer.java:110) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.<init>(OzoneManagerDoubleBuffer.java:88) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.<init>(OzoneManagerStateMachine.java:87) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.getStateMachine(OzoneManagerRatisServer.java:314) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.<init>(OzoneManagerRatisServer.java:244) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.newOMRatisServer(OzoneManagerRatisServer.java:302) > at > org.apache.hadoop.ozone.om.ratis.TestOzoneManagerRatisServer.verifyRaftGroupIdGenerationWithCustomOmServiceId(TestOzoneManagerRatisServer.java:209) > ... > {noformat} > (Thanks [~nandakumar131] for the stack trace.) -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
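The failure pattern in the stack trace above — a second registration of the same metrics source name throwing "already exists" — can be reproduced with a toy registry. This is a hypothetical stand-in, not the Hadoop metrics2 API; the usual fixes are to unregister the source between test cases or to make registration idempotent:

```java
import java.util.concurrent.ConcurrentHashMap;

public class MetricsRegistrySketch {
    private static final ConcurrentHashMap<String, Object> SOURCES = new ConcurrentHashMap<>();

    // Mirrors the observed behavior: a duplicate source name is rejected.
    static void register(String name, Object source) {
        if (SOURCES.putIfAbsent(name, source) != null) {
            throw new IllegalStateException("Metrics source " + name + " already exists!");
        }
    }

    // Tests that create a source per case must remove it again,
    // or the next case fails exactly as in the stack trace.
    static void unregister(String name) {
        SOURCES.remove(name);
    }

    public static void main(String[] args) {
        register("OzoneManagerDoubleBufferMetrics", new Object());
        unregister("OzoneManagerDoubleBufferMetrics");
        // Safe to register again after the previous owner cleaned up.
        register("OzoneManagerDoubleBufferMetrics", new Object());
        System.out.println("re-registration after unregister succeeded");
    }
}
```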
[jira] [Updated] (HDDS-2053) Fix TestOzoneManagerRatisServer failure
[ https://issues.apache.org/jira/browse/HDDS-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDDS-2053: - Fix Version/s: 0.5.0 Resolution: Fixed Status: Resolved (was: Patch Available) Thanks all for the reviews. I've merged the change to trunk. > Fix TestOzoneManagerRatisServer failure > --- > > Key: HDDS-2053 > URL: https://issues.apache.org/jira/browse/HDDS-2053 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Minor > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > {{TestOzoneManagerRatisServer}} is failing on trunk with the following error > {noformat} > [ERROR] > verifyRaftGroupIdGenerationWithCustomOmServiceId(org.apache.hadoop.ozone.om.ratis.TestOzoneManagerRatisServer) > Time elapsed: 0.418 s <<< ERROR! > org.apache.hadoop.metrics2.MetricsException: Metrics source > OzoneManagerDoubleBufferMetrics already exists! 
> at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152) > at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125) > at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229) > at > org.apache.hadoop.ozone.om.ratis.metrics.OzoneManagerDoubleBufferMetrics.create(OzoneManagerDoubleBufferMetrics.java:50) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.<init>(OzoneManagerDoubleBuffer.java:110) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.<init>(OzoneManagerDoubleBuffer.java:88) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.<init>(OzoneManagerStateMachine.java:87) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.getStateMachine(OzoneManagerRatisServer.java:314) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.<init>(OzoneManagerRatisServer.java:244) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.newOMRatisServer(OzoneManagerRatisServer.java:302) > at > org.apache.hadoop.ozone.om.ratis.TestOzoneManagerRatisServer.verifyRaftGroupIdGenerationWithCustomOmServiceId(TestOzoneManagerRatisServer.java:209) > ... > {noformat} > (Thanks [~nandakumar131] for the stack trace.) -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2053) Fix TestOzoneManagerRatisServer failure
[ https://issues.apache.org/jira/browse/HDDS-2053?focusedWorklogId=309330&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309330 ] ASF GitHub Bot logged work on HDDS-2053: Author: ASF GitHub Bot Created on: 09/Sep/19 22:38 Start Date: 09/Sep/19 22:38 Worklog Time Spent: 10m Work Description: xiaoyuyao commented on pull request #1373: HDDS-2053. Fix TestOzoneManagerRatisServer failure. Contributed by Xi… URL: https://github.com/apache/hadoop/pull/1373 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 309330) Time Spent: 2h 50m (was: 2h 40m) > Fix TestOzoneManagerRatisServer failure > --- > > Key: HDDS-2053 > URL: https://issues.apache.org/jira/browse/HDDS-2053 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Minor > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > {{TestOzoneManagerRatisServer}} is failing on trunk with the following error > {noformat} > [ERROR] > verifyRaftGroupIdGenerationWithCustomOmServiceId(org.apache.hadoop.ozone.om.ratis.TestOzoneManagerRatisServer) > Time elapsed: 0.418 s <<< ERROR! > org.apache.hadoop.metrics2.MetricsException: Metrics source > OzoneManagerDoubleBufferMetrics already exists! 
> at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152) > at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125) > at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229) > at > org.apache.hadoop.ozone.om.ratis.metrics.OzoneManagerDoubleBufferMetrics.create(OzoneManagerDoubleBufferMetrics.java:50) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.<init>(OzoneManagerDoubleBuffer.java:110) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.<init>(OzoneManagerDoubleBuffer.java:88) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.<init>(OzoneManagerStateMachine.java:87) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.getStateMachine(OzoneManagerRatisServer.java:314) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.<init>(OzoneManagerRatisServer.java:244) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.newOMRatisServer(OzoneManagerRatisServer.java:302) > at > org.apache.hadoop.ozone.om.ratis.TestOzoneManagerRatisServer.verifyRaftGroupIdGenerationWithCustomOmServiceId(TestOzoneManagerRatisServer.java:209) > ... > {noformat} > (Thanks [~nandakumar131] for the stack trace.) -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2106) Avoid usage of hadoop projects as parent of hdds/ozone
Elek, Marton created HDDS-2106: -- Summary: Avoid usage of hadoop projects as parent of hdds/ozone Key: HDDS-2106 URL: https://issues.apache.org/jira/browse/HDDS-2106 Project: Hadoop Distributed Data Store Issue Type: Improvement Reporter: Elek, Marton Ozone uses hadoop as a dependency. The dependency is defined on multiple levels: 1. the hadoop artifacts are defined in the sections 2. both hadoop-ozone and hadoop-hdds projects use "hadoop-project" as the parent As we already have a slightly different assembly process, it could be more resilient to use a dedicated parent project instead of the hadoop one. With this approach it will be easier to upgrade the versions, as we don't need to be careful about the pom contents, only about the used dependencies. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2102) HddsVolumeChecker should use java optional in place of Guava optional
[ https://issues.apache.org/jira/browse/HDDS-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926148#comment-16926148 ] Hudson commented on HDDS-2102: -- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17264 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17264/]) HDDS-2102. HddsVolumeChecker should use java optional in place of Guava (bharat: rev d69b811ddd8bf2632faabf1e069883b8aa08f5a0) * (edit) hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/volume/TestHddsVolumeChecker.java * (add) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/volume/AsyncChecker.java * (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/volume/HddsVolumeChecker.java * (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/volume/ThrottledAsyncChecker.java > HddsVolumeChecker should use java optional in place of Guava optional > - > > Key: HDDS-2102 > URL: https://issues.apache.org/jira/browse/HDDS-2102 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 40m > Remaining Estimate: 0h > > HddsVolumeChecker should use java optional in place of Guava optional, as the > Guava dependency is marked unstable. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2053) Fix TestOzoneManagerRatisServer failure
[ https://issues.apache.org/jira/browse/HDDS-2053?focusedWorklogId=309313&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309313 ] ASF GitHub Bot logged work on HDDS-2053: Author: ASF GitHub Bot Created on: 09/Sep/19 22:15 Start Date: 09/Sep/19 22:15 Worklog Time Spent: 10m Work Description: hanishakoneru commented on issue #1373: HDDS-2053. Fix TestOzoneManagerRatisServer failure. Contributed by Xi… URL: https://github.com/apache/hadoop/pull/1373#issuecomment-529688386 Change LGTM. +1. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 309313) Time Spent: 2h 40m (was: 2.5h) > Fix TestOzoneManagerRatisServer failure > --- > > Key: HDDS-2053 > URL: https://issues.apache.org/jira/browse/HDDS-2053 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Minor > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > > {{TestOzoneManagerRatisServer}} is failing on trunk with the following error > {noformat} > [ERROR] > verifyRaftGroupIdGenerationWithCustomOmServiceId(org.apache.hadoop.ozone.om.ratis.TestOzoneManagerRatisServer) > Time elapsed: 0.418 s <<< ERROR! > org.apache.hadoop.metrics2.MetricsException: Metrics source > OzoneManagerDoubleBufferMetrics already exists! 
> at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152) > at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125) > at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229) > at > org.apache.hadoop.ozone.om.ratis.metrics.OzoneManagerDoubleBufferMetrics.create(OzoneManagerDoubleBufferMetrics.java:50) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.<init>(OzoneManagerDoubleBuffer.java:110) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.<init>(OzoneManagerDoubleBuffer.java:88) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.<init>(OzoneManagerStateMachine.java:87) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.getStateMachine(OzoneManagerRatisServer.java:314) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.<init>(OzoneManagerRatisServer.java:244) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.newOMRatisServer(OzoneManagerRatisServer.java:302) > at > org.apache.hadoop.ozone.om.ratis.TestOzoneManagerRatisServer.verifyRaftGroupIdGenerationWithCustomOmServiceId(TestOzoneManagerRatisServer.java:209) > ... > {noformat} > (Thanks [~nandakumar131] for the stack trace.) -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14283) DFSInputStream to prefer cached replica
[ https://issues.apache.org/jira/browse/HDFS-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926127#comment-16926127 ] Siyao Meng commented on HDFS-14283: --- [~leosun08] Any work done on your side yet? If not I can take over this one. [~jojochuang] I'm a bit worried that enabling this by default could cause a hot spot issue on those DataNodes with cached replicas. > DFSInputStream to prefer cached replica > --- > > Key: HDFS-14283 > URL: https://issues.apache.org/jira/browse/HDFS-14283 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.6.0 > Environment: HDFS Caching >Reporter: Wei-Chiu Chuang >Assignee: Lisheng Sun >Priority: Major > > HDFS Caching offers performance benefits. However, currently the NameNode does > not treat cached replicas with higher priority, so HDFS caching is only useful > when cache replication = 3, that is to say, all replicas are cached in > memory, so that a client doesn't randomly pick an uncached replica. > HDFS-6846 proposed to let the NameNode give higher priority to cached replicas. > Changing logic in the NameNode is always tricky, so that didn't get much > traction. Here I propose a different approach: let the client (DFSInputStream) > prefer cached replicas. > A {{LocatedBlock}} object already contains cached replica locations, so a > client has the needed information. I think we can change > {{DFSInputStream#getBestNodeDNAddrPair()}} for this purpose. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
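The client-side idea proposed in HDFS-14283 amounts to a stable reordering of a block's replica locations so that datanodes holding a cached copy are tried first. The sketch below is hypothetical: `nodes` and `cached` are plain-string stand-ins for what a `LocatedBlock` exposes through its locations and cached-locations arrays, not the HDFS client API itself:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PreferCachedSketch {
    // Stable sort: cached replicas move to the front while the existing
    // order (e.g. network-distance order) is preserved within each group.
    static List<String> preferCached(List<String> nodes, Set<String> cached) {
        List<String> ordered = new ArrayList<>(nodes);
        ordered.sort(Comparator.comparingInt(n -> cached.contains(n) ? 0 : 1));
        return ordered;
    }

    public static void main(String[] args) {
        // dn3 holds a cached copy, so it is tried first; dn1/dn2 keep order.
        System.out.println(preferCached(
            Arrays.asList("dn1", "dn2", "dn3"),
            new HashSet<>(Arrays.asList("dn3"))));
    }
}
```

The stability of the sort matters for the hot-spot concern raised in the comment: within the cached and uncached groups the original selection order is untouched, so only the cached/uncached preference changes.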
[jira] [Created] (HDDS-2105) Merge OzoneClientFactory#getRpcClient functions
Siyao Meng created HDDS-2105: Summary: Merge OzoneClientFactory#getRpcClient functions Key: HDDS-2105 URL: https://issues.apache.org/jira/browse/HDDS-2105 Project: Hadoop Distributed Data Store Issue Type: Improvement Reporter: Siyao Meng Assignee: Siyao Meng Ref: https://github.com/apache/hadoop/pull/1360#discussion_r321585214 There will be 5 overloaded OzoneClientFactory#getRpcClient functions (when HDDS-2007 is committed). They contain some redundant logic and unnecessarily increase code paths. Goal: Merge those functions into one or two. Work will begin after HDDS-2007 is committed. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926109#comment-16926109 ] Wei-Chiu Chuang commented on HDFS-14509: [~ferhui] can you tell if this fix is still required after HDFS-13596? > DN throws InvalidToken due to inequality of password when upgrade NN 2.x to > 3.x > --- > > Key: HDFS-14509 > URL: https://issues.apache.org/jira/browse/HDFS-14509 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yuxuan Wang >Priority: Blocker > Attachments: HDFS-14509-001.patch > > > According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we need > to upgrade the NN first. And there will be an intermediate state in which the NN is 3.x and > the DN is 2.x. At that moment, if a client reads (or writes) a block, it will get > a block token from the NN and then deliver the token to the DN, which can verify the > token. But the verification in the code now is: > {code:title=BlockTokenSecretManager.java|borderStyle=solid} > public void checkAccess(...) > { > ... > id.readFields(new DataInputStream(new > ByteArrayInputStream(token.getIdentifier()))); > ... > if (!Arrays.equals(retrievePassword(id), token.getPassword())) { > throw new InvalidToken("Block token with " + id.toString() > + " doesn't have the correct token password"); > } > } > {code} > And {{retrievePassword(id)}} is: > {code} > public byte[] retrievePassword(BlockTokenIdentifier identifier) > { > ... > return createPassword(identifier.getBytes(), key.getKey()); > } > {code} > So, if the NN's identifier adds new fields, the DN will lose the fields and compute the > wrong password. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
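The mismatch described in HDFS-14509 can be illustrated with a toy version of the password derivation: if the password is an HMAC over the serialized identifier bytes, any field the 2.x datanode drops during re-serialization changes the bytes and therefore the recomputed password. `createPassword` here is a hypothetical stand-in for the one in `BlockTokenSecretManager`, and the field strings are purely illustrative:

```java
import java.util.Arrays;

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class TokenPasswordSketch {
    // Toy stand-in for createPassword(identifier, key): an HMAC-SHA1
    // over the serialized identifier bytes.
    static byte[] createPassword(byte[] identifier, byte[] key) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(key, "HmacSHA1"));
        return mac.doFinal(identifier);
    }

    public static void main(String[] args) throws Exception {
        byte[] key = "block-key".getBytes("UTF-8");
        // A 3.x NN serializes an identifier that carries an extra field...
        byte[] idFromNn = "blockPool=bp1;blockId=42;extraField=x".getBytes("UTF-8");
        // ...but a 2.x DN re-serializes only the fields it knows about.
        byte[] idSeenByDn = "blockPool=bp1;blockId=42".getBytes("UTF-8");
        // Different bytes -> different HMAC, so the DN's recomputed password
        // no longer matches the token's password and checkAccess() fails.
        System.out.println(Arrays.equals(createPassword(idFromNn, key),
                                         createPassword(idSeenByDn, key)));
    }
}
```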
[jira] [Work logged] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent
[ https://issues.apache.org/jira/browse/HDDS-2075?focusedWorklogId=309262&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309262 ] ASF GitHub Bot logged work on HDDS-2075: Author: ASF GitHub Bot Created on: 09/Sep/19 21:25 Start Date: 09/Sep/19 21:25 Worklog Time Spent: 10m Work Description: xiaoyuyao commented on issue #1415: HDDS-2075. Tracing in OzoneManager call is propagated with wrong parent URL: https://github.com/apache/hadoop/pull/1415#issuecomment-529673327 LGTM, +1. Thanks @adoroszlai for fixing this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 309262) Time Spent: 40m (was: 0.5h) > Tracing in OzoneManager call is propagated with wrong parent > > > Key: HDDS-2075 > URL: https://issues.apache.org/jira/browse/HDDS-2075 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Elek, Marton >Assignee: Doroszlai, Attila >Priority: Major > Labels: pull-request-available > Attachments: create_bucket-new.png, create_bucket.png > > Time Spent: 40m > Remaining Estimate: 0h > > As you can see in the attached screenshot, the OzoneManager.createBucket > (server side) tracing information is a child of freon.createBucket > instead of the freon OzoneManagerProtocolPB.submitRequest. > To avoid confusion, the hierarchy should be fixed (most probably we generate > the child span AFTER we have already serialized the parent one into the message). -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
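The ordering bug in the issue above can be illustrated with toy span bookkeeping (these classes are invented, not the real OpenTracing/Jaeger API): whichever span is active at the moment the trace context is serialized into the RPC message becomes the server-side parent, so the client-side child span must be started *before* injection.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal stand-in for a tracer's active-span stack.
class SpanOrder {
    static Deque<String> active = new ArrayDeque<>();

    static void start(String name) { active.push(name); }
    static void finish() { active.pop(); }
    // "Inject": whatever is active now is the parent sent on the wire.
    static String injectContext() { return active.peek(); }

    // Bug pattern: context serialized BEFORE the child span exists.
    static String buggyParent() {
        start("freon.createBucket");
        String onWire = injectContext();
        start("OzoneManagerProtocolPB.submitRequest");
        finish(); finish();
        return onWire; // wrong parent
    }

    // Fix pattern: start the child span first, then serialize.
    static String fixedParent() {
        start("freon.createBucket");
        start("OzoneManagerProtocolPB.submitRequest");
        String onWire = injectContext();
        finish(); finish();
        return onWire; // correct parent
    }
}
```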
[jira] [Updated] (HDDS-2102) HddsVolumeChecker should use java optional in place of Guava optional
[ https://issues.apache.org/jira/browse/HDDS-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-2102: - Fix Version/s: 0.5.0 Resolution: Fixed Status: Resolved (was: Patch Available) > HddsVolumeChecker should use java optional in place of Guava optional > - > > Key: HDDS-2102 > URL: https://issues.apache.org/jira/browse/HDDS-2102 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 40m > Remaining Estimate: 0h > > HddsVolumeChecker should use java optional in place of Guava optional, as the > Guava dependency is marked unstable. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
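The Guava-to-java.util migration resolved above is mostly mechanical; a minimal sketch (hypothetical volume-checker names, not the real HddsVolumeChecker code) of the renamed methods:

```java
import java.util.Optional;

// com.google.common.base.Optional maps almost one-to-one onto
// java.util.Optional, but several methods are renamed:
//   Guava: absent() / fromNullable() / or(x)
//   java:  empty()  / ofNullable()   / orElse(x)
class VolumeCheck {
    static Optional<String> checkVolume(String path) {
        if (path == null || path.isEmpty()) {
            return Optional.empty();          // was Optional.absent() in Guava
        }
        return Optional.of(path);
    }

    static String resultOrDefault(String path) {
        return checkVolume(path).orElse("/no-volume"); // was .or(...) in Guava
    }
}
```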
[jira] [Updated] (HDDS-2104) Refactor OMFailoverProxyProvider#loadOMClientConfigs
[ https://issues.apache.org/jira/browse/HDDS-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siyao Meng updated HDDS-2104: - Description: Ref: https://github.com/apache/hadoop/pull/1360#discussion_r321586979 Now that we decide to use client-side configuration for OM HA, some logic in OMFailoverProxyProvider#loadOMClientConfigs becomes redundant. The work will begin after HDDS-2007 is committed. was: Now that we decide to use client-side configuration for OM HA, some logic in OMFailoverProxyProvider#loadOMClientConfigs becomes redundant. The work will begin after HDDS-2007 is committed. > Refactor OMFailoverProxyProvider#loadOMClientConfigs > > > Key: HDDS-2104 > URL: https://issues.apache.org/jira/browse/HDDS-2104 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > > Ref: https://github.com/apache/hadoop/pull/1360#discussion_r321586979 > Now that we decide to use client-side configuration for OM HA, some logic in > OMFailoverProxyProvider#loadOMClientConfigs becomes redundant. > The work will begin after HDDS-2007 is committed. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2102) HddsVolumeChecker should use java optional in place of Guava optional
[ https://issues.apache.org/jira/browse/HDDS-2102?focusedWorklogId=309252&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309252 ] ASF GitHub Bot logged work on HDDS-2102: Author: ASF GitHub Bot Created on: 09/Sep/19 21:17 Start Date: 09/Sep/19 21:17 Worklog Time Spent: 10m Work Description: bharatviswa504 commented on issue #1416: HDDS-2102. HddsVolumeChecker should use java optional in place of Guava optional. Contributed by Mukul Kumar Singh. URL: https://github.com/apache/hadoop/pull/1416#issuecomment-529670900 Thank You @mukul1987 for the contribution. I have committed this to the trunk. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 309252) Time Spent: 40m (was: 0.5h) > HddsVolumeChecker should use java optional in place of Guava optional > - > > Key: HDDS-2102 > URL: https://issues.apache.org/jira/browse/HDDS-2102 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > HddsVolumeChecker should use java optional in place of Guava optional, as the > Guava dependency is marked unstable. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2104) Refactor OMFailoverProxyProvider#loadOMClientConfigs
Siyao Meng created HDDS-2104: Summary: Refactor OMFailoverProxyProvider#loadOMClientConfigs Key: HDDS-2104 URL: https://issues.apache.org/jira/browse/HDDS-2104 Project: Hadoop Distributed Data Store Issue Type: Improvement Reporter: Siyao Meng Assignee: Siyao Meng Now that we decide to use client-side configuration for OM HA, some logic in OMFailoverProxyProvider#loadOMClientConfigs becomes redundant. The work will begin after HDDS-2007 is committed. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2102) HddsVolumeChecker should use java optional in place of Guava optional
[ https://issues.apache.org/jira/browse/HDDS-2102?focusedWorklogId=309251&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309251 ] ASF GitHub Bot logged work on HDDS-2102: Author: ASF GitHub Bot Created on: 09/Sep/19 21:17 Start Date: 09/Sep/19 21:17 Worklog Time Spent: 10m Work Description: bharatviswa504 commented on pull request #1416: HDDS-2102. HddsVolumeChecker should use java optional in place of Guava optional. Contributed by Mukul Kumar Singh. URL: https://github.com/apache/hadoop/pull/1416 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 309251) Time Spent: 0.5h (was: 20m) > HddsVolumeChecker should use java optional in place of Guava optional > - > > Key: HDDS-2102 > URL: https://issues.apache.org/jira/browse/HDDS-2102 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > HddsVolumeChecker should use java optional in place of Guava optional, as the > Guava dependency is marked unstable. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-1505) Remove "ozone.enabled" parameter from ozone configs
[ https://issues.apache.org/jira/browse/HDDS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vivek Ratnavel Subramanian reassigned HDDS-1505: Assignee: Vivek Ratnavel Subramanian > Remove "ozone.enabled" parameter from ozone configs > --- > > Key: HDDS-1505 > URL: https://issues.apache.org/jira/browse/HDDS-1505 > Project: Hadoop Distributed Data Store > Issue Type: Task > Components: Ozone Manager >Affects Versions: 0.4.0 >Reporter: Vivek Ratnavel Subramanian >Assignee: Vivek Ratnavel Subramanian >Priority: Minor > > Remove "ozone.enabled" config as it is no longer needed -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2102) HddsVolumeChecker should use java optional in place of Guava optional
[ https://issues.apache.org/jira/browse/HDDS-2102?focusedWorklogId=309245&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309245 ] ASF GitHub Bot logged work on HDDS-2102: Author: ASF GitHub Bot Created on: 09/Sep/19 21:04 Start Date: 09/Sep/19 21:04 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on issue #1416: HDDS-2102. HddsVolumeChecker should use java optional in place of Guava optional. Contributed by Mukul Kumar Singh. URL: https://github.com/apache/hadoop/pull/1416#issuecomment-529666543 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Comment | |::|--:|:|:| | 0 | reexec | 100 | Docker mode activated. | ||| _ Prechecks _ | | +1 | dupname | 0 | No case conflicting files found. | | +1 | @author | 0 | The patch does not contain any @author tags. | | +1 | test4tests | 0 | The patch appears to include 1 new or modified test files. | ||| _ trunk Compile Tests _ | | +1 | mvninstall | 691 | trunk passed | | +1 | compile | 388 | trunk passed | | +1 | checkstyle | 74 | trunk passed | | +1 | mvnsite | 0 | trunk passed | | +1 | shadedclient | 979 | branch has no errors when building and testing our client artifacts. | | +1 | javadoc | 185 | trunk passed | | 0 | spotbugs | 453 | Used deprecated FindBugs config; considering switching to SpotBugs. | | +1 | findbugs | 688 | trunk passed | ||| _ Patch Compile Tests _ | | +1 | mvninstall | 555 | the patch passed | | +1 | compile | 409 | the patch passed | | +1 | javac | 409 | the patch passed | | +1 | checkstyle | 80 | the patch passed | | +1 | mvnsite | 0 | the patch passed | | +1 | whitespace | 0 | The patch has no whitespace issues. | | +1 | shadedclient | 725 | patch has no errors when building and testing our client artifacts. | | +1 | javadoc | 183 | the patch passed | | +1 | findbugs | 768 | the patch passed | ||| _ Other Tests _ | | -1 | unit | 313 | hadoop-hdds in the patch failed. | | -1 | unit | 289 | hadoop-ozone in the patch failed. 
| | +1 | asflicense | 42 | The patch does not generate ASF License warnings. | | | | 6642 | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdds.scm.block.TestBlockManager | | | hadoop.ozone.om.ratis.TestOzoneManagerRatisServer | | Subsystem | Report/Notes | |--:|:-| | Docker | Client=18.09.7 Server=18.09.7 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1416/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/1416 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 626c82f6d9c2 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / 469165e | | Default Java | 1.8.0_222 | | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1416/1/artifact/out/patch-unit-hadoop-hdds.txt | | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1416/1/artifact/out/patch-unit-hadoop-ozone.txt | | Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1416/1/testReport/ | | Max. process+thread count | 1325 (vs. ulimit of 5500) | | modules | C: hadoop-hdds/container-service U: hadoop-hdds/container-service | | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1416/1/console | | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 | | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org | This message was automatically generated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 309245) Time Spent: 20m (was: 10m) > HddsVolumeChecker should use java optional in place of Guava optional > - > > Key: HDDS-2102 > URL: https://issues.apache.org/jira/browse/HDDS-2102 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > HddsVolumeChecker
[jira] [Commented] (HDDS-2097) Add TeraSort to acceptance test
[ https://issues.apache.org/jira/browse/HDDS-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926094#comment-16926094 ] Xiaoyu Yao commented on HDDS-2097: -- Thanks [~ste...@apache.org] for the heads up. I will play with it and see if the existing one for s3a fits the requirements for Ozone. Also, Ozone as a submodule depends on Hadoop 3.2.0; is this available in Hadoop 3.2.0? > Add TeraSort to acceptance test > --- > > Key: HDDS-2097 > URL: https://issues.apache.org/jira/browse/HDDS-2097 > Project: Hadoop Distributed Data Store > Issue Type: Test >Reporter: Xiaoyu Yao >Priority: Major > > We may begin with 1GB teragen/terasort/teravalidate. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-2086) ReconServer throws SQLException but path present for ozone.recon.db.dir in ozone-site
[ https://issues.apache.org/jira/browse/HDDS-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aravindan Vijayan resolved HDDS-2086. - Resolution: Cannot Reproduce Unable to repro on latest trunk when configuring ozone.recon.db.dir to an existing directory. Possibly an environment issue. > ReconServer throws SQLException but path present for ozone.recon.db.dir in > ozone-site > - > > Key: HDDS-2086 > URL: https://issues.apache.org/jira/browse/HDDS-2086 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shweta >Priority: Major > > java.sql.SQLException: path to > '/${ozone.recon.db.dir}/ozone_recon_sqlite.db': '/${ozone.recon.db.dir}' does > not exist > But property present in ozone-site.xml: > > ozone.recon.db.dir > /tmp/metadata > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work started] (HDDS-2103) TestContainerReplication fails due to unhealthy container
[ https://issues.apache.org/jira/browse/HDDS-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDDS-2103 started by Doroszlai, Attila. --- > TestContainerReplication fails due to unhealthy container > - > > Key: HDDS-2103 > URL: https://issues.apache.org/jira/browse/HDDS-2103 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Affects Versions: 0.5.0 >Reporter: Doroszlai, Attila >Assignee: Doroszlai, Attila >Priority: Major > > {code:title=https://github.com/elek/ozone-ci/blob/master/trunk/trunk-nightly-20190907-l8mkd/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.container.TestContainerReplication.txt} > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.771 s <<< > FAILURE! - in org.apache.hadoop.ozone.container.TestContainerReplication > testContainerReplication(org.apache.hadoop.ozone.container.TestContainerReplication) > Time elapsed: 12.702 s <<< FAILURE! > java.lang.AssertionError: Container is not replicated to the destination > datanode > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertNotNull(Assert.java:621) > at > org.apache.hadoop.ozone.container.TestContainerReplication.testContainerReplication(TestContainerReplication.java:153) > {code} > caused by: > {code:title=https://github.com/elek/ozone-ci/blob/master/trunk/trunk-nightly-20190907-l8mkd/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.container.TestContainerReplication-output.txt} > java.lang.IllegalStateException: Only closed containers could be exported: > ContainerId=1 > at > org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.exportContainerData(KeyValueContainer.java:525) > at > org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.exportContainer(KeyValueHandler.java:875) > at > org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.exportContainer(ContainerController.java:134) > at > 
org.apache.hadoop.ozone.container.replication.OnDemandContainerReplicationSource.copyData(OnDemandContainerReplicationSource.java:64) > at > org.apache.hadoop.ozone.container.replication.GrpcReplicationService.download(GrpcReplicationService.java:63) > {code} > Container is in unhealthy state because pipeline is not found for it in > {{CloseContainerCommandHandler}}. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2103) TestContainerReplication fails due to unhealthy container
[ https://issues.apache.org/jira/browse/HDDS-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doroszlai, Attila updated HDDS-2103: Target Version/s: 0.5.0 > TestContainerReplication fails due to unhealthy container > - > > Key: HDDS-2103 > URL: https://issues.apache.org/jira/browse/HDDS-2103 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Affects Versions: 0.5.0 >Reporter: Doroszlai, Attila >Assignee: Doroszlai, Attila >Priority: Major > > {code:title=https://github.com/elek/ozone-ci/blob/master/trunk/trunk-nightly-20190907-l8mkd/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.container.TestContainerReplication.txt} > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.771 s <<< > FAILURE! - in org.apache.hadoop.ozone.container.TestContainerReplication > testContainerReplication(org.apache.hadoop.ozone.container.TestContainerReplication) > Time elapsed: 12.702 s <<< FAILURE! > java.lang.AssertionError: Container is not replicated to the destination > datanode > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertNotNull(Assert.java:621) > at > org.apache.hadoop.ozone.container.TestContainerReplication.testContainerReplication(TestContainerReplication.java:153) > {code} > caused by: > {code:title=https://github.com/elek/ozone-ci/blob/master/trunk/trunk-nightly-20190907-l8mkd/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.container.TestContainerReplication-output.txt} > java.lang.IllegalStateException: Only closed containers could be exported: > ContainerId=1 > at > org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.exportContainerData(KeyValueContainer.java:525) > at > org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.exportContainer(KeyValueHandler.java:875) > at > org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.exportContainer(ContainerController.java:134) > 
at > org.apache.hadoop.ozone.container.replication.OnDemandContainerReplicationSource.copyData(OnDemandContainerReplicationSource.java:64) > at > org.apache.hadoop.ozone.container.replication.GrpcReplicationService.download(GrpcReplicationService.java:63) > {code} > Container is in unhealthy state because pipeline is not found for it in > {{CloseContainerCommandHandler}}. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2103) TestContainerReplication fails due to unhealthy container
Doroszlai, Attila created HDDS-2103: --- Summary: TestContainerReplication fails due to unhealthy container Key: HDDS-2103 URL: https://issues.apache.org/jira/browse/HDDS-2103 Project: Hadoop Distributed Data Store Issue Type: Bug Components: test Affects Versions: 0.5.0 Reporter: Doroszlai, Attila Assignee: Doroszlai, Attila {code:title=https://github.com/elek/ozone-ci/blob/master/trunk/trunk-nightly-20190907-l8mkd/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.container.TestContainerReplication.txt} Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.771 s <<< FAILURE! - in org.apache.hadoop.ozone.container.TestContainerReplication testContainerReplication(org.apache.hadoop.ozone.container.TestContainerReplication) Time elapsed: 12.702 s <<< FAILURE! java.lang.AssertionError: Container is not replicated to the destination datanode at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertNotNull(Assert.java:621) at org.apache.hadoop.ozone.container.TestContainerReplication.testContainerReplication(TestContainerReplication.java:153) {code} caused by: {code:title=https://github.com/elek/ozone-ci/blob/master/trunk/trunk-nightly-20190907-l8mkd/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.container.TestContainerReplication-output.txt} java.lang.IllegalStateException: Only closed containers could be exported: ContainerId=1 at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.exportContainerData(KeyValueContainer.java:525) at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.exportContainer(KeyValueHandler.java:875) at org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.exportContainer(ContainerController.java:134) at org.apache.hadoop.ozone.container.replication.OnDemandContainerReplicationSource.copyData(OnDemandContainerReplicationSource.java:64) at 
org.apache.hadoop.ozone.container.replication.GrpcReplicationService.download(GrpcReplicationService.java:63) {code} Container is in unhealthy state because pipeline is not found for it in {{CloseContainerCommandHandler}}. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2098) Ozone shell command prints out ERROR when the log4j file is not present.
[ https://issues.apache.org/jira/browse/HDDS-2098?focusedWorklogId=309169&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309169 ] ASF GitHub Bot logged work on HDDS-2098: Author: ASF GitHub Bot Created on: 09/Sep/19 19:55 Start Date: 09/Sep/19 19:55 Worklog Time Spent: 10m Work Description: avijayanhwx commented on issue #1411: HDDS-2098 : Ozone shell command prints out ERROR when the log4j file … URL: https://github.com/apache/hadoop/pull/1411#issuecomment-529639718 > I have a question > During the ozone tarball build, we do copy ozone-shell-log4j.properties to etc/hadoop (like we copy log4j.properties), so why do we see this error, or does something need to be fixed in this copying script? > > https://github.com/apache/hadoop/blob/trunk/hadoop-ozone/dist/dev-support/bin/dist-layout-stitching#L95 Yes, while starting Ozone from the snapshot tarball, it works perfectly. However, when Ozone is deployed through a management product like Cloudera Manager, the log4j properties may not be individually configurable. We may have to rely on a default log4j.properties. In that case, printing a FileNotFoundException for ozone shell commands is something we can avoid. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 309169) Time Spent: 1h 20m (was: 1h 10m) > Ozone shell command prints out ERROR when the log4j file is not present.
> > > Key: HDDS-2098 > URL: https://issues.apache.org/jira/browse/HDDS-2098 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone CLI >Affects Versions: 0.5.0 >Reporter: Aravindan Vijayan >Assignee: Aravindan Vijayan >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > *Exception Trace* > {code} > log4j:ERROR Could not read configuration file from URL > [file:/etc/ozone/conf/ozone-shell-log4j.properties]. > java.io.FileNotFoundException: /etc/ozone/conf/ozone-shell-log4j.properties > (No such file or directory) > at java.io.FileInputStream.open0(Native Method) > at java.io.FileInputStream.open(FileInputStream.java:195) > at java.io.FileInputStream.(FileInputStream.java:138) > at java.io.FileInputStream.(FileInputStream.java:93) > at > sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90) > at > sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188) > at > org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:557) > at > org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526) > at org.apache.log4j.LogManager.(LogManager.java:127) > at org.slf4j.impl.Log4jLoggerFactory.(Log4jLoggerFactory.java:66) > at org.slf4j.impl.StaticLoggerBinder.(StaticLoggerBinder.java:72) > at > org.slf4j.impl.StaticLoggerBinder.(StaticLoggerBinder.java:45) > at org.slf4j.LoggerFactory.bind(LoggerFactory.java:150) > at org.slf4j.LoggerFactory.performInitialization(LoggerFactory.java:124) > at org.slf4j.LoggerFactory.getILoggerFactory(LoggerFactory.java:412) > at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:357) > at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:383) > at org.apache.hadoop.ozone.web.ozShell.Shell.(Shell.java:35) > log4j:ERROR Ignoring configuration file > [file:/etc/ozone/conf/ozone-shell-log4j.properties]. 
> log4j:WARN No appenders could be found for logger > (io.jaegertracing.thrift.internal.senders.ThriftSenderFactory). > log4j:WARN Please initialize the log4j system properly. > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more > info. > { > "metadata" : { }, > "name" : "vol-test-putfile-1567740142", > "admin" : "root", > "owner" : "root", > "creationTime" : 1567740146501, > "acls" : [ { > "type" : "USER", > "name" : "root", > "aclScope" : "ACCESS", > "aclList" : [ "ALL" ] > }, { > "type" : "GROUP", > "name" : "root", > "aclScope" : "ACCESS", > "aclList" : [ "ALL" ] > } ], > "quota" : 1152921504606846976 > } > {code} > *Fix* > When a log4j file is not present, the default should be console. -- This message was sent by Atlassian Jira (v8.3.2#803003) ---
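The fix proposed above ("the default should be console") can be sketched without the log4j dependency (the helper below is hypothetical; a real fix would hand the path to log4j's PropertyConfigurator, or call BasicConfigurator for the fallback): check that the properties file exists before pointing log4j at it, and fall back to console defaults otherwise, so no FileNotFoundException stack trace is printed.

```java
import java.io.File;

// Hedged sketch of the decision logic only, not the actual patch.
class LogConfig {
    static String chooseConfig(String configuredPath) {
        if (configuredPath != null && new File(configuredPath).isFile()) {
            // e.g. PropertyConfigurator.configure(configuredPath)
            return configuredPath;
        }
        // e.g. BasicConfigurator.configure() -> console appender defaults
        return "console-defaults";
    }
}
```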
[jira] [Commented] (HDFS-14774) RBF: Improve RouterWebhdfsMethods#chooseDatanode() error handling
[ https://issues.apache.org/jira/browse/HDFS-14774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926050#comment-16926050 ] CR Hota commented on HDFS-14774: Hey [~jojochuang], Do you have any follow-up questions, or shall we close this? > RBF: Improve RouterWebhdfsMethods#chooseDatanode() error handling > - > > Key: HDFS-14774 > URL: https://issues.apache.org/jira/browse/HDFS-14774 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Wei-Chiu Chuang >Assignee: CR Hota >Priority: Minor > > HDFS-13972 added the following code: > {code} > try { > dns = rpcServer.getDatanodeReport(DatanodeReportType.LIVE); > } catch (IOException e) { > LOG.error("Cannot get the datanodes from the RPC server", e); > } finally { > // Reset ugi to remote user for remaining operations. > RouterRpcServer.resetCurrentUser(); > } > HashSet<DatanodeInfo> excludes = new HashSet<>(); > if (excludeDatanodes != null) { > Collection<String> collection = > getTrimmedStringCollection(excludeDatanodes); > for (DatanodeInfo dn : dns) { > if (collection.contains(dn.getName())) { > excludes.add(dn); > } > } > } > {code} > If {{rpcServer.getDatanodeReport()}} throws an exception, {{dns}} will become > null. This doesn't look like the best way to handle the exception. Should the > router retry upon exception? Does it perform retries automatically under the > hood? > [~crh] [~brahmareddy] -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
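One way to contain the failure quoted above can be sketched as follows (plain strings stand in for DatanodeInfo; none of these names are from the actual RouterWebhdfsMethods code): if the datanode report fails and leaves the list null, treat it as empty instead of letting the exclusion loop throw a NullPointerException.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hedged sketch of the guard, not the real router code.
class ChooseDatanode {
    static Set<String> buildExcludes(Collection<String> dns, Collection<String> excludeNames) {
        // A failed getDatanodeReport() leaves dns null in the quoted code;
        // normalize to an empty collection so the loop below is safe.
        if (dns == null) {
            dns = Collections.emptyList();
        }
        Set<String> excludes = new HashSet<>();
        if (excludeNames != null) {
            for (String dn : dns) {
                if (excludeNames.contains(dn)) {
                    excludes.add(dn);
                }
            }
        }
        return excludes;
    }
}
```

Whether the router should instead retry the report (or fail the request outright) is the open design question in the comment; the guard only removes the NPE.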
[jira] [Commented] (HDDS-1843) Undetectable corruption after restart of a datanode
[ https://issues.apache.org/jira/browse/HDDS-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926017#comment-16926017 ] Hudson commented on HDDS-1843: -- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17262 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17262/]) HDDS-1843. Undetectable corruption after restart of a datanode. (shashikant: rev 469165e6f29a6e7788f218bdbbc3f7bacf26628b) * (edit) hadoop-hdds/common/src/main/proto/DatanodeContainerProtocol.proto * (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/interfaces/ContainerDispatcher.java * (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java * (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/HddsDispatcher.java * (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainer.java * (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext.java * (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/impl/BlockManagerImpl.java * (edit) hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/server/TestSecureContainerServer.java * (edit) hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/server/TestContainerServer.java * (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java * (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/interfaces/Container.java * (edit) hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/TestCSMMetrics.java * (edit) 
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerStateMachineFailures.java > Undetectable corruption after restart of a datanode > --- > > Key: HDDS-1843 > URL: https://issues.apache.org/jira/browse/HDDS-1843 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Critical > Labels: pull-request-available > Fix For: 0.5.0 > > Attachments: HDDS-1843.000.patch > > Time Spent: 9h 50m > Remaining Estimate: 0h > > Right now, all write chunks use buffered IO, i.e. the sync flag is disabled by > default. Also, RocksDB metadata updates are done in the RocksDB cache first at the > datanode. If both the buffered chunk data and the corresponding metadata update are > lost as part of a datanode restart, it may not be possible to detect corruption of > this nature (not even with the container scanner) in a reasonable time frame, unless > there is a client IO failure or the Recon server detects it over time. In order to at > least detect the problem, the Ratis snapshot on the datanode should sync the RocksDB > file. That way, the ContainerScanner will be able to detect this. We can also add a > metric around sync to measure how much of a throughput loss it incurs. > Thanks [~msingh] for suggesting this. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13541) NameNode Port based selective encryption
[ https://issues.apache.org/jira/browse/HDFS-13541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-13541: -- Resolution: Fixed Status: Resolved (was: Patch Available) Although this is an umbrella Jira, given that it is marked release blocker, closing this ticket to unblock releases. > NameNode Port based selective encryption > > > Key: HDFS-13541 > URL: https://issues.apache.org/jira/browse/HDFS-13541 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode, security >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Major > Labels: release-blocker > Attachments: HDFS-13541-branch-2.001.patch, > HDFS-13541-branch-2.002.patch, HDFS-13541-branch-2.003.patch, > HDFS-13541-branch-3.1.001.patch, HDFS-13541-branch-3.1.002.patch, > HDFS-13541-branch-3.2.001.patch, HDFS-13541-branch-3.2.002.patch, NameNode > Port based selective encryption-v1.pdf > > > Here at LinkedIn, one issue we face is that we need to enforce different > security requirements based on the locations of the client and the cluster. > Specifically, for clients from outside of the data center, it is required by > regulation that all traffic must be encrypted. But for clients within the > same data center, unencrypted connections are preferred to avoid the high > encryption overhead. > HADOOP-10221 introduced a pluggable SASL resolver, based on which HADOOP-10335 > introduced WhitelistBasedResolver, which solves the same problem. However we > found it difficult to fit into our environment for several reasons. In this > JIRA, on top of the pluggable SASL resolver, *we propose a different approach of > running RPC on two ports on the NameNode, where the two ports enforce > encrypted and unencrypted connections respectively, and subsequent > DataNode access simply follows the same > encryption behaviour*. Then, by blocking the unencrypted port on the datacenter > firewall, we can completely block unencrypted external access. 
-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent
[ https://issues.apache.org/jira/browse/HDDS-2075?focusedWorklogId=309142&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309142 ] ASF GitHub Bot logged work on HDDS-2075: Author: ASF GitHub Bot Created on: 09/Sep/19 18:25 Start Date: 09/Sep/19 18:25 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on issue #1415: HDDS-2075. Tracing in OzoneManager call is propagated with wrong parent URL: https://github.com/apache/hadoop/pull/1415#issuecomment-529606399 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Comment | |::|--:|:|:| | 0 | reexec | 1333 | Docker mode activated. | ||| _ Prechecks _ | | +1 | dupname | 0 | No case conflicting files found. | | +1 | @author | 0 | The patch does not contain any @author tags. | | -1 | test4tests | 0 | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | ||| _ trunk Compile Tests _ | | 0 | mvndep | 45 | Maven dependency ordering for branch | | +1 | mvninstall | 647 | trunk passed | | +1 | compile | 391 | trunk passed | | +1 | checkstyle | 75 | trunk passed | | +1 | mvnsite | 0 | trunk passed | | +1 | shadedclient | 947 | branch has no errors when building and testing our client artifacts. | | +1 | javadoc | 172 | trunk passed | | 0 | spotbugs | 479 | Used deprecated FindBugs config; considering switching to SpotBugs. | | +1 | findbugs | 702 | trunk passed | ||| _ Patch Compile Tests _ | | 0 | mvndep | 23 | Maven dependency ordering for patch | | +1 | mvninstall | 562 | the patch passed | | +1 | compile | 374 | the patch passed | | +1 | javac | 374 | the patch passed | | +1 | checkstyle | 79 | the patch passed | | +1 | mvnsite | 0 | the patch passed | | +1 | whitespace | 0 | The patch has no whitespace issues. | | +1 | shadedclient | 744 | patch has no errors when building and testing our client artifacts. 
| | +1 | javadoc | 167 | the patch passed | | +1 | findbugs | 653 | the patch passed | ||| _ Other Tests _ | | +1 | unit | 315 | hadoop-hdds in the patch passed. | | -1 | unit | 236 | hadoop-ozone in the patch failed. | | +1 | asflicense | 40 | The patch does not generate ASF License warnings. | | | | 7674 | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.ozone.om.ratis.TestOzoneManagerRatisServer | | Subsystem | Report/Notes | |--:|:-| | Docker | Client=19.03.0 Server=19.03.0 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1415/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/1415 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux fe5b73ebf793 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / 147f986 | | Default Java | 1.8.0_222 | | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1415/1/artifact/out/patch-unit-hadoop-ozone.txt | | Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1415/1/testReport/ | | Max. process+thread count | 1298 (vs. ulimit of 5500) | | modules | C: hadoop-ozone/common hadoop-ozone/client U: hadoop-ozone | | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1415/1/console | | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 | | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org | This message was automatically generated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 309142) Time Spent: 0.5h (was: 20m) > Tracing in OzoneManager call is propagated with wrong parent > > > Key: HDDS-2075 > URL: https://issues.apache.org/jira/browse/HDDS-2075 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Elek, Marton >Assignee: Doroszlai, Attila >Priority: Major > Labels: pull-request-available > Attachments: create_bucket-new.png, create_bucket.png > > Time Spent: 0.5h > Remaining Estimate: 0h
[jira] [Commented] (HDFS-12288) Fix DataNode's xceiver count calculation
[ https://issues.apache.org/jira/browse/HDFS-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925976#comment-16925976 ] Lukas Majercak commented on HDFS-12288: --- [~zhangchen] not working on this right now, feel free to pick it up. > Fix DataNode's xceiver count calculation > > > Key: HDFS-12288 > URL: https://issues.apache.org/jira/browse/HDFS-12288 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Major > Attachments: HDFS-12288.001.patch, HDFS-12288.002.patch > > > The problem with the ThreadGroup.activeCount() method is that the method is > only a very rough estimate, and in reality returns the total number of > threads in the thread group as opposed to the threads actually running. > In some DNs, we saw this to return 50~ for a long time, even though the > actual number of DataXceiver threads was next to none. > This is a big issue as we use the xceiverCount to make decisions on the NN > for choosing replication source DN or returning DNs to clients for R/W. > The plan is to reuse the DataNodeMetrics.dataNodeActiveXceiversCount value > which only accounts for actual number of DataXcevier threads currently > running and thus represents the load on the DN much better. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
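The dataNodeActiveXceiversCount approach described above boils down to a counter that is bumped only around actual transfer work, so idle pooled threads are not counted the way ThreadGroup.activeCount() counts them. A toy model of that pattern (not the real DataNodeMetrics code):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model: increment when an xceiver starts real work, decrement when it
// finishes, so the count reflects live transfers rather than all threads in
// the thread group (which is what ThreadGroup.activeCount() reports).
public class XceiverCounter {
  private final AtomicInteger active = new AtomicInteger();

  public int activeCount() {
    return active.get();
  }

  public void runTransfer(Runnable work) {
    active.incrementAndGet();
    try {
      work.run();
    } finally {
      active.decrementAndGet(); // always decrement, even if the transfer fails
    }
  }
}
```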
[jira] [Work logged] (HDDS-2076) Read fails because the block cannot be located in the container
[ https://issues.apache.org/jira/browse/HDDS-2076?focusedWorklogId=309112&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309112 ] ASF GitHub Bot logged work on HDDS-2076: Author: ASF GitHub Bot Created on: 09/Sep/19 17:38 Start Date: 09/Sep/19 17:38 Worklog Time Spent: 10m Work Description: nandakumar131 commented on issue #1410: HDDS-2076. Read fails because the block cannot be located in the container URL: https://github.com/apache/hadoop/pull/1410#issuecomment-529588126 /retest This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 309112) Time Spent: 1h 40m (was: 1.5h) > Read fails because the block cannot be located in the container > --- > > Key: HDDS-2076 > URL: https://issues.apache.org/jira/browse/HDDS-2076 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client, Ozone Datanode >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Shashikant Banerjee >Priority: Blocker > Labels: MiniOzoneChaosCluster, pull-request-available > Attachments: log.zip > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Read fails as the client is not able to read the block from the container. > {code} > org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: > Unable to find the block with bcsID 2515 .Container 7 bcsId is 0. 
> at > org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:536) > at > org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$getValidatorList$0(ContainerProtocolCalls.java:569) > {code} > The client eventually exits here > {code} > 2019-08-30 12:51:20,081 [pool-224-thread-6] ERROR > ozone.MiniOzoneLoadGenerator (MiniOzoneLoadGenerator.java:readData(176)) - > LOADGEN: Read key:pool-224-thread-6_330651 failed with exception > ERROR ozone.MiniOzoneLoadGenerator (MiniOzoneLoadGenerator.java:load(121)) - > LOADGEN: Exiting due to exception > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14820) The default 8KB buffer of BlockReaderRemote#newBlockReader#BufferedOutputStream is too big
[ https://issues.apache.org/jira/browse/HDFS-14820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925944#comment-16925944 ] Íñigo Goiri commented on HDFS-14820: What is the current default value? 8KB? I think this is too sensitive to change like this. We should make it configurable and make the default the old value. > The default 8KB buffer of > BlockReaderRemote#newBlockReader#BufferedOutputStream is too big > --- > > Key: HDFS-14820 > URL: https://issues.apache.org/jira/browse/HDFS-14820 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14820.001.patch > > > This issue is similar to HDFS-14535. > {code:java} > public static BlockReader newBlockReader(String file, > ExtendedBlock block, > Token blockToken, > long startOffset, long len, > boolean verifyChecksum, > String clientName, > Peer peer, DatanodeID datanodeID, > PeerCache peerCache, > CachingStrategy cachingStrategy, > int networkDistance) throws IOException { > // in and out will be closed when sock is closed (by the caller) > final DataOutputStream out = new DataOutputStream(new BufferedOutputStream( > peer.getOutputStream())); > new Sender(out).readBlock(block, blockToken, clientName, startOffset, len, > verifyChecksum, cachingStrategy); > } > public BufferedOutputStream(OutputStream out) { > this(out, 8192); > } > {code} > The Sender#readBlock parameters (block, blockToken, clientName, startOffset, len, > verifyChecksum, cachingStrategy) do not need such a big buffer. > So I think the BufferedOutputStream buffer should be reduced. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
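Íñigo's suggestion above (make the size configurable, keep the old 8KB default) could be sketched as follows; the configuration key name and the `BufferConfig` helper are hypothetical, not actual HDFS code or a real DFSConfigKeys constant:

```java
import java.io.BufferedOutputStream;
import java.io.OutputStream;
import java.util.Properties;

// Sketch of "make it configurable, default to the old value".
public final class BufferConfig {
  // Hypothetical key name, not an actual DFSConfigKeys constant.
  static final String KEY = "dfs.client.block.writer.buffer.size";
  // The current hard-coded BufferedOutputStream default.
  static final int DEFAULT = 8192;

  private BufferConfig() { }

  public static OutputStream wrap(OutputStream out, Properties conf) {
    int size = Integer.parseInt(conf.getProperty(KEY, String.valueOf(DEFAULT)));
    return new BufferedOutputStream(out, size);
  }
}
```

Callers that want a smaller buffer set the key; everyone else keeps today's behavior.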
[jira] [Updated] (HDFS-14833) RBF: Router Update Doesn't Sync Quota
[ https://issues.apache.org/jira/browse/HDFS-14833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HDFS-14833: Parent: HDFS-14603 Issue Type: Sub-task (was: Bug) > RBF: Router Update Doesn't Sync Quota > - > > Key: HDFS-14833 > URL: https://issues.apache.org/jira/browse/HDFS-14833 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > > HDFS-14777 added a check to prevent an RPC call: it checks whether the quota is > changing in the present state, but ignores whether the locations have changed. > If a location has changed, the new destination should be synchronized with the > mount entry quota. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14704) RBF: ServiceAddress and webAddress should not be null in NamenodeHeartbeatService
[ https://issues.apache.org/jira/browse/HDFS-14704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925942#comment-16925942 ] Íñigo Goiri commented on HDFS-14704: Keep in mind there are setups where the serviceAddress is not used; do we support those with this change? > RBF: ServiceAddress and webAddress should not be null in > NamenodeHeartbeatService > - > > Key: HDFS-14704 > URL: https://issues.apache.org/jira/browse/HDFS-14704 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Reporter: xuzq >Assignee: xuzq >Priority: Major > Attachments: HDFS-14704-trunk-001.patch, HDFS-14704-trunk-002.patch, > HDFS-14704-trunk-003.patch > > > NnId should not be null in NamenodeHeartbeatService. > If NnId is null, it will also print the error message like: > {code:java} > 2019-08-06 10:38:07,455 ERROR router.NamenodeHeartbeatService > (NamenodeHeartbeatService.java:updateState(229)) - Unhandled exception > updating NN registration for ns1:null > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.federation.protocol.proto.HdfsServerFederationProtos$NamenodeMembershipRecordProto$Builder.setServiceAddress(HdfsServerFederationProtos.java:3831) > at > org.apache.hadoop.hdfs.server.federation.store.records.impl.pb.MembershipStatePBImpl.setServiceAddress(MembershipStatePBImpl.java:119) > at > org.apache.hadoop.hdfs.server.federation.store.records.MembershipState.newInstance(MembershipState.java:108) > at > org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.registerNamenode(MembershipNamenodeResolver.java:267) > at > org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:223) > at > org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159) > at > org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178) > at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748){code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
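The stack trace shows the generated protobuf builder rejecting a null serviceAddress. A toy illustration of the failure mode and a caller-side null guard; the `MembershipBuilder` class below is a stand-in for the real generated HdfsServerFederationProtos builder, not actual RBF code:

```java
// Protobuf-generated builders throw NullPointerException when handed a null
// string, which is exactly the crash in the log above. Guarding in the caller
// (or defaulting to an empty string) avoids it.
public class MembershipBuilder {
  private String serviceAddress = "";

  public MembershipBuilder setServiceAddress(String addr) {
    if (addr == null) {
      throw new NullPointerException(); // mimics protobuf builder behaviour
    }
    this.serviceAddress = addr;
    return this;
  }

  // Caller-side guard: only set the field when a value actually exists.
  public static MembershipBuilder safeSet(MembershipBuilder b, String addr) {
    if (addr != null) {
      b.setServiceAddress(addr);
    }
    return b;
  }

  public String getServiceAddress() {
    return serviceAddress;
  }
}
```

This keeps setups that legitimately have no serviceAddress (as Íñigo asks about) working, rather than crashing the heartbeat service.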
[jira] [Assigned] (HDDS-2102) HddsVolumeChecker should use java optional in place of Guava optional
[ https://issues.apache.org/jira/browse/HDDS-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh reassigned HDDS-2102: --- Assignee: Mukul Kumar Singh > HddsVolumeChecker should use java optional in place of Guava optional > - > > Key: HDDS-2102 > URL: https://issues.apache.org/jira/browse/HDDS-2102 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > HddsVolumeChecker should use java optional in place of Guava optional, as the > Guava dependency is marked unstable. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2102) HddsVolumeChecker should use java optional in place of Guava optional
[ https://issues.apache.org/jira/browse/HDDS-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh updated HDDS-2102: Status: Patch Available (was: Open) > HddsVolumeChecker should use java optional in place of Guava optional > - > > Key: HDDS-2102 > URL: https://issues.apache.org/jira/browse/HDDS-2102 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > HddsVolumeChecker should use java optional in place of Guava optional, as the > Guava dependency is marked unstable. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2102) HddsVolumeChecker should use java optional in place of Guava optional
[ https://issues.apache.org/jira/browse/HDDS-2102?focusedWorklogId=309085&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309085 ] ASF GitHub Bot logged work on HDDS-2102: Author: ASF GitHub Bot Created on: 09/Sep/19 17:18 Start Date: 09/Sep/19 17:18 Worklog Time Spent: 10m Work Description: mukul1987 commented on pull request #1416: HDDS-2102. HddsVolumeChecker should use java optional in place of Guava optional. Contributed by Mukul Kumar Singh. URL: https://github.com/apache/hadoop/pull/1416 HddsVolumeChecker should use java optional in place of Guava optional as Guava Optional is marked as unstable. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 309085) Remaining Estimate: 0h Time Spent: 10m > HddsVolumeChecker should use java optional in place of Guava optional > - > > Key: HDDS-2102 > URL: https://issues.apache.org/jira/browse/HDDS-2102 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > HddsVolumeChecker should use java optional in place of Guava optional, as the > Guava dependency is marked unstable. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
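The swap described in this issue is largely mechanical: Guava's Optional.of/absent/fromNullable map onto java.util.Optional.of/empty/ofNullable, and or(x) becomes orElse(x). A small sketch using only java.util.Optional (the `describe` helper is illustrative, not HddsVolumeChecker code):

```java
import java.util.Optional;

// Illustrates the java.util.Optional idioms that replace the Guava ones:
// Optional.absent() -> Optional.empty(), fromNullable -> ofNullable,
// or(default) -> orElse(default), and map for transforming the value.
public final class OptionalExample {
  private OptionalExample() { }

  public static String describe(Optional<Throwable> failure) {
    return failure.map(Throwable::getMessage).orElse("healthy");
  }
}
```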
[jira] [Updated] (HDDS-2102) HddsVolumeChecker should use java optional in place of Guava optional
[ https://issues.apache.org/jira/browse/HDDS-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2102: - Labels: pull-request-available (was: ) > HddsVolumeChecker should use java optional in place of Guava optional > - > > Key: HDDS-2102 > URL: https://issues.apache.org/jira/browse/HDDS-2102 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Priority: Major > Labels: pull-request-available > > HddsVolumeChecker should use java optional in place of Guava optional, as the > Guava dependency is marked unstable. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1843) Undetectable corruption after restart of a datanode
[ https://issues.apache.org/jira/browse/HDDS-1843?focusedWorklogId=309083&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309083 ] ASF GitHub Bot logged work on HDDS-1843: Author: ASF GitHub Bot Created on: 09/Sep/19 17:16 Start Date: 09/Sep/19 17:16 Worklog Time Spent: 10m Work Description: bshashikant commented on pull request #1364: HDDS-1843. Undetectable corruption after restart of a datanode. URL: https://github.com/apache/hadoop/pull/1364 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 309083) Time Spent: 9h 50m (was: 9h 40m) > Undetectable corruption after restart of a datanode > --- > > Key: HDDS-1843 > URL: https://issues.apache.org/jira/browse/HDDS-1843 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Critical > Labels: pull-request-available > Fix For: 0.5.0 > > Attachments: HDDS-1843.000.patch > > Time Spent: 9h 50m > Remaining Estimate: 0h > > Right now, all write chunks use BufferedIO ie, sync flag is disabled by > default. Also, Rocks Db metadata updates are done in Rocks DB cache first at > Datanode. In case, there comes a situation where the buffered chunk data as > well as the corresponding metadata update is lost as a part of datanode > restart, it may lead to a situation where, it will not be possible to detect > the corruption (not even with container scanner) of this nature in a > reasonable time frame, until and unless there is a client IO failure or Recon > server detects it over time. In order to atleast to detect the problem, Ratis > snapshot on datanode should sync the rocks db file . 
In such a way, > ContainerScanner will be able to detect this.We can also add a metric around > sync to measure how much of a throughput loss it can incurr. > Thanks [~msingh] for suggesting this. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1843) Undetectable corruption after restart of a datanode
[ https://issues.apache.org/jira/browse/HDDS-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1843: -- Resolution: Fixed Status: Resolved (was: Patch Available) > Undetectable corruption after restart of a datanode > --- > > Key: HDDS-1843 > URL: https://issues.apache.org/jira/browse/HDDS-1843 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Critical > Labels: pull-request-available > Fix For: 0.5.0 > > Attachments: HDDS-1843.000.patch > > Time Spent: 9h 50m > Remaining Estimate: 0h > > Right now, all write chunks use BufferedIO ie, sync flag is disabled by > default. Also, Rocks Db metadata updates are done in Rocks DB cache first at > Datanode. In case, there comes a situation where the buffered chunk data as > well as the corresponding metadata update is lost as a part of datanode > restart, it may lead to a situation where, it will not be possible to detect > the corruption (not even with container scanner) of this nature in a > reasonable time frame, until and unless there is a client IO failure or Recon > server detects it over time. In order to atleast to detect the problem, Ratis > snapshot on datanode should sync the rocks db file . In such a way, > ContainerScanner will be able to detect this.We can also add a metric around > sync to measure how much of a throughput loss it can incurr. > Thanks [~msingh] for suggesting this. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1843) Undetectable corruption after restart of a datanode
[ https://issues.apache.org/jira/browse/HDDS-1843?focusedWorklogId=309082&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309082 ] ASF GitHub Bot logged work on HDDS-1843: Author: ASF GitHub Bot Created on: 09/Sep/19 17:15 Start Date: 09/Sep/19 17:15 Worklog Time Spent: 10m Work Description: bshashikant commented on issue #1364: HDDS-1843. Undetectable corruption after restart of a datanode. URL: https://github.com/apache/hadoop/pull/1364#issuecomment-529579125 Thanks @nandakumar131 @mukul1987 @supratimdeka for the reviews. I have committed this change to trunk. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 309082) Time Spent: 9h 40m (was: 9.5h) > Undetectable corruption after restart of a datanode > --- > > Key: HDDS-1843 > URL: https://issues.apache.org/jira/browse/HDDS-1843 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Critical > Labels: pull-request-available > Fix For: 0.5.0 > > Attachments: HDDS-1843.000.patch > > Time Spent: 9h 40m > Remaining Estimate: 0h > > Right now, all write chunks use BufferedIO ie, sync flag is disabled by > default. Also, Rocks Db metadata updates are done in Rocks DB cache first at > Datanode. In case, there comes a situation where the buffered chunk data as > well as the corresponding metadata update is lost as a part of datanode > restart, it may lead to a situation where, it will not be possible to detect > the corruption (not even with container scanner) of this nature in a > reasonable time frame, until and unless there is a client IO failure or Recon > server detects it over time. 
In order to atleast to detect the problem, Ratis > snapshot on datanode should sync the rocks db file . In such a way, > ContainerScanner will be able to detect this.We can also add a metric around > sync to measure how much of a throughput loss it can incurr. > Thanks [~msingh] for suggesting this. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14795) Add Throttler for writing block
[ https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925927#comment-16925927 ] Íñigo Goiri commented on HDFS-14795: Thanks [~leosun08], I think this is more readable now. For "isWrite()" I would use ifs instead of switch: {code} if (stage == PIPELINE_SETUP_STREAMING_RECOVERY) { return true; } else if (stage == PIPELINE_SETUP_APPEND_RECOVERY) { return true; } else { return false; } {code} A minor nit: the indentation at DFSConfigKeys#123 doesn't seem consistent. > Add Throttler for writing block > --- > > Key: HDFS-14795 > URL: https://issues.apache.org/jira/browse/HDFS-14795 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Minor > Attachments: HDFS-14795.001.patch, HDFS-14795.002.patch, > HDFS-14795.003.patch, HDFS-14795.004.patch > > > DataXceiver#writeBlock > {code:java} > blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut, > mirrorAddr, null, targets, false); > {code} > As the code above shows, DataXceiver#writeBlock doesn't throttle. > I think it is necessary to throttle block writes, adding a throttler > in the PIPELINE_SETUP_APPEND_RECOVERY or > PIPELINE_SETUP_STREAMING_RECOVERY stage. > The default throttler value is still null. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
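The if/else chain suggested in the comment above collapses to a single boolean expression. A minimal sketch of that form, using a trimmed stand-in enum rather than the real BlockConstructionStage:

```java
// Stand-in for the real BlockConstructionStage, reduced to the two values
// the throttling decision cares about plus one non-write stage.
public class WriteStageCheck {
  enum Stage {
    PIPELINE_SETUP_STREAMING_RECOVERY,
    PIPELINE_SETUP_APPEND_RECOVERY,
    PIPELINE_SETUP_CREATE
  }

  // Equivalent to the if/else-if/else form, without the branching.
  static boolean isWrite(Stage stage) {
    return stage == Stage.PIPELINE_SETUP_STREAMING_RECOVERY
        || stage == Stage.PIPELINE_SETUP_APPEND_RECOVERY;
  }
}
```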
[jira] [Created] (HDDS-2102) HddsVolumeChecker should use java optional in place of Guava optional
Mukul Kumar Singh created HDDS-2102: --- Summary: HddsVolumeChecker should use java optional in place of Guava optional Key: HDDS-2102 URL: https://issues.apache.org/jira/browse/HDDS-2102 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Datanode Affects Versions: 0.4.0 Reporter: Mukul Kumar Singh HddsVolumeChecker should use java optional in place of Guava optional, as the Guava dependency is marked unstable. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2101) Ozone filesystem provider doesn't exist
[ https://issues.apache.org/jira/browse/HDDS-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925893#comment-16925893 ] Elek, Marton commented on HDDS-2101: The problem is that the exact implementation depends on the current environment. In the case of legacy Hadoop it should be BasicOzoneFileSystem; for Hadoop 3.2 it should be OzoneFileSystem... > Ozone filesystem provider doesn't exist > --- > > Key: HDDS-2101 > URL: https://issues.apache.org/jira/browse/HDDS-2101 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Filesystem >Reporter: Jitendra Nath Pandey >Assignee: Vivek Ratnavel Subramanian >Priority: Critical > > We don't have a filesystem provider in META-INF. > i.e. the following file doesn't exist: > {{hadoop-ozone/ozonefs/src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem}} > See for example > {{hadoop-tools/hadoop-aws/src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem}} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
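For context: Hadoop's FileSystem discovers implementations through java.util.ServiceLoader, which reads provider-configuration files named META-INF/services/org.apache.hadoop.fs.FileSystem from the classpath. A sketch of what the missing file would contain; which class to list (OzoneFileSystem vs BasicOzoneFileSystem) depends on the Hadoop version, as Marton's comment notes, so the entry below is one possible choice rather than the definitive fix:

```
# hadoop-ozone/ozonefs/src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem
org.apache.hadoop.fs.ozone.OzoneFileSystem
```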
[jira] [Updated] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent
[ https://issues.apache.org/jira/browse/HDDS-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doroszlai, Attila updated HDDS-2075: Status: Patch Available (was: In Progress) > Tracing in OzoneManager call is propagated with wrong parent > > > Key: HDDS-2075 > URL: https://issues.apache.org/jira/browse/HDDS-2075 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Elek, Marton >Assignee: Doroszlai, Attila >Priority: Major > Labels: pull-request-available > Attachments: create_bucket-new.png, create_bucket.png > > Time Spent: 20m > Remaining Estimate: 0h > > As you can see in the attached screenshot the OzoneManager.createBucket > (server side) tracing information is the children of the freon.createBucket > instead of the freon OzoneManagerProtocolPB.submitRequest. > To avoid confusion the hierarchy should be fixed (Most probably we generate > the child span AFTER we already serialized the parent one to the message) -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent
[ https://issues.apache.org/jira/browse/HDDS-2075?focusedWorklogId=309017&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309017 ] ASF GitHub Bot logged work on HDDS-2075: Author: ASF GitHub Bot Created on: 09/Sep/19 16:17 Start Date: 09/Sep/19 16:17 Worklog Time Spent: 10m Work Description: adoroszlai commented on issue #1415: HDDS-2075. Tracing in OzoneManager call is propagated with wrong parent URL: https://github.com/apache/hadoop/pull/1415#issuecomment-529555336 /label ozone This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 309017) Time Spent: 20m (was: 10m) > Tracing in OzoneManager call is propagated with wrong parent > > > Key: HDDS-2075 > URL: https://issues.apache.org/jira/browse/HDDS-2075 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Elek, Marton >Assignee: Doroszlai, Attila >Priority: Major > Labels: pull-request-available > Attachments: create_bucket-new.png, create_bucket.png > > Time Spent: 20m > Remaining Estimate: 0h > > As you can see in the attached screenshot the OzoneManager.createBucket > (server side) tracing information is the children of the freon.createBucket > instead of the freon OzoneManagerProtocolPB.submitRequest. > To avoid confusion the hierarchy should be fixed (Most probably we generate > the child span AFTER we already serialized the parent one to the message) -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent
[ https://issues.apache.org/jira/browse/HDDS-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doroszlai, Attila updated HDDS-2075: Attachment: create_bucket-new.png > Tracing in OzoneManager call is propagated with wrong parent > > > Key: HDDS-2075 > URL: https://issues.apache.org/jira/browse/HDDS-2075 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Elek, Marton >Assignee: Doroszlai, Attila >Priority: Major > Labels: pull-request-available > Attachments: create_bucket-new.png, create_bucket.png > > Time Spent: 20m > Remaining Estimate: 0h > > As you can see in the attached screenshot the OzoneManager.createBucket > (server side) tracing information is the children of the freon.createBucket > instead of the freon OzoneManagerProtocolPB.submitRequest. > To avoid confusion the hierarchy should be fixed (Most probably we generate > the child span AFTER we already serialized the parent one to the message) -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent
[ https://issues.apache.org/jira/browse/HDDS-2075?focusedWorklogId=309015&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309015 ] ASF GitHub Bot logged work on HDDS-2075: Author: ASF GitHub Bot Created on: 09/Sep/19 16:16 Start Date: 09/Sep/19 16:16 Worklog Time Spent: 10m Work Description: adoroszlai commented on pull request #1415: HDDS-2075. Tracing in OzoneManager call is propagated with wrong parent URL: https://github.com/apache/hadoop/pull/1415 ## What changes were proposed in this pull request? Apply tracing to `OzoneManagerProtocol` instead of `OzoneManagerProtocolPB`. The latter only has a single public method, and no other `*ProtocolPB` interface is traced. https://issues.apache.org/jira/browse/HDDS-2075 ## How was this patch tested? Verified operation hierarchy in Jaeger UI. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 309015) Remaining Estimate: 0h Time Spent: 10m > Tracing in OzoneManager call is propagated with wrong parent > > > Key: HDDS-2075 > URL: https://issues.apache.org/jira/browse/HDDS-2075 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Elek, Marton >Assignee: Doroszlai, Attila >Priority: Major > Labels: pull-request-available > Attachments: create_bucket.png > > Time Spent: 10m > Remaining Estimate: 0h > > As you can see in the attached screenshot the OzoneManager.createBucket > (server side) tracing information is the children of the freon.createBucket > instead of the freon OzoneManagerProtocolPB.submitRequest. 
> To avoid confusion the hierarchy should be fixed (Most probably we generate > the child span AFTER we already serialized the parent one to the message)
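The ordering bug described above can be modeled in a few lines. This is a hedged stand-in, not the Ozone code: `Span` and `serialize` are simplified types that only capture the key point, namely that whatever span is active at serialization time becomes the parent recorded on the wire, so a child span started after serialization never reaches the server.

```java
public class SpanOrderDemo {
    public static final class Span {
        public final String op;
        public final String parentOp; // null for a root span
        public Span(String op, Span parent) {
            this.op = op;
            this.parentOp = parent == null ? null : parent.op;
        }
    }

    // The "wire" carries whichever span context was active at serialization time.
    public static String serialize(Span active, String payload) {
        return active.op + "|" + payload;
    }

    public static void main(String[] args) {
        Span root = new Span("freon.createBucket", null);

        // Buggy order: serialize first, then start the RPC child span.
        String buggyWire = serialize(root, "createBucket");
        Span lateChild = new Span("OzoneManagerProtocolPB.submitRequest", root);
        // The server attaches to "freon.createBucket"; the late child is lost.
        System.out.println("buggy parent on wire: " + buggyWire.split("\\|")[0]
                + " (late child " + lateChild.op + " never reaches the server)");

        // Fixed order: start the child span, then serialize with it active.
        Span child = new Span("OzoneManagerProtocolPB.submitRequest", root);
        String fixedWire = serialize(child, "createBucket");
        System.out.println("fixed parent on wire: " + fixedWire.split("\\|")[0]);
    }
}
```

The PR's actual fix (tracing `OzoneManagerProtocol` instead of `OzoneManagerProtocolPB`) achieves the same effect by making the traced layer the one that runs before message serialization.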
[jira] [Updated] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent
[ https://issues.apache.org/jira/browse/HDDS-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2075: - Labels: pull-request-available (was: ) > Tracing in OzoneManager call is propagated with wrong parent > > > Key: HDDS-2075 > URL: https://issues.apache.org/jira/browse/HDDS-2075 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Elek, Marton >Assignee: Doroszlai, Attila >Priority: Major > Labels: pull-request-available > Attachments: create_bucket.png > > > As you can see in the attached screenshot the OzoneManager.createBucket > (server side) tracing information is the children of the freon.createBucket > instead of the freon OzoneManagerProtocolPB.submitRequest. > To avoid confusion the hierarchy should be fixed (Most probably we generate > the child span AFTER we already serialized the parent one to the message) -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent
[ https://issues.apache.org/jira/browse/HDDS-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doroszlai, Attila updated HDDS-2075: Attachment: create_bucket.png > Tracing in OzoneManager call is propagated with wrong parent > > > Key: HDDS-2075 > URL: https://issues.apache.org/jira/browse/HDDS-2075 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Elek, Marton >Assignee: Doroszlai, Attila >Priority: Major > Attachments: create_bucket.png > > > As you can see in the attached screenshot the OzoneManager.createBucket > (server side) tracing information is the children of the freon.createBucket > instead of the freon OzoneManagerProtocolPB.submitRequest. > To avoid confusion the hierarchy should be fixed (Most probably we generate > the child span AFTER we already serialized the parent one to the message) -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent
[ https://issues.apache.org/jira/browse/HDDS-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doroszlai, Attila updated HDDS-2075: Target Version/s: 0.5.0 > Tracing in OzoneManager call is propagated with wrong parent > > > Key: HDDS-2075 > URL: https://issues.apache.org/jira/browse/HDDS-2075 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Elek, Marton >Assignee: Doroszlai, Attila >Priority: Major > > As you can see in the attached screenshot the OzoneManager.createBucket > (server side) tracing information is the children of the freon.createBucket > instead of the freon OzoneManagerProtocolPB.submitRequest. > To avoid confusion the hierarchy should be fixed (Most probably we generate > the child span AFTER we already serialized the parent one to the message) -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work started] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent
[ https://issues.apache.org/jira/browse/HDDS-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDDS-2075 started by Doroszlai, Attila. --- > Tracing in OzoneManager call is propagated with wrong parent > > > Key: HDDS-2075 > URL: https://issues.apache.org/jira/browse/HDDS-2075 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Elek, Marton >Assignee: Doroszlai, Attila >Priority: Major > > As you can see in the attached screenshot the OzoneManager.createBucket > (server side) tracing information is the children of the freon.createBucket > instead of the freon OzoneManagerProtocolPB.submitRequest. > To avoid confusion the hierarchy should be fixed (Most probably we generate > the child span AFTER we already serialized the parent one to the message) -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent
[ https://issues.apache.org/jira/browse/HDDS-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doroszlai, Attila reassigned HDDS-2075: --- Assignee: Doroszlai, Attila > Tracing in OzoneManager call is propagated with wrong parent > > > Key: HDDS-2075 > URL: https://issues.apache.org/jira/browse/HDDS-2075 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Elek, Marton >Assignee: Doroszlai, Attila >Priority: Major > > As you can see in the attached screenshot the OzoneManager.createBucket > (server side) tracing information is the children of the freon.createBucket > instead of the freon OzoneManagerProtocolPB.submitRequest. > To avoid confusion the hierarchy should be fixed (Most probably we generate > the child span AFTER we already serialized the parent one to the message) -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14655) [SBN Read] Namenode crashes if one of The JN is down
[ https://issues.apache.org/jira/browse/HDFS-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925848#comment-16925848 ] Erik Krogen commented on HDFS-14655: Looks better to me, thanks [~ayushtkn]! I do think we should rename the config and update the description to represent that this config is a _maximum_ thread count; the way it reads now, I would assume that there are always this many threads being used. One thing I noticed, you used a keepalive time of 0: {code} return new HadoopThreadPoolExecutor(1, numThreads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue(), {code} I feel a longer time would probably be better; if more than 1 thread is needed, it will probably be needed again soon (might represent a slow JN?), so it seems some keepalive would be helpful to avoid the thread creation overhead. Also you can use [diamond-typing|https://docs.oracle.com/javase/tutorial/java/generics/types.html#diamond] here for the {{LinkedBlockingQueue}} instantiation. > [SBN Read] Namenode crashes if one of The JN is down > > > Key: HDFS-14655 > URL: https://issues.apache.org/jira/browse/HDFS-14655 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Harshakiran Reddy >Assignee: Ayush Saxena >Priority: Critical > Attachments: HDFS-14655-01.patch, HDFS-14655-02.patch, > HDFS-14655-03.patch, HDFS-14655.poc.patch > > > {noformat} > 2019-07-04 17:35:54,064 | INFO | Logger channel (from parallel executor) to > XXX/XXX | Retrying connect to server: XXX/XXX. Already tried > 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, > sleepTime=1000 MILLISECONDS) | Client.java:975 > 2019-07-04 17:35:54,087 | FATAL | Edit log tailer | Unknown error encountered > while tailing edits. Shutting down standby NN. 
| EditLogTailer.java:474 > java.lang.OutOfMemoryError: unable to create new native thread > at java.lang.Thread.start0(Native Method) > at java.lang.Thread.start(Thread.java:717) > at > java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378) > at > com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:440) > at > com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:56) > at > org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.getJournaledEdits(IPCLoggerChannel.java:565) > at > org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.getJournaledEdits(AsyncLoggerSet.java:272) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectRpcInputStreams(QuorumJournalManager.java:533) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:508) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:275) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1681) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1714) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:307) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:410) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:360) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) > at > 
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:483) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423) > 2019-07-04 17:35:54,112 | INFO | Edit log tailer | Exiting with status 1: > java.lang.OutOfMemoryError: unable to create new native thread | > ExitUtil.java:210 > {noformat}
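The review suggestions above (non-zero keepalive, diamond typing) can be sketched with the plain JDK `ThreadPoolExecutor`, since `HadoopThreadPoolExecutor` is a thin wrapper over it. This is a hedged sketch of the suggested shape, not the committed patch; the 60-second keepalive is an illustrative value.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class LoggerExecutorSketch {
    public static ThreadPoolExecutor create(int maxThreads) {
        return new ThreadPoolExecutor(
                1,                      // core: one thread per logger channel
                maxThreads,             // hard cap, per the renamed "maximum" config
                60L, TimeUnit.SECONDS,  // keepalive > 0: a slow JN likely needs
                                        // extra threads again soon, so avoid
                                        // paying thread-creation overhead each time
                new LinkedBlockingQueue<>()); // diamond operator, as suggested
        // Note: a plain ThreadPoolExecutor with an unbounded queue only grows
        // past the core size when the queue rejects work, so the queue choice
        // matters for the max to take effect; the Hadoop wrapper and its usage
        // may differ from this sketch.
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = create(4);
        System.out.println(pool.getKeepAliveTime(TimeUnit.SECONDS)); // 60
        pool.shutdown();
    }
}
```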
[jira] [Comment Edited] (HDFS-14655) [SBN Read] Namenode crashes if one of The JN is down
[ https://issues.apache.org/jira/browse/HDFS-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925848#comment-16925848 ] Erik Krogen edited comment on HDFS-14655 at 9/9/19 3:43 PM: Looks better to me, thanks [~ayushtkn]! I do think we should rename the config and update the description to represent that this config is a _maximum_ thread count; the way it reads now, I would assume that there are always this many threads being used. One thing I noticed, you used a keepalive time of 0: {code} return new HadoopThreadPoolExecutor(1, numThreads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue(), {code} I feel a longer time would probably be better; if more than 1 thread is needed, it will probably be needed again soon (might represent a slow JN?), so it seems some keepalive would be helpful to avoid the thread creation overhead. Also you can use [diamond-typing|https://docs.oracle.com/javase/tutorial/java/generics/types.html#diamond] here for the {{LinkedBlockingQueue}} instantiation. [~shv], does the current approach address your previous concerns? was (Author: xkrogen): Looks better to me, thanks [~ayushtkn]! I do think we should rename the config and update the description to represent that this config is a _maximum_ thread count; the way it reads now, I would assume that there are always this many threads being used. One thing I noticed, you used a keepalive time of 0: {code} return new HadoopThreadPoolExecutor(1, numThreads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue(), {code} I feel a longer time would probably be better; if more than 1 thread is needed, it will probably be needed again soon (might represent a slow JN?), so it seems some keepalive would be helpful to avoid the thread creation overhead. Also you can use [diamond-typing|https://docs.oracle.com/javase/tutorial/java/generics/types.html#diamond] here for the {{LinkedBlockingQueue}} instantiation. 
> [SBN Read] Namenode crashes if one of The JN is down > > > Key: HDFS-14655 > URL: https://issues.apache.org/jira/browse/HDFS-14655 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Harshakiran Reddy >Assignee: Ayush Saxena >Priority: Critical > Attachments: HDFS-14655-01.patch, HDFS-14655-02.patch, > HDFS-14655-03.patch, HDFS-14655.poc.patch > > > {noformat} > 2019-07-04 17:35:54,064 | INFO | Logger channel (from parallel executor) to > XXX/XXX | Retrying connect to server: XXX/XXX. Already tried > 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, > sleepTime=1000 MILLISECONDS) | Client.java:975 > 2019-07-04 17:35:54,087 | FATAL | Edit log tailer | Unknown error encountered > while tailing edits. Shutting down standby NN. | EditLogTailer.java:474 > java.lang.OutOfMemoryError: unable to create new native thread > at java.lang.Thread.start0(Native Method) > at java.lang.Thread.start(Thread.java:717) > at > java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378) > at > com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:440) > at > com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:56) > at > org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.getJournaledEdits(IPCLoggerChannel.java:565) > at > org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.getJournaledEdits(AsyncLoggerSet.java:272) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectRpcInputStreams(QuorumJournalManager.java:533) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:508) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:275) > at > 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1681) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1714) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:307) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:410) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427) > at java.security.AccessController.doPrivileged(Native Method) > at javax.secu
[jira] [Commented] (HDFS-14699) Erasure Coding: Storage not considered in live replica when replication streams hard limit reached to threshold
[ https://issues.apache.org/jira/browse/HDFS-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925775#comment-16925775 ] Zhao Yi Ming commented on HDFS-14699: - [~surendrasingh] Good point! You are right! We only need the srcNodes to be under the replicationStreamsHardLimit control; liveBlockIndices is just used for the reconstruction work, so it can move before the threshold check. I will make the changes and test in our testing env; if everything goes well (the testing needs some time, hope I can occupy the testing env as soon as possible), I will submit a new patch with your comments. Thanks again! > Erasure Coding: Storage not considered in live replica when replication > streams hard limit reached to threshold > --- > > Key: HDFS-14699 > URL: https://issues.apache.org/jira/browse/HDFS-14699 > Project: Hadoop HDFS > Issue Type: Bug > Components: ec >Affects Versions: 3.2.0, 3.1.1, 3.3.0 >Reporter: Zhao Yi Ming >Assignee: Zhao Yi Ming >Priority: Critical > Labels: patch > Attachments: HDFS-14699.00.patch, HDFS-14699.01.patch, > HDFS-14699.02.patch, HDFS-14699.03.patch, HDFS-14699.04.patch, > HDFS-14699.05.patch, image-2019-08-20-19-58-51-872.png, > image-2019-09-02-17-51-46-742.png > > > We tried the EC function on an 80-node cluster with hadoop 3.1.1 and hit the > same scenario as described in https://issues.apache.org/jira/browse/HDFS-8881. > Following are our testing steps, hope they are helpful (the following DNs have the > testing internal blocks): > # we customized a new 10-2-1024k policy and used it on a path; now we have 12 > internal blocks (12 live blocks)
> # decommissioned one DN; after the decommission completes, we have 13 > internal blocks (12 live blocks and 1 decommissioned block) > # then shut down one DN which does not have the same block id as the > decommissioned block; now we have 12 internal blocks (11 live blocks and 1 > decommissioned block) > # after waiting about 600s (before the heartbeat comes), recommissioned the > decommissioned DN; now we have 12 internal blocks (11 live blocks and 1 > duplicate block) > # then EC does not reconstruct the missing block > We think this is a critical issue for using the EC function in a production > env. Could you help? Thanks a lot!
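The reordering agreed on above can be modeled compactly. This is a hedged stand-in, not the HDFS `BlockManager` code: `Node` and the method names are simplifications. The point is that a node which has hit the replication-streams hard limit is excluded from `srcNodes` (it cannot serve another reconstruction stream) but its block index is still recorded in `liveBlockIndices`, so the replica is not mistaken for missing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

public class EcSourceSelection {
    public static final int HARD_LIMIT = 4; // stand-in for replicationStreamsHardLimit

    public static final class Node {
        final String name;
        final int blockIndex;     // EC internal block index stored on this node
        final int activeStreams;  // ongoing replication/reconstruction streams
        public Node(String name, int blockIndex, int activeStreams) {
            this.name = name;
            this.blockIndex = blockIndex;
            this.activeStreams = activeStreams;
        }
    }

    public static void select(List<Node> candidates, List<Node> srcNodes,
                              BitSet liveBlockIndices) {
        for (Node n : candidates) {
            // Record liveness BEFORE the busy-node threshold check, as the
            // comment above suggests: the replica exists either way.
            liveBlockIndices.set(n.blockIndex);
            if (n.activeStreams >= HARD_LIMIT) {
                continue; // too busy to serve as a reconstruction source
            }
            srcNodes.add(n);
        }
    }

    public static void main(String[] args) {
        List<Node> candidates = Arrays.asList(
                new Node("dn1", 0, 0),
                new Node("dn2", 1, HARD_LIMIT)); // busy, but its replica is live
        List<Node> srcNodes = new ArrayList<>();
        BitSet live = new BitSet();
        select(candidates, srcNodes, live);
        System.out.println(srcNodes.size() + " source(s), "
                + live.cardinality() + " live block index(es)");
    }
}
```

With the buggy ordering (liveness recorded only after the threshold check), the busy node's replica would be undercounted and EC could schedule an unnecessary, or skip a necessary, reconstruction.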
[jira] [Work logged] (HDDS-2076) Read fails because the block cannot be located in the container
[ https://issues.apache.org/jira/browse/HDDS-2076?focusedWorklogId=308962&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308962 ] ASF GitHub Bot logged work on HDDS-2076: Author: ASF GitHub Bot Created on: 09/Sep/19 15:04 Start Date: 09/Sep/19 15:04 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on issue #1410: HDDS-2076. Read fails because the block cannot be located in the container URL: https://github.com/apache/hadoop/pull/1410#issuecomment-529520412 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Comment | |::|--:|:|:| | 0 | reexec | 41 | Docker mode activated. | ||| _ Prechecks _ | | +1 | dupname | 0 | No case conflicting files found. | | +1 | @author | 0 | The patch does not contain any @author tags. | | +1 | test4tests | 0 | The patch appears to include 1 new or modified test files. | ||| _ trunk Compile Tests _ | | 0 | mvndep | 25 | Maven dependency ordering for branch | | +1 | mvninstall | 581 | trunk passed | | +1 | compile | 383 | trunk passed | | +1 | checkstyle | 81 | trunk passed | | +1 | mvnsite | 0 | trunk passed | | +1 | shadedclient | 881 | branch has no errors when building and testing our client artifacts. | | +1 | javadoc | 178 | trunk passed | | 0 | spotbugs | 418 | Used deprecated FindBugs config; considering switching to SpotBugs. | | +1 | findbugs | 612 | trunk passed | ||| _ Patch Compile Tests _ | | 0 | mvndep | 38 | Maven dependency ordering for patch | | +1 | mvninstall | 544 | the patch passed | | +1 | compile | 394 | the patch passed | | +1 | javac | 394 | the patch passed | | +1 | checkstyle | 88 | the patch passed | | +1 | mvnsite | 0 | the patch passed | | +1 | whitespace | 0 | The patch has no whitespace issues. | | +1 | shadedclient | 713 | patch has no errors when building and testing our client artifacts. | | +1 | javadoc | 177 | the patch passed | | +1 | findbugs | 704 | the patch passed | ||| _ Other Tests _ | | -1 | unit | 196 | hadoop-hdds in the patch failed. 
| | -1 | unit | 195 | hadoop-ozone in the patch failed. | | +1 | asflicense | 46 | The patch does not generate ASF License warnings. | | | | 6058 | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.ozone.container.keyvalue.TestKeyValueContainer | | | hadoop.ozone.container.ozoneimpl.TestOzoneContainer | | | hadoop.ozone.om.ratis.TestOzoneManagerRatisServer | | Subsystem | Report/Notes | |--:|:-| | Docker | Client=19.03.1 Server=19.03.1 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1410/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/1410 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux cdb643d21b64 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / 60af879 | | Default Java | 1.8.0_222 | | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1410/3/artifact/out/patch-unit-hadoop-hdds.txt | | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1410/3/artifact/out/patch-unit-hadoop-ozone.txt | | Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1410/3/testReport/ | | Max. process+thread count | 1263 (vs. ulimit of 5500) | | modules | C: hadoop-hdds/container-service hadoop-ozone/integration-test U: . | | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1410/3/console | | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 | | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org | This message was automatically generated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 308962) Time Spent: 1.5h (was: 1h 20m) > Read fails because the block cannot be located in the container > --- > > Key: HDDS-2076 > URL: https://issues.apache.org/jira/browse/HDDS-2076 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client, Ozone Datanode >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee:
[jira] [Commented] (HDFS-14303) check block directory logic not correct when there is only meta file, print no meaning warn log
[ https://issues.apache.org/jira/browse/HDFS-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925758#comment-16925758 ] Hadoop QA commented on HDFS-14303: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 2m 20s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} branch-3.2 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 23s{color} | {color:green} branch-3.2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 18s{color} | {color:green} branch-3.2 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 56s{color} | {color:green} branch-3.2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 31s{color} | {color:green} branch-3.2 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 45s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 15s{color} | {color:green} branch-3.2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s{color} | {color:green} branch-3.2 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 54s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}117m 35s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}195m 28s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestLeaseRecovery2 | | | hadoop.hdfs.server.diskbalancer.TestDiskBalancer | | | hadoop.hdfs.TestDecommission | | | hadoop.hdfs.server.datanode.TestDirectoryScanner | | | hadoop.hdfs.server.namenode.ha.TestHAAppend | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.2 Server=19.03.2 Image:yetus/hadoop:63396beab41 | | JIRA Issue | HDFS-14303 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12979846/HDFS-14303-branch-3.2.addendum.03.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 7b28d52a7e2e 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | branch-3.2 / f6cc887 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_212 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/27824/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/27824/testReport/ | | Max. process+thread count | 2848 (vs. u
[jira] [Assigned] (HDFS-14836) FileIoProvider should not increase FileIoErrors metric in datanode volume metric
[ https://issues.apache.org/jira/browse/HDFS-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang reassigned HDFS-14836: -- Assignee: Aiphago
> FileIoProvider should not increase FileIoErrors metric in datanode volume
> metric
>
> Key: HDFS-14836
> URL: https://issues.apache.org/jira/browse/HDFS-14836
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.9.1
> Reporter: Aiphago
> Assignee: Aiphago
> Priority: Minor
>
> I found that the FileIoErrors metric increases in BlockSender.sendPacket() when fileIoProvider.transferToSocketFully() is used. But in https://issues.apache.org/jira/browse/HDFS-2054 exceptions like "Broken pipe" and "Connection reset" are already ignored.
> So should we filter when fileIoProvider increases the FileIoErrors count?
-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2082) Fix flaky TestContainerStateMachineFailures#testApplyTransactionFailure
[ https://issues.apache.org/jira/browse/HDDS-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925690#comment-16925690 ] Doroszlai, Attila commented on HDDS-2082: - [~shashikant], more often {{TestContainerStateMachineFailures#testApplyTransactionFailure}} fails (with error) due to [exception type mismatch in response to the close container request|https://github.com/apache/hadoop/blob/60af8793b45b4057101a22e4248d7ca022b52d79/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerStateMachineFailures.java#L328-L334]. The root cause of the {{IOException}} is a {{StateMachineException}}, which is not expected by {{checkForException}}, thus the {{IOException}} is re-thrown. {code} StateMachineException: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Error while creating/ updating .container file. ContainerID: 5 {code} https://github.com/elek/ozone-ci/blob/master/trunk/trunk-nightly-zfkm8/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures.txt https://github.com/elek/ozone-ci/blob/master/pr/pr-hdds-1569-5th2c/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures.txt https://github.com/elek/ozone-ci/blob/master/pr/pr-hdds-2060-hng4s/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures.txt https://github.com/elek/ozone-ci/blob/master/pr/pr-hdds-1094-hnp8f/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures.txt https://github.com/elek/ozone-ci/blob/master/pr/pr-hdds-2002-fbg9h/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures.txt 
https://github.com/elek/ozone-ci/blob/master/pr/pr-hdds-1094-85qxc/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures.txt https://github.com/elek/ozone-ci/blob/master/pr/pr-hdds-1571-bx9p4/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures.txt https://github.com/elek/ozone-ci/blob/master/pr/pr-hdds-2064-v25ns/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures.txt > Fix flaky TestContainerStateMachineFailures#testApplyTransactionFailure > --- > > Key: HDDS-2082 > URL: https://issues.apache.org/jira/browse/HDDS-2082 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Dinesh Chitlangia >Priority: Major > > {code:java} > --- > Test set: org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures > --- > Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 102.615 s <<< > FAILURE! - in > org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures > testApplyTransactionFailure(org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures) > Time elapsed: 15.677 s <<< FAILURE! 
> java.lang.AssertionError
> at org.junit.Assert.fail(Assert.java:86)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at org.junit.Assert.assertTrue(Assert.java:52)
> at org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures.testApplyTransactionFailure(TestContainerStateMachineFailures.java:349)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.
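The failure mode described above — the test's {{checkForException}} not recognizing a {{StateMachineException}} that arrives wrapped inside an {{IOException}}, so the outer exception is re-thrown — comes down to walking an exception's cause chain. A minimal sketch, assuming hypothetical names: {{findCause}} is illustrative and {{IllegalStateException}} stands in for the Ratis {{StateMachineException}}; this is not the actual Ozone helper.

```java
import java.io.IOException;

public class CauseUnwrap {
    /** Returns the first throwable in the cause chain assignable to expected, or null. */
    static <T extends Throwable> T findCause(Throwable t, Class<T> expected) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (expected.isInstance(cur)) {
                return expected.cast(cur);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Simulate the shape seen in the logs: an IOException wrapping a
        // stand-in for StateMachineException carrying the storage error text.
        Exception stateMachine = new IllegalStateException(
            "Error while creating/updating .container file. ContainerID: 5");
        IOException outer = new IOException("close container request failed", stateMachine);
        // Checking only the outer type misses the wrapped cause; walking the
        // chain finds it, so the test could accept this error variant too.
        System.out.println(findCause(outer, IllegalStateException.class) != null);
    }
}
```

A check like this would let the assertion accept either the directly-thrown or the wrapped form of the expected exception.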
[jira] [Commented] (HDFS-14836) FileIoProvider should not increase FileIoErrors metric in datanode volume metric
[ https://issues.apache.org/jira/browse/HDFS-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925670#comment-16925670 ] Wei-Chiu Chuang commented on HDFS-14836: I don't think I understand the proposal here. Upon an exception in BlockSender.sendPacket(), FileIoErrors is incremented. But you don't want "Broken pipe" and "Connection reset" to increment FileIoErrors, am I right? Is that because those exceptions are network issues, not local disk issues?
[jira] [Commented] (HDFS-14836) FileIoProvider should not increase FileIoErrors metric in datanode volume metric
[ https://issues.apache.org/jira/browse/HDFS-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925661#comment-16925661 ] He Xiaoqiao commented on HDFS-14836: Thanks [~Aiphag0] for your report. I agree that we should not count FileIoErrors for certain explicit exceptions, as HDFS-2054 does; otherwise this counter gets polluted and is of no value as a reference. [~jojochuang] [~ayushtkn] any thoughts? Could you help add [~Aiphag0] as a contributor and assign this JIRA to him?
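The filtering discussed in this thread reduces to a guard in front of the metric increment: skip counting when the IOException looks like a client-side network failure rather than a local disk failure. A minimal sketch under that assumption — {{isClientNetworkError}}, {{onFileIoException}}, and the message patterns are illustrative, not the actual FileIoProvider API:

```java
import java.io.IOException;

public class IoErrorFilter {
    /**
     * Heuristic for network-level failures that HDFS-2054 already ignores in
     * BlockSender; these say nothing about the health of the local volume.
     * The matched messages are an assumption for illustration.
     */
    static boolean isClientNetworkError(IOException e) {
        String msg = e.getMessage();
        if (msg == null) {
            return false;
        }
        return msg.contains("Broken pipe")
            || msg.contains("Connection reset")
            || msg.contains("Connection timed out");
    }

    private long fileIoErrors = 0;

    /** Count only genuine local-disk failures toward the volume metric. */
    void onFileIoException(IOException e) {
        if (!isClientNetworkError(e)) {
            fileIoErrors++;
        }
    }

    long getFileIoErrors() {
        return fileIoErrors;
    }

    public static void main(String[] args) {
        IoErrorFilter f = new IoErrorFilter();
        f.onFileIoException(new IOException("Broken pipe"));        // skipped
        f.onFileIoException(new IOException("Input/output error")); // counted
        System.out.println(f.getFileIoErrors());
    }
}
```

Matching on message text is fragile (it is locale- and JDK-dependent), which is one reason a real patch might instead classify the exception at the call site that knows it came from the socket rather than the disk.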
[jira] [Updated] (HDFS-14836) FileIoProvider should not increase FileIoErrors metric in datanode volume metric
[ https://issues.apache.org/jira/browse/HDFS-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aiphago updated HDFS-14836: --- Summary: FileIoProvider should not increase FileIoErrors metric in datanode volume metric (was: FileIoProvider will increase FileIoErrors metric in datanode volume metric)
[jira] [Commented] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint
[ https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925642#comment-16925642 ] Hadoop QA commented on HDFS-14378: --
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s{color} | {color:red} HDFS-14378 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\ \\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-14378 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12966476/HDFS-14378-trunk.006.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/27825/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |
This message was automatically generated.
> Simplify the design of multiple NN and both logic of edit log roll and
> checkpoint
> -
>
> Key: HDFS-14378
> URL: https://issues.apache.org/jira/browse/HDFS-14378
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: ha, namenode
> Affects Versions: 3.1.2
> Reporter: star
> Assignee: star
> Priority: Major
> Attachments: HDFS-14378-trunk.001.patch, HDFS-14378-trunk.002.patch, HDFS-14378-trunk.003.patch, HDFS-14378-trunk.004.patch, HDFS-14378-trunk.005.patch, HDFS-14378-trunk.006.patch
>
> HDFS-6440 introduced a mechanism to support more than 2 NNs. It implements a first-writer-wins policy to avoid duplicated fsimage downloading. The variable 'isPrimaryCheckPointer' holds the first-writer state, with which the SNN will provide the fsimage to the ANN next time. So we have three roles in the NN cluster: the ANN, one primary SNN, and one or more normal SNNs.
> Since HDFS-12248, there may be more than two primary SNNs shortly after an exception occurs.
> It handles a scenario in which the SNN will not upload the fsimage on IOE and Interrupted exceptions. Though this does not cause any further functional issues, it is inconsistent.
> Furthermore, the edit log may be rolled more frequently than necessary with multiple standby NameNodes (HDFS-14349). (I'm not so sure about this; I will verify it by unit tests, or anyone could point it out.)
> Above all, I'm wondering if we could simplify this with the following changes:
> * There are only two roles: ANN and SNN.
> * The ANN will roll its edit log every DFS_HA_LOGROLL_PERIOD_KEY period.
> * The ANN will select an SNN from which to download the checkpoint.
> The SNN will just do log tailing and checkpointing, and provide a servlet for fsimage downloading as usual. The SNN will not try to roll the edit log or send checkpoint requests to the ANN.
> In a word, the ANN will be more active. Suggestions are welcome.
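The ANN-driven roll proposed above reduces to a simple time-based decision on the active side: roll when the configured period has elapsed, instead of waiting for a standby's roll request. A minimal sketch of that policy, assuming a millisecond period derived from a DFS_HA_LOGROLL_PERIOD_KEY-style setting; the class and method names are illustrative, not the actual FSEditLog API:

```java
public class ActiveRollPolicy {
    private final long rollPeriodMs; // e.g. the configured log-roll period, in ms
    private long lastRollMs;         // time of the last successful roll

    ActiveRollPolicy(long rollPeriodMs, long startMs) {
        this.rollPeriodMs = rollPeriodMs;
        this.lastRollMs = startMs;
    }

    /** True when the configured period has elapsed since the last roll. */
    boolean shouldRoll(long nowMs) {
        return nowMs - lastRollMs >= rollPeriodMs;
    }

    /** Record a successful rollEditLog() so the next period starts now. */
    void onRolled(long nowMs) {
        lastRollMs = nowMs;
    }

    public static void main(String[] args) {
        // A 2-minute period, evaluated by a periodic thread on the ANN.
        ActiveRollPolicy policy = new ActiveRollPolicy(120_000L, 0L);
        System.out.println(policy.shouldRoll(60_000L));  // period not yet elapsed
        System.out.println(policy.shouldRoll(120_000L)); // time to roll
    }
}
```

Because only the ANN evaluates this check, the number of standbys no longer influences how often the log is rolled, which is the inconsistency the proposal aims to remove.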