[jira] [Comment Edited] (HDFS-14609) RBF: Security should use common AuthenticationFilter
[ https://issues.apache.org/jira/browse/HDFS-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927286#comment-16927286 ] Chen Zhang edited comment on HDFS-14609 at 9/11/19 6:47 AM: Hi [~crh], thanks for your response. I gave some explanation of the change in the tests' behavior in this [comment|https://issues.apache.org/jira/browse/HDFS-14609?focusedCommentId=16907486&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16907486], FYI. For your question, the answer is: {quote} # For the case {{testGetDelegationToken()}}, the server address is set by WebHdfsFileSystem after it gets the response; the original address is the address of the RouterRpcServer. Since we now send the request over an HTTP connection directly, it is unnecessary to reset the address, so I removed this assert. # For the case {{testCancelDelegationToken()}}, the {{InvalidToken}} exception is also generated by WebHdfsFileSystem and the logic is quite complex. I think it is also unnecessary to keep this assert, so I use 403 detection instead.{quote}
> RBF: Security should use common AuthenticationFilter > > > Key: HDFS-14609 > URL: https://issues.apache.org/jira/browse/HDFS-14609 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: CR Hota >Assignee: Chen Zhang >Priority: Major > Attachments: HDFS-14609.001.patch, HDFS-14609.002.patch, > HDFS-14609.003.patch, HDFS-14609.004.patch, HDFS-14609.005.patch > > > We worked on router-based federation security as part of HDFS-13532. We kept > it compatible with the way the namenode works. However, with HADOOP-16314 and > HDFS-16354 in trunk, the auth filters seem to have been changed, causing tests to > fail. > Changes are needed appropriately in RBF, mainly fixing broken tests.
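A minimal sketch of the 403 detection described above, assuming a hypothetical router web address and an already-cancelled token ({{routerWebAddress}} and {{encodedToken}} are illustrative names, not the actual test fixture):
{code:java}
import java.net.HttpURLConnection;
import java.net.URL;

public class CancelledTokenCheck {
  // Hypothetical inputs: the router's web address (host:port) and the
  // URL-safe encoding of a delegation token that was already cancelled.
  static boolean isTokenRejected(String routerWebAddress, String encodedToken)
      throws Exception {
    URL url = new URL("http://" + routerWebAddress
        + "/webhdfs/v1/?op=RENEWDELEGATIONTOKEN&token=" + encodedToken);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("PUT");
    // Instead of asserting on an InvalidToken raised deep inside
    // WebHdfsFileSystem, assert directly on the HTTP status: the
    // authentication layer rejects a cancelled token with 403 Forbidden.
    return conn.getResponseCode() == HttpURLConnection.HTTP_FORBIDDEN;
  }
}
{code}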
[jira] [Comment Edited] (HDFS-10943) rollEditLog expects empty EditsDoubleBuffer.bufCurrent which is not guaranteed
[ https://issues.apache.org/jira/browse/HDFS-10943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926355#comment-16926355 ] wangcong edited comment on HDFS-10943 at 9/11/19 6:43 AM: -- [~daryn],[~kihwal],[~zhz],[~hexiaoqiao],[~yzhangal] Our log cluster has hit this problem several times. We use a dedicated cluster to write YARN logs, and this log cluster has crashed several times. While going through the logs, we found the same error as in this issue. To diagnose the problem, we deployed HDFS-11306 and HDFS-11292. When the log cluster crashed again, the diagnostic log was as follows:
2019-09-10 03:50:16,403 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLog: LastWrittenTxId 5061382841 is expected to be the same as LastSyncedTxId 5061382840
2019-09-10 03:50:16,403 WARN org.apache.hadoop.hdfs.server.namenode.EditsDoubleBuffer: The edits buffer is 85 bytes long with 1 unflushed transactions.Below is the list of unflushed transactions
2019-09-10 03:50:16,408 WARN org.apache.hadoop.hdfs.server.namenode.EditsDoubleBuffer: Unflushed op [0]: CancelDelegationTokenOp [token=token for yarn: HDFS_DELEGATION_TOKEN owner=yarn/datanod...@domain.com, renewer=yarn, realUser=,issueDate=1567970236988, maxDate=1568575036988, sequenceNumber=621170591, masterKeyId=108, opcode=OP_CANCEL_DELEGATION_TOKEN, txid=5061382841]
2019-09-10 03:50:16,409 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: finalize log segment 5060982535, 5061382841 failed for required journal (JournalAndStream(mgr=QJM to [10.0.0.1:8001,10.0.0.2:8001,10.0.0.3:8001],stream=QuorumOutputStream starting at txid 5060982535)) java.io.IOException: FSEditsStream has 85 bytes still to be flushed and cannot be closed
After deploying the patches, the namenode crash occurred twice more; in every case the op that caused the problem was CancelDelegationTokenOp.
> rollEditLog expects empty EditsDoubleBuffer.bufCurrent which is not guaranteed > -- > > Key: HDFS-10943 > URL: https://issues.apache.org/jira/browse/HDFS-10943 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yongjun Zhang >Priority: Major > > Per the following trace stack: > {code} > FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: finalize log > segment 10562075963, 10562174157 failed for required journal > (JournalAndStream(mgr=QJM to [0.0.0.1:8485, 0.0.0.2:8485, 0.0.0.3:8485, > 0.0.0.4:8485, 0.0.0.5:8485], stream=QuorumOutputStream starting at txid > 10562075963)) > java.io.IOException: FSEditStream has 49708 bytes still to be flushed and > cannot be closed. > at > org.apache.hadoop.hdfs.server.namenode.EditsDoubleBuffer.close(EditsDoubleBuffer.java:66) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.close(QuorumOutputStream.java:65) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.closeStream(JournalSet.java:115) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet$4.apply(JournalSet.java:235) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393) >
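For context, a minimal sketch of the double-buffer invariant both reports above trip over (a simplified stand-in, assuming only the general shape of EditsDoubleBuffer rather than its actual code): close() requires that no bytes remain buffered, so an op written after the final flush but before the segment is finalized produces exactly this IOException.
{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;

class DoubleBufferSketch {
  private ByteArrayOutputStream bufCurrent = new ByteArrayOutputStream();
  private ByteArrayOutputStream bufReady = new ByteArrayOutputStream();

  synchronized void writeOp(byte[] op) {
    bufCurrent.write(op, 0, op.length); // new edits land in the current buffer
  }

  synchronized void setReadyToFlush() {
    // Swap buffers: what was being written becomes the buffer to flush.
    ByteArrayOutputStream tmp = bufReady;
    bufReady = bufCurrent;
    bufCurrent = tmp;
  }

  synchronized void flush() {
    bufReady.reset(); // stand-in for syncing the ready buffer to the journals
  }

  synchronized void close() throws IOException {
    int pending = bufCurrent.size() + bufReady.size();
    if (pending > 0) {
      // The failure mode in both stack traces above: a late op (here a
      // CancelDelegationTokenOp) left unflushed bytes behind at close time.
      throw new IOException(pending
          + " bytes still to be flushed and cannot be closed.");
    }
  }
}
{code}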
[jira] [Commented] (HDFS-14795) Add Throttler for writing block
[ https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927290#comment-16927290 ] Hadoop QA commented on HDFS-14795: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 38s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 51s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 47s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 85m 18s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 53s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}152m 11s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA | | | hadoop.hdfs.TestDecommission | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 | | JIRA Issue | HDFS-14795 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12980015/HDFS-14795.009.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux e0ddb3e5694c 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / dacc448 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/27840/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreC
[jira] [Work logged] (HDDS-2007) Make ozone fs shell command work with OM HA service ids
[ https://issues.apache.org/jira/browse/HDDS-2007?focusedWorklogId=310336&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310336 ] ASF GitHub Bot logged work on HDDS-2007: Author: ASF GitHub Bot Created on: 11/Sep/19 06:30 Start Date: 11/Sep/19 06:30 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on issue #1360: HDDS-2007. Make ozone fs shell command work with OM HA service ids URL: https://github.com/apache/hadoop/pull/1360#issuecomment-530241185 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Comment | |::|--:|:|:| | 0 | reexec | 94 | Docker mode activated. | ||| _ Prechecks _ | | +1 | dupname | 0 | No case conflicting files found. | | 0 | shelldocs | 0 | Shelldocs was not available. | | +1 | @author | 0 | The patch does not contain any @author tags. | | +1 | test4tests | 0 | The patch appears to include 5 new or modified test files. | ||| _ trunk Compile Tests _ | | 0 | mvndep | 40 | Maven dependency ordering for branch | | +1 | mvninstall | 655 | trunk passed | | +1 | compile | 415 | trunk passed | | +1 | checkstyle | 85 | trunk passed | | +1 | mvnsite | 0 | trunk passed | | +1 | shadedclient | 876 | branch has no errors when building and testing our client artifacts. | | +1 | javadoc | 195 | trunk passed | | 0 | spotbugs | 508 | Used deprecated FindBugs config; considering switching to SpotBugs. | | +1 | findbugs | 767 | trunk passed | ||| _ Patch Compile Tests _ | | 0 | mvndep | 29 | Maven dependency ordering for patch | | +1 | mvninstall | 611 | the patch passed | | +1 | compile | 410 | the patch passed | | +1 | javac | 410 | the patch passed | | +1 | checkstyle | 88 | the patch passed | | +1 | mvnsite | 0 | the patch passed | | +1 | shellcheck | 0 | There were no new shellcheck issues. | | +1 | whitespace | 0 | The patch has no whitespace issues. | | +1 | shadedclient | 771 | patch has no errors when building and testing our client artifacts. | | +1 | javadoc | 205 | the patch passed | | +1 | findbugs | 756 | the patch passed | ||| _ Other Tests _ | | +1 | unit | 351 | hadoop-hdds in the patch passed. | | -1 | unit | 2532 | hadoop-ozone in the patch failed. | | +1 | asflicense | 72 | The patch does not generate ASF License warnings. 
| | | | 9230 | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion | | | hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures | | | hadoop.ozone.om.TestOzoneManagerHA | | | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient | | | hadoop.ozone.scm.TestContainerSmallFile | | | hadoop.ozone.om.snapshot.TestOzoneManagerSnapshotProvider | | | hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures | | | hadoop.ozone.client.rpc.TestBlockOutputStream | | | hadoop.ozone.om.TestOMRatisSnapshots | | | hadoop.ozone.TestSecureOzoneCluster | | | hadoop.ozone.scm.TestSCMContainerPlacementPolicyMetrics | | | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption | | | hadoop.ozone.container.TestContainerReplication | | | hadoop.ozone.client.rpc.TestContainerStateMachineFailures | | Subsystem | Report/Notes | |--:|:-| | Docker | Client=19.03.2 Server=19.03.2 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1360/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/1360 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle shellcheck shelldocs | | uname | Linux 36ee253cfa4b 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / dacc448 | | Default Java | 1.8.0_212 | | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1360/3/artifact/out/patch-unit-hadoop-ozone.txt | | Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1360/3/testReport/ | | Max. process+thread count | 5296 (vs. ulimit of 5500) | | modules | C: hadoop-ozone/common hadoop-ozone/client hadoop-ozone/ozone-manager hadoop-ozone/dist hadoop-ozone/integration-test hadoop-ozone/ozone-recon hadoop-ozone/ozonefs U: hadoop-ozone | | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1360/3/console | | versions | git=2.7.4 maven=3.3.9 shellcheck=0.4.6 findbugs=3.1.0-RC1 | | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org | This message was automatically generated. This
[jira] [Commented] (HDFS-14839) Use Java Concurrent BlockingQueue instead of Internal BlockQueue
[ https://issues.apache.org/jira/browse/HDFS-14839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927282#comment-16927282 ] Hadoop QA commented on HDFS-14839: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 42s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 12s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 2m 42s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 41s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 40s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 25 unchanged - 6 fixed = 25 total (was 31) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 9s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 44s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 88m 9s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 41s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}147m 17s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs-project/hadoop-hdfs | | | Exceptional return value of java.util.concurrent.BlockingQueue.offer(Object) ignored in org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor.addBlockToBeErasureCoded(ExtendedBlock, DatanodeDescriptor[], DatanodeStorageInfo[], byte[], ErasureCodingPolicy) At DatanodeDescriptor.java:ignored in org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor.addBlockToBeErasureCoded(ExtendedBlock, DatanodeDescriptor[], DatanodeStorageInfo[], byte[], ErasureCodingPolicy) At DatanodeDescriptor.java:[line 624] | | | Exceptional return value of java.util.concurrent.BlockingQueue.offer(Object) ignored in org.ap
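A minimal sketch of what the FindBugs warning above is asking for, assuming (illustratively, not from the actual patch) a bounded BlockingQueue standing in for the internal BlockQueue: offer() returns false instead of blocking when the queue is full, and that return value has to be acted on.
{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class OfferResultSketch {
  // Bounded queue standing in for a per-datanode reconstruction work queue.
  private final BlockingQueue<String> pendingWork = new LinkedBlockingQueue<>(1024);

  boolean addWork(String task) {
    // offer() never blocks; it simply returns false when the queue is full.
    // Silently dropping that result is exactly what FindBugs flags above.
    boolean accepted = pendingWork.offer(task);
    if (!accepted) {
      System.err.println("Work queue full, rejecting task: " + task);
    }
    return accepted;
  }
}
{code}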
[jira] [Commented] (HDFS-14699) Erasure Coding: Storage not considered in live replica when replication streams hard limit reached to threshold
[ https://issues.apache.org/jira/browse/HDFS-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927280#comment-16927280 ] Zhao Yi Ming commented on HDFS-14699: - [~marvelrock] This fix only makes sure that numReplicas is correct; the over-hard-limit srcNode will not be added into the srcNodes list, so the reconstruction work will NOT use any over-hard-limit srcNode.
> Erasure Coding: Storage not considered in live replica when replication > streams hard limit reached to threshold > --- > > Key: HDFS-14699 > URL: https://issues.apache.org/jira/browse/HDFS-14699 > Project: Hadoop HDFS > Issue Type: Bug > Components: ec >Affects Versions: 3.2.0, 3.1.1, 3.3.0 >Reporter: Zhao Yi Ming >Assignee: Zhao Yi Ming >Priority: Critical > Labels: patch > Attachments: HDFS-14699.00.patch, HDFS-14699.01.patch, > HDFS-14699.02.patch, HDFS-14699.03.patch, HDFS-14699.04.patch, > HDFS-14699.05.patch, image-2019-08-20-19-58-51-872.png, > image-2019-09-02-17-51-46-742.png > > > We tried the EC function on an 80-node cluster with Hadoop 3.1.1 and hit the > same scenario as described in https://issues.apache.org/jira/browse/HDFS-8881. > Following are our testing steps; hope they are helpful (the following DNs hold the > test's internal blocks): > # We customized a new 10-2-1024k policy and used it on a path; now we have 12 > internal blocks (12 live blocks). > # Decommission one DN; after the decommission completes, we have 13 > internal blocks (12 live blocks and 1 decommissioned block). > # Then shut down one DN which does not hold the same block id as the > decommissioned block; now we have 12 internal blocks (11 live blocks and 1 > decommissioned block). > # After waiting about 600s (before the heartbeat comes), recommission the > decommissioned DN; now we have 12 internal blocks (11 live blocks and 1 > duplicate block). > # EC then does not reconstruct the missed block. > We think this is a critical issue for using the EC function in a production > env. Could you help? Thanks a lot!
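A minimal sketch of the behavior described above (illustrative numbers and names, not the actual BlockManager code): a replica on a node past the replication-streams hard limit still counts toward the live-replica total, but the node is skipped when picking reconstruction sources.
{code:java}
import java.util.ArrayList;
import java.util.List;

class SourceSelectionSketch {
  static List<Integer> chooseSources(int[] outstandingStreams, int hardLimit) {
    List<Integer> srcNodes = new ArrayList<>();
    int liveReplicas = 0;
    for (int node = 0; node < outstandingStreams.length; node++) {
      liveReplicas++; // the replica is counted as live either way...
      if (outstandingStreams[node] >= hardLimit) {
        continue;     // ...but an overloaded node is never used as a source
      }
      srcNodes.add(node);
    }
    System.out.println("live=" + liveReplicas + ", sources=" + srcNodes);
    return srcNodes;
  }

  public static void main(String[] args) {
    // Node 2 is past the hard limit of 14: counted as live, excluded as a source.
    chooseSources(new int[] {0, 3, 20, 1}, 14);
  }
}
{code}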
[jira] [Commented] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete
[ https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927278#comment-16927278 ] Lokesh Jain commented on HDDS-1868: --- ContainerStateMachine already has an API called notifyLeader which notifies the state machine that the server has been elected as leader. We can use that API to trigger the pipeline report from the leader. For the followers, we will either need to add another API or leverage notifyLeader to notify the follower datanode about the elected leader. This would require a change in Ratis.
> Ozone pipelines should be marked as ready only after the leader election is > complete > > > Key: HDDS-1868 > URL: https://issues.apache.org/jira/browse/HDDS-1868 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode, SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Siddharth Wagle >Priority: Major > Fix For: 0.5.0 > > Attachments: HDDS-1868.01.patch, HDDS-1868.02.patch > > > Ozone pipelines on restart start in the allocated state; they are moved into the open > state after all the pipeline members have reported. However, this can potentially > lead to an issue where the pipeline is still not ready to accept any > incoming IO operations. > The pipelines should be marked as ready only after the leader election is > complete and the leader is ready to accept incoming IO.
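A minimal sketch of the flow being discussed (hypothetical interfaces, not the actual Ratis or ContainerStateMachine API): the election callback is what should flip the datanode's pipeline report to "ready for IO".
{code:java}
// Hypothetical callback shape; the real hook lives in the Ratis state machine.
interface LeaderEvents {
  void notifyLeader(); // fired on the datanode once leader election completes
}

class PipelineReportingMachine implements LeaderEvents {
  private final Runnable reportPipelineToScm;

  PipelineReportingMachine(Runnable reportPipelineToScm) {
    this.reportPipelineToScm = reportPipelineToScm;
  }

  @Override
  public void notifyLeader() {
    // Only after the election is the pipeline ready to accept IO, so only
    // now is a report sent that lets SCM move the pipeline to OPEN.
    reportPipelineToScm.run();
  }
}
{code}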
[jira] [Work logged] (HDDS-1982) Extend SCMNodeManager to support decommission and maintenance states
[ https://issues.apache.org/jira/browse/HDDS-1982?focusedWorklogId=310317&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310317 ] ASF GitHub Bot logged work on HDDS-1982: Author: ASF GitHub Bot Created on: 11/Sep/19 05:46 Start Date: 11/Sep/19 05:46 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on issue #1344: HDDS-1982 Extend SCMNodeManager to support decommission and maintenance states URL: https://github.com/apache/hadoop/pull/1344#issuecomment-530229574 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Comment | |::|--:|:|:| | 0 | reexec | 92 | Docker mode activated. | ||| _ Prechecks _ | | +1 | dupname | 1 | No case conflicting files found. | | +1 | @author | 0 | The patch does not contain any @author tags. | | +1 | test4tests | 0 | The patch appears to include 15 new or modified test files. | ||| _ trunk Compile Tests _ | | 0 | mvndep | 24 | Maven dependency ordering for branch | | +1 | mvninstall | 649 | trunk passed | | +1 | compile | 405 | trunk passed | | +1 | checkstyle | 76 | trunk passed | | +1 | mvnsite | 0 | trunk passed | | +1 | shadedclient | 932 | branch has no errors when building and testing our client artifacts. | | +1 | javadoc | 182 | trunk passed | | 0 | spotbugs | 499 | Used deprecated FindBugs config; considering switching to SpotBugs. | | +1 | findbugs | 750 | trunk passed | ||| _ Patch Compile Tests _ | | 0 | mvndep | 32 | Maven dependency ordering for patch | | -1 | mvninstall | 299 | hadoop-ozone in the patch failed. | | -1 | compile | 247 | hadoop-ozone in the patch failed. | | -1 | cc | 247 | hadoop-ozone in the patch failed. | | -1 | javac | 247 | hadoop-ozone in the patch failed. | | -0 | checkstyle | 43 | hadoop-ozone: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) | | +1 | mvnsite | 0 | the patch passed | | +1 | whitespace | 0 | The patch has no whitespace issues. | | +1 | shadedclient | 739 | patch has no errors when building and testing our client artifacts. | | -1 | javadoc | 75 | hadoop-hdds generated 20 new + 16 unchanged - 0 fixed = 36 total (was 16) | | -1 | findbugs | 411 | hadoop-ozone in the patch failed. | ||| _ Other Tests _ | | +1 | unit | 339 | hadoop-hdds in the patch passed. | | -1 | unit | 466 | hadoop-ozone in the patch failed. | | +1 | asflicense | 40 | The patch does not generate ASF License warnings. 
| | | | 6574 | | | Subsystem | Report/Notes | |--:|:-| | Docker | Client=19.03.2 Server=19.03.2 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/1344 | | Optional Tests | dupname asflicense compile cc mvnsite javac unit javadoc mvninstall shadedclient findbugs checkstyle | | uname | Linux 2edae08c6f80 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / dacc448 | | Default Java | 1.8.0_212 | | mvninstall | https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/artifact/out/patch-mvninstall-hadoop-ozone.txt | | compile | https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/artifact/out/patch-compile-hadoop-ozone.txt | | cc | https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/artifact/out/patch-compile-hadoop-ozone.txt | | javac | https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/artifact/out/patch-compile-hadoop-ozone.txt | | checkstyle | https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/artifact/out/diff-checkstyle-hadoop-ozone.txt | | javadoc | https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/artifact/out/diff-javadoc-javadoc-hadoop-hdds.txt | | findbugs | https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/artifact/out/patch-findbugs-hadoop-ozone.txt | | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/artifact/out/patch-unit-hadoop-ozone.txt | | Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/testReport/ | | Max. process+thread count | 1247 (vs. ulimit of 5500) | | modules | C: hadoop-hdds/common hadoop-hdds/server-scm hadoop-hdds/tools hadoop-ozone/integration-test U: . | | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/console | | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 | | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org | This message was automatically generated. This is an automated message
[jira] [Commented] (HDFS-14840) Use Java Concurrent Instead of Synchronization in BlockPoolTokenSecretManager
[ https://issues.apache.org/jira/browse/HDFS-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927264#comment-16927264 ] Akira Ajisaka commented on HDFS-14840: -- LGTM, +1
> Use Java Concurrent Instead of Synchronization in BlockPoolTokenSecretManager > - > > Key: HDFS-14840 > URL: https://issues.apache.org/jira/browse/HDFS-14840 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14840.1.patch > > > https://github.com/apache/hadoop/blob/d8bac50e12d243ef8fd2c7e0ce5c9997131dee74/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/block/BlockPoolTokenSecretManager.java#L40 > Instead of synchronizing the entire class, just synchronize the collection.
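A minimal sketch of the change being approved (an illustration under the description's premise, not the actual patch): narrow the locking from the whole manager to a thread-safe map.
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class BlockPoolSecretManagerSketch {
  // Before: every method was declared synchronized on the manager instance.
  // After: the only shared state is this map, so the map alone is concurrent.
  private final Map<String, Object> map = new ConcurrentHashMap<>();

  void addBlockPool(String bpid, Object secretManager) {
    map.put(bpid, secretManager);
  }

  Object get(String bpid) {
    Object secretManager = map.get(bpid);
    if (secretManager == null) {
      throw new IllegalArgumentException("Block pool " + bpid + " is not found");
    }
    return secretManager;
  }
}
{code}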
[jira] [Comment Edited] (HDFS-14838) RBF: Display RPC (instead of HTTP) Port Number in RBF web UI
[ https://issues.apache.org/jira/browse/HDFS-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927248#comment-16927248 ] Takanobu Asanuma edited comment on HDFS-14838 at 9/11/19 5:22 AM: -- +1 for HDFS-14838.001.patch, pending Jenkins. was (Author: tasanuma0829): +1 for HDFS-14838-1.patch, pending Jenkins.
> RBF: Display RPC (instead of HTTP) Port Number in RBF web UI > > > Key: HDFS-14838 > URL: https://issues.apache.org/jira/browse/HDFS-14838 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf, ui >Affects Versions: 3.1.2 >Reporter: Xieming Li >Assignee: Xieming Li >Priority: Minor > Labels: RBF, ui > Attachments: HDFS-14838-1.patch, HDFS-14838.001.patch, > HDFS-14838.patch, router-ui.jpg > > > Currently the web UI of RBF uses <hostname>:<HTTP port> in its heading. > It should be changed to <hostname>:<RPC port>, as the web UIs of the NameNode and > DataNode do.
[jira] [Updated] (HDFS-14838) RBF: Display RPC (instead of HTTP) Port Number in RBF web UI
[ https://issues.apache.org/jira/browse/HDFS-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xieming Li updated HDFS-14838: -- Attachment: HDFS-14838.001.patch
> RBF: Display RPC (instead of HTTP) Port Number in RBF web UI > > > Key: HDFS-14838 > URL: https://issues.apache.org/jira/browse/HDFS-14838 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf, ui >Affects Versions: 3.1.2 >Reporter: Xieming Li >Assignee: Xieming Li >Priority: Minor > Labels: RBF, ui > Attachments: HDFS-14838-1.patch, HDFS-14838.001.patch, > HDFS-14838.patch, router-ui.jpg > > > Currently the web UI of RBF uses <hostname>:<HTTP port> in its heading. > It should be changed to <hostname>:<RPC port>, as the web UIs of the NameNode and > DataNode do.
[jira] [Commented] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete
[ https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927259#comment-16927259 ] Siddharth Wagle commented on HDDS-1868: --- Do you think it makes sense to add a callback like _notifyLeader(RoleInfoProto roleInfoProto)_ to the RaftServerRpc? At the moment, however, we don't seem to be calling into the state machine from _changeToLeader()_ in Ratis. Also, even a follower should send a report.
> Ozone pipelines should be marked as ready only after the leader election is > complete > > > Key: HDDS-1868 > URL: https://issues.apache.org/jira/browse/HDDS-1868 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode, SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Siddharth Wagle >Priority: Major > Fix For: 0.5.0 > > Attachments: HDDS-1868.01.patch, HDDS-1868.02.patch > > > Ozone pipelines on restart start in the allocated state; they are moved into the open > state after all the pipeline members have reported. However, this can potentially > lead to an issue where the pipeline is still not ready to accept any > incoming IO operations. > The pipelines should be marked as ready only after the leader election is > complete and the leader is ready to accept incoming IO.
[jira] [Comment Edited] (HDFS-14768) In some cases, erasure blocks are corrupted when they are reconstructed.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927253#comment-16927253 ] Surendra Singh Lilhore edited comment on HDFS-14768 at 9/11/19 4:56 AM: Thanks [~gjhkael]. {quote}you need run the code below, and you need check the block index 6 that recontruct on local path like {quote} The UT should check the block corruption; we should not check it manually. If you are facing difficulties in writing the UT, I can help you. Your fix idea LGTM, but some of your fix is already taken care of by HDFS-14699, so you need to wait for it and then rebase your patch.
> In some cases, erasure blocks are corrupted when they are reconstructed. > > > Key: HDFS-14768 > URL: https://issues.apache.org/jira/browse/HDFS-14768 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, erasure-coding, hdfs, namenode >Affects Versions: 3.0.2 >Reporter: guojh >Assignee: guojh >Priority: Major > Labels: patch > Fix For: 3.3.0 > > Attachments: HDFS-14768.000.patch > > > The policy is RS-6-3-1024K; the version is Hadoop 3.0.2. > Suppose a file's block indices are [0,1,2,3,4,5,6,7,8]. We decommission > indices [3,4] and increase the index-6 datanode's > pendingReplicationWithoutTargets so that it is larger than > replicationStreamsHardLimit (we set 14). Then, after the method > chooseSourceDatanodes of BlockManager, the liveBlockIndices are > [0,1,2,3,4,5,7,8] and the block counters are Live: 7, Decommission: 2. > In the method scheduleReconstruction of BlockManager, additionalReplRequired > is 9 - 7 = 2. After the namenode chooses two target datanodes, it assigns an > erasure-coding task to them. > When the datanode gets the task, it builds targetIndices from liveBlockIndices > and the target length. The code is below. > {code:java} > // code placeholder > targetIndices = new short[targets.length]; > private void initTargetIndices() { > BitSet bitset = reconstructor.getLiveBitSet(); > int m = 0; > hasValidTargets = false; > for (int i = 0; i < dataBlkNum + parityBlkNum; i++) { > if (!bitset.get(i)) { > if (reconstructor.getBlockLen(i) > 0) { > if (m < targets.length) { > targetIndices[m++] = (short) i; > hasValidTargets = true; > } > } > } > } > } > {code} > targetIndices[0] = 6, and targetIndices[1] is always 0 from its initial value. > The StripedReader always creates readers from the first 6 index blocks, i.e. > [0,1,2,3,4,5]. > Using the indices [0,1,2,3,4,5] to build the target indices [6,0] triggers the ISA-L > bug: block index 6's data is corrupted (all of its data is zero). > I wrote a unit test that can stably reproduce this.
> {code:java} > // code placeholder > public void testFileDecommission() throws Exception { > LOG.info("Starting test testFileDecommission"); > final Path ecFile = new Path(ecDir, "testFileDecommission"); > int writeBytes = cellSize * dataBlocks; > writeStripedFile(dfs, ecFile, writeBytes); > Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks()); > FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes); > LocatedBlocks locatedBlocks = > StripedFileTestUtil.getLocatedBlocks(ecFile, dfs); > LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0) > .get(0); > DatanodeInfo[] dnLocs = lb.getLocations(); > LocatedStripedBlock lastBlock = > (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock(); > DatanodeInfo[] storageInfos = lastBlock.getLocations(); > // > DatanodeDescriptor datanodeDescriptor = > cluster.getNameNode().getNamesystem() > > .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid()); > for (int i = 0; i < 100; i++) { > datanodeDescriptor.incrementPendingReplicationWithoutTargets(); > } > assertEquals(dataBlocks + parityBlocks, dnLocs.length); > int[] decommNodeIndex = {3, 4}; > final List<DatanodeInfo> decommisionNodes = new ArrayList<>(); > // add the node which will be decommissioning > decommisionNodes.add(dnLocs[decommNodeIndex[0]]); > decommisionNodes.add(dnLocs[decommNodeIndex[1]]); > decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED); > assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes()); > //assertNull(chec
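A minimal standalone sketch of the indexing bug described above (illustrative data, not the actual reconstruction code): with two scheduled targets but only one genuinely missing internal block, the second slot of targetIndices keeps its default value of 0.
{code:java}
class TargetIndicesSketch {
  public static void main(String[] args) {
    // Block index 6 is the only one missing from the live set [0,1,2,3,4,5,7,8].
    boolean[] live = {true, true, true, true, true, true, false, true, true};
    short[] targetIndices = new short[2]; // two targets were scheduled
    int m = 0;
    for (short i = 0; i < live.length; i++) {
      if (!live[i] && m < targetIndices.length) {
        targetIndices[m++] = i; // assigned only once, for i == 6
      }
    }
    // Prints "6 0": the stale default 0 makes the reconstructor treat block 0
    // as a target, which is the corruption trigger described above.
    System.out.println(targetIndices[0] + " " + targetIndices[1]);
  }
}
{code}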
[jira] [Commented] (HDFS-14699) Erasure Coding: Storage not considered in live replica when replication streams hard limit reached to threshold
[ https://issues.apache.org/jira/browse/HDFS-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927251#comment-16927251 ] Surendra Singh Lilhore commented on HDFS-14699: --- [~marvelrock], are you talking about HDFS-14768 or this jira?
> Erasure Coding: Storage not considered in live replica when replication > streams hard limit reached to threshold > --- > > Key: HDFS-14699 > URL: https://issues.apache.org/jira/browse/HDFS-14699 > Project: Hadoop HDFS > Issue Type: Bug > Components: ec >Affects Versions: 3.2.0, 3.1.1, 3.3.0 >Reporter: Zhao Yi Ming >Assignee: Zhao Yi Ming >Priority: Critical > Labels: patch > Attachments: HDFS-14699.00.patch, HDFS-14699.01.patch, > HDFS-14699.02.patch, HDFS-14699.03.patch, HDFS-14699.04.patch, > HDFS-14699.05.patch, image-2019-08-20-19-58-51-872.png, > image-2019-09-02-17-51-46-742.png > > > We tried the EC function on an 80-node cluster with Hadoop 3.1.1 and hit the > same scenario as described in https://issues.apache.org/jira/browse/HDFS-8881. > Following are our testing steps; hope they are helpful (the following DNs hold the > test's internal blocks): > # We customized a new 10-2-1024k policy and used it on a path; now we have 12 > internal blocks (12 live blocks). > # Decommission one DN; after the decommission completes, we have 13 > internal blocks (12 live blocks and 1 decommissioned block). > # Then shut down one DN which does not hold the same block id as the > decommissioned block; now we have 12 internal blocks (11 live blocks and 1 > decommissioned block). > # After waiting about 600s (before the heartbeat comes), recommission the > decommissioned DN; now we have 12 internal blocks (11 live blocks and 1 > duplicate block). > # EC then does not reconstruct the missed block. > We think this is a critical issue for using the EC function in a production > env. Could you help? Thanks a lot!
[jira] [Updated] (HDFS-14838) RBF: Display RPC (instead of HTTP) Port Number in RBF web UI
[ https://issues.apache.org/jira/browse/HDFS-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xieming Li updated HDFS-14838: -- Attachment: HDFS-14838-1.patch
> RBF: Display RPC (instead of HTTP) Port Number in RBF web UI > > > Key: HDFS-14838 > URL: https://issues.apache.org/jira/browse/HDFS-14838 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf, ui >Affects Versions: 3.1.2 >Reporter: Xieming Li >Assignee: Xieming Li >Priority: Minor > Labels: RBF, ui > Attachments: HDFS-14838-1.patch, HDFS-14838.patch, router-ui.jpg > > > Currently the web UI of RBF uses <hostname>:<HTTP port> in its heading. > It should be changed to <hostname>:<RPC port>, as the web UIs of the NameNode and > DataNode do.
[jira] [Commented] (HDFS-14609) RBF: Security should use common AuthenticationFilter
[ https://issues.apache.org/jira/browse/HDFS-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927242#comment-16927242 ] CR Hota commented on HDFS-14609: [~zhangchen] Thanks for the ping and patch. After cancellation of the token, we should try to renew it and get an InvalidToken exception. How do we validate the InvalidToken exception in the test? Am I missing something?
> RBF: Security should use common AuthenticationFilter > > > Key: HDFS-14609 > URL: https://issues.apache.org/jira/browse/HDFS-14609 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: CR Hota >Assignee: Chen Zhang >Priority: Major > Attachments: HDFS-14609.001.patch, HDFS-14609.002.patch, > HDFS-14609.003.patch, HDFS-14609.004.patch, HDFS-14609.005.patch > > > We worked on router-based federation security as part of HDFS-13532. We kept > it compatible with the way the namenode works. However, with HADOOP-16314 and > HDFS-16354 in trunk, the auth filters seem to have been changed, causing tests to > fail. > Changes are needed appropriately in RBF, mainly fixing broken tests.
[jira] [Work logged] (HDDS-2107) Datanodes should retry forever to connect to SCM in an unsecure environment
[ https://issues.apache.org/jira/browse/HDDS-2107?focusedWorklogId=310275&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310275 ] ASF GitHub Bot logged work on HDDS-2107: Author: ASF GitHub Bot Created on: 11/Sep/19 03:54 Start Date: 11/Sep/19 03:54 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on issue #1424: HDDS-2107. Datanodes should retry forever to connect to SCM in an… URL: https://github.com/apache/hadoop/pull/1424#issuecomment-530208630 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Comment | |::|--:|:|:| | 0 | reexec | 41 | Docker mode activated. | ||| _ Prechecks _ | | +1 | dupname | 0 | No case conflicting files found. | | +1 | @author | 0 | The patch does not contain any @author tags. | | +1 | test4tests | 0 | The patch appears to include 1 new or modified test files. | ||| _ trunk Compile Tests _ | | 0 | mvndep | 67 | Maven dependency ordering for branch | | +1 | mvninstall | 589 | trunk passed | | +1 | compile | 381 | trunk passed | | +1 | checkstyle | 83 | trunk passed | | +1 | mvnsite | 0 | trunk passed | | +1 | shadedclient | 868 | branch has no errors when building and testing our client artifacts. | | +1 | javadoc | 178 | trunk passed | | 0 | spotbugs | 417 | Used deprecated FindBugs config; considering switching to SpotBugs. | | +1 | findbugs | 615 | trunk passed | ||| _ Patch Compile Tests _ | | 0 | mvndep | 41 | Maven dependency ordering for patch | | +1 | mvninstall | 536 | the patch passed | | +1 | compile | 387 | the patch passed | | +1 | javac | 387 | the patch passed | | +1 | checkstyle | 90 | the patch passed | | +1 | mvnsite | 0 | the patch passed | | +1 | whitespace | 0 | The patch has no whitespace issues. | | +1 | shadedclient | 678 | patch has no errors when building and testing our client artifacts. | | +1 | javadoc | 175 | the patch passed | | +1 | findbugs | 631 | the patch passed | ||| _ Other Tests _ | | -1 | unit | 280 | hadoop-hdds in the patch failed. | | -1 | unit | 2824 | hadoop-ozone in the patch failed. | | +1 | asflicense | 55 | The patch does not generate ASF License warnings. 
| | | | 8715 | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdds.scm.container.placement.algorithms.TestSCMContainerPlacementRackAware | | | hadoop.ozone.container.TestContainerReplication | | | hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient | | | hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion | | | hadoop.ozone.client.rpc.TestContainerStateMachineFailures | | | hadoop.ozone.client.rpc.Test2WayCommitInRatis | | | hadoop.ozone.TestSecureOzoneCluster | | | hadoop.ozone.scm.TestContainerSmallFile | | | hadoop.ozone.client.rpc.TestBlockOutputStream | | | hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures | | | hadoop.ozone.om.TestOzoneManagerHA | | Subsystem | Report/Notes | |--:|:-| | Docker | Client=19.03.1 Server=19.03.1 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1424/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/1424 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 2f98f8163e51 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / f8f8598 | | Default Java | 1.8.0_222 | | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1424/1/artifact/out/patch-unit-hadoop-hdds.txt | | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1424/1/artifact/out/patch-unit-hadoop-ozone.txt | | Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1424/1/testReport/ | | Max. process+thread count | 5408 (vs. ulimit of 5500) | | modules | C: hadoop-hdds/container-service hadoop-ozone/ozone-manager U: . | | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1424/1/console | | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 | | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org | This message was automatically generated.
[jira] [Commented] (HDFS-14843) Double Synchronization in BlockReportLeaseManager
[ https://issues.apache.org/jira/browse/HDFS-14843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927225#comment-16927225 ] Hadoop QA commented on HDFS-14843: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 40s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 37s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 26s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 87m 29s{color} | {color:green} hadoop-hdfs in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 37s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}143m 17s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 | | JIRA Issue | HDFS-14843 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12980009/HDFS-14843.1.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 94a7316ce59a 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / f8f8598 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/27838/testReport/ | | Max. process+thread count | 4121 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/27838/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Double Synchronization in BlockReportLeaseM
[jira] [Resolved] (HDDS-1571) Create an interface for pipeline placement policy to support network topologies
[ https://issues.apache.org/jira/browse/HDDS-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng resolved HDDS-1571. Fix Version/s: 0.5.0 Resolution: Fixed > Create an interface for pipeline placement policy to support network > topologies > --- > > Key: HDDS-1571 > URL: https://issues.apache.org/jira/browse/HDDS-1571 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Siddharth Wagle >Assignee: Li Cheng >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Leverage the work done in HDDS-700 for pipeline creation for open containers. > Create an interface that can provide different policy implementations for > pipeline creation. The default implementation should handle the case where no > topology information is configured. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
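A minimal sketch of what such a pluggable placement-policy interface could look like. All names here (PipelinePlacementPolicy, RandomPlacementPolicy, the generic node type N) are illustrative assumptions, not the actual HDDS-1571 API:
{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical pluggable policy for picking datanodes for a new pipeline. */
interface PipelinePlacementPolicy<N> {
  List<N> chooseDatanodes(List<N> healthyNodes, List<N> excludedNodes,
      int nodesRequired) throws IOException;
}

/** Possible default when no network topology is configured: random selection. */
class RandomPlacementPolicy<N> implements PipelinePlacementPolicy<N> {
  @Override
  public List<N> chooseDatanodes(List<N> healthyNodes, List<N> excludedNodes,
      int nodesRequired) throws IOException {
    List<N> candidates = new ArrayList<>(healthyNodes);
    candidates.removeAll(excludedNodes);  // never place on excluded nodes
    if (candidates.size() < nodesRequired) {
      throw new IOException("Cannot form pipeline: need " + nodesRequired
          + " nodes, only " + candidates.size() + " available");
    }
    Collections.shuffle(candidates);      // no topology info, so pick randomly
    return candidates.subList(0, nodesRequired);
  }
}
{code}
A topology-aware implementation would sit behind the same interface and prefer, for example, rack-diverse nodes when topology information is present.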
[jira] [Updated] (HDFS-14844) Make buffer of BlockReaderRemote#newBlockReader#BufferedOutputStream configurable
[ https://issues.apache.org/jira/browse/HDFS-14844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lisheng Sun updated HDFS-14844: --- Description: See HDFS-14820 for details. > Make buffer of BlockReaderRemote#newBlockReader#BufferedOutputStream > configurable > -- > > Key: HDFS-14844 > URL: https://issues.apache.org/jira/browse/HDFS-14844 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Minor > > See HDFS-14820 for details. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14844) Make buffer of BlockReaderRemote#newBlockReader#BufferedOutputStream configurable
Lisheng Sun created HDFS-14844: -- Summary: Make buffer of BlockReaderRemote#newBlockReader#BufferedOutputStream configurable Key: HDFS-14844 URL: https://issues.apache.org/jira/browse/HDFS-14844 Project: Hadoop HDFS Issue Type: Improvement Reporter: Lisheng Sun Assignee: Lisheng Sun -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
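The improvement itself is a small configuration knob. A hedged sketch of the usual pattern: the key name and default below are assumptions for illustration, not necessarily what HDFS-14844 ships (the JDK's BufferedOutputStream default is 8192 bytes):
{code:java}
import java.io.BufferedOutputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;

class ConfigurableBufferSketch {
  // Hypothetical key/default; the real key is whatever the patch defines.
  static final String BUFFER_SIZE_KEY =
      "dfs.client.block.reader.remote.buffer.size";
  static final int BUFFER_SIZE_DEFAULT = 8192;

  /** Wrap the peer's output stream with a configurable-size buffer. */
  static OutputStream wrap(OutputStream out, Configuration conf) {
    int size = conf.getInt(BUFFER_SIZE_KEY, BUFFER_SIZE_DEFAULT);
    // A non-positive size could reasonably mean "no buffering at all".
    return size > 0 ? new BufferedOutputStream(out, size) : out;
  }
}
{code}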
[jira] [Commented] (HDFS-14795) Add Throttler for writing block
[ https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927208#comment-16927208 ] Lisheng Sun commented on HDFS-14795: Thanks [~elgoiri] for your suggestion. I fixed the checkstyle issues and uploaded the v009 patch. > Add Throttler for writing block > --- > > Key: HDFS-14795 > URL: https://issues.apache.org/jira/browse/HDFS-14795 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Minor > Attachments: HDFS-14795.001.patch, HDFS-14795.002.patch, > HDFS-14795.003.patch, HDFS-14795.004.patch, HDFS-14795.005.patch, > HDFS-14795.006.patch, HDFS-14795.007.patch, HDFS-14795.008.patch, > HDFS-14795.009.patch > > > DataXceiver#writeBlock > {code:java} > blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut, > mirrorAddr, null, targets, false); > {code} > As the code above shows, DataXceiver#writeBlock does no throttling (the > throttler argument is passed as null). > I think it is necessary to throttle block writes by adding a throttler > in the PIPELINE_SETUP_APPEND_RECOVERY or > PIPELINE_SETUP_STREAMING_RECOVERY stage. > The default throttler value is still null. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
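A sketch of the proposed direction. DataTransferThrottler is the existing HDFS utility class; the config key below is a hypothetical name for illustration, not necessarily the one the patch introduces:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.util.DataTransferThrottler;

class WriteThrottlerSketch {
  // Hypothetical key: write bandwidth per DataXceiver in bytes/sec;
  // 0 disables throttling, preserving today's behavior (null throttler).
  static final String WRITE_BANDWIDTH_KEY =
      "dfs.datanode.data.write.bandwidthPerSec";

  static DataTransferThrottler createWriteThrottler(Configuration conf) {
    long bandwidthPerSec = conf.getLong(WRITE_BANDWIDTH_KEY, 0);
    return bandwidthPerSec > 0
        ? new DataTransferThrottler(bandwidthPerSec)
        : null;  // null keeps the unthrottled default
  }
}
{code}
A throttler built this way would take the place of the null argument in the receiveBlock call quoted in the description.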
[jira] [Commented] (HDFS-14838) RBF: Display RPC (instead of HTTP) Port Number in RBF web UI
[ https://issues.apache.org/jira/browse/HDFS-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927207#comment-16927207 ] Hadoop QA commented on HDFS-14838: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 34m 52s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 31s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 51m 37s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 | | JIRA Issue | HDFS-14838 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12980011/HDFS-14838.patch | | Optional Tests | dupname asflicense shadedclient | | uname | Linux 31fbe6852f96 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 524b553 | | maven | version: Apache Maven 3.3.9 | | Max. process+thread count | 306 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/27839/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > RBF: Display RPC (instead of HTTP) Port Number in RBF web UI > > > Key: HDFS-14838 > URL: https://issues.apache.org/jira/browse/HDFS-14838 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf, ui >Affects Versions: 3.1.2 >Reporter: Xieming Li >Assignee: Xieming Li >Priority: Minor > Labels: RBF, ui > Attachments: HDFS-14838.patch, router-ui.jpg > > > Currently, the web UI of RBF is using <hostname>:<HTTP port> in its heading. > It should be changed to <hostname>:<RPC port>, as the web UIs of the NameNode and > DataNode do. > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14795) Add Throttler for writing block
[ https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lisheng Sun updated HDFS-14795: --- Attachment: HDFS-14795.009.patch > Add Throttler for writing block > --- > > Key: HDFS-14795 > URL: https://issues.apache.org/jira/browse/HDFS-14795 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Minor > Attachments: HDFS-14795.001.patch, HDFS-14795.002.patch, > HDFS-14795.003.patch, HDFS-14795.004.patch, HDFS-14795.005.patch, > HDFS-14795.006.patch, HDFS-14795.007.patch, HDFS-14795.008.patch, > HDFS-14795.009.patch > > > DataXceiver#writeBlock > {code:java} > blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut, > mirrorAddr, null, targets, false); > {code} > As the code above shows, DataXceiver#writeBlock does no throttling (the > throttler argument is passed as null). > I think it is necessary to throttle block writes by adding a throttler > in the PIPELINE_SETUP_APPEND_RECOVERY or > PIPELINE_SETUP_STREAMING_RECOVERY stage. > The default throttler value is still null. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14795) Add Throttler for writing block
[ https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lisheng Sun updated HDFS-14795: --- Attachment: (was: HDFS-14795.009.patch) > Add Throttler for writing block > --- > > Key: HDFS-14795 > URL: https://issues.apache.org/jira/browse/HDFS-14795 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Minor > Attachments: HDFS-14795.001.patch, HDFS-14795.002.patch, > HDFS-14795.003.patch, HDFS-14795.004.patch, HDFS-14795.005.patch, > HDFS-14795.006.patch, HDFS-14795.007.patch, HDFS-14795.008.patch > > > DataXceiver#writeBlock > {code:java} > blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut, > mirrorAddr, null, targets, false); > {code} > As the code above shows, DataXceiver#writeBlock does no throttling (the > throttler argument is passed as null). > I think it is necessary to throttle block writes by adding a throttler > in the PIPELINE_SETUP_APPEND_RECOVERY or > PIPELINE_SETUP_STREAMING_RECOVERY stage. > The default throttler value is still null. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14795) Add Throttler for writing block
[ https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lisheng Sun updated HDFS-14795: --- Attachment: HDFS-14795.009.patch > Add Throttler for writing block > --- > > Key: HDFS-14795 > URL: https://issues.apache.org/jira/browse/HDFS-14795 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Minor > Attachments: HDFS-14795.001.patch, HDFS-14795.002.patch, > HDFS-14795.003.patch, HDFS-14795.004.patch, HDFS-14795.005.patch, > HDFS-14795.006.patch, HDFS-14795.007.patch, HDFS-14795.008.patch, > HDFS-14795.009.patch > > > DataXceiver#writeBlock > {code:java} > blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut, > mirrorAddr, null, targets, false); > {code} > As the code above shows, DataXceiver#writeBlock does no throttling (the > throttler argument is passed as null). > I think it is necessary to throttle block writes by adding a throttler > in the PIPELINE_SETUP_APPEND_RECOVERY or > PIPELINE_SETUP_STREAMING_RECOVERY stage. > The default throttler value is still null. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14838) RBF: Display RPC (instead of HTTP) Port Number in RBF web UI
[ https://issues.apache.org/jira/browse/HDFS-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927195#comment-16927195 ] Xieming Li commented on HDFS-14838: --- [~tasanuma], Thank you for your review. I will rework this issue and submit another patch soon. > RBF: Display RPC (instead of HTTP) Port Number in RBF web UI > > > Key: HDFS-14838 > URL: https://issues.apache.org/jira/browse/HDFS-14838 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf, ui >Affects Versions: 3.1.2 >Reporter: Xieming Li >Assignee: Xieming Li >Priority: Minor > Labels: RBF, ui > Attachments: HDFS-14838.patch, router-ui.jpg > > > Currently, the web UI of RBF is using <hostname>:<HTTP port> in its heading. > It should be changed to <hostname>:<RPC port>, as the web UIs of the NameNode and > DataNode do. > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14842) ByteArrayManager Reduce Synchronization
[ https://issues.apache.org/jira/browse/HDFS-14842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927194#comment-16927194 ] Hadoop QA commented on HDFS-14842: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 42s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 43s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 44s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 55s{color} | {color:green} hadoop-hdfs-client in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 53m 6s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 | | JIRA Issue | HDFS-14842 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12980008/HDFS-14842.1.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 775d3032b349 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / f8f8598 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/27837/testReport/ | | Max. process+thread count | 415 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs-client U: hadoop-hdfs-project/hadoop-hdfs-client | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/27837/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > ByteArrayManager Reduce
[jira] [Commented] (HDFS-14838) RBF: Display RPC (instead of HTTP) Port Number in RBF web UI
[ https://issues.apache.org/jira/browse/HDFS-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927192#comment-16927192 ] Takanobu Asanuma commented on HDFS-14838: - Thanks for working on this issue and submitting the patch, [~risyomei]. Since we want to keep the same interface between Router and NameNode, instead of using {{RouterId}}, it would be better to fix {{RBFMetrics#getHostAndPort()}}. > RBF: Display RPC (instead of HTTP) Port Number in RBF web UI > > > Key: HDFS-14838 > URL: https://issues.apache.org/jira/browse/HDFS-14838 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf, ui >Affects Versions: 3.1.2 >Reporter: Xieming Li >Assignee: Xieming Li >Priority: Minor > Labels: RBF, ui > Attachments: HDFS-14838.patch, router-ui.jpg > > > Currently, the web UI of RBF is using <hostname>:<HTTP port> in its heading. > It should be changed to <hostname>:<RPC port>, as the web UIs of the NameNode and > DataNode do. > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
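A minimal sketch of the suggested direction, assuming the router's RPC address is available as an InetSocketAddress; this is simplified from whatever the actual RBFMetrics code does:
{code:java}
import java.net.InetSocketAddress;

class HostAndPortSketch {
  /** Build the "host:port" heading from the RPC address, not the HTTP one. */
  static String getHostAndPort(InetSocketAddress rpcAddress) {
    return rpcAddress.getHostName() + ":" + rpcAddress.getPort();
  }
}
{code}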
[jira] [Comment Edited] (HDFS-14699) Erasure Coding: Storage not considered in live replica when replication streams hard limit reached to threshold
[ https://issues.apache.org/jira/browse/HDFS-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927177#comment-16927177 ] HuangTao edited comment on HDFS-14699 at 9/11/19 2:08 AM: -- {quote}3. then shut down one DN which did not have the same block id as the 1 decommission block; now we have 12 internal blocks (11 live blocks and 1 decommission block) 4. after waiting about 600s (before the heartbeat comes), recommission the decommissioned DN; now we have 12 internal blocks (11 live blocks and 1 duplicate block) {quote} {code:java} // src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java:2314 org.apache.hadoop.hdfs.server.blockmanagement.BlockManager#checkReplicaOnStorage {code} The code above sets numReplicas. {code:java} if (!bitSet.get(blockIndex)) { bitSet.set(blockIndex); } else if (state == StoredReplicaState.LIVE) { numReplicas.subtract(StoredReplicaState.LIVE, 1); numReplicas.add(StoredReplicaState.REDUNDANT, 1); } {code} I think this block is meant to correct numReplicas when some nodes are being decommissioned; it has nothing to do with the over-hard-limit case. I think we should reconstruct the "1 decommission block" without the over-hard-limit srcNode, so I still have doubts about this fix. was (Author: marvelrock): {quote}3. then shut down one DN which did not have the same block id as the 1 decommission block; now we have 12 internal blocks (11 live blocks and 1 decommission block) 4. after waiting about 600s (before the heartbeat comes), recommission the decommissioned DN; now we have 12 internal blocks (11 live blocks and 1 duplicate block) {quote} I think we should reconstruct the "1 decommission block" without the over-hard-limit srcNode, so I still have doubts about this fix. > Erasure Coding: Storage not considered in live replica when replication > streams hard limit reached to threshold > --- > > Key: HDFS-14699 > URL: https://issues.apache.org/jira/browse/HDFS-14699 > Project: Hadoop HDFS > Issue Type: Bug > Components: ec >Affects Versions: 3.2.0, 3.1.1, 3.3.0 >Reporter: Zhao Yi Ming >Assignee: Zhao Yi Ming >Priority: Critical > Labels: patch > Attachments: HDFS-14699.00.patch, HDFS-14699.01.patch, > HDFS-14699.02.patch, HDFS-14699.03.patch, HDFS-14699.04.patch, > HDFS-14699.05.patch, image-2019-08-20-19-58-51-872.png, > image-2019-09-02-17-51-46-742.png > > > We tried the EC function on an 80-node cluster with Hadoop 3.1.1 and hit the > same scenario as described in https://issues.apache.org/jira/browse/HDFS-8881. > Following are our testing steps; hope they are helpful (the following DNs hold the > internal blocks under test): > # we customized a new 10-2-1024k policy and used it on a path; now we have 12 > internal blocks (12 live blocks) > # decommission one DN; after the decommission completes, we have 13 > internal blocks (12 live blocks and 1 decommission block) > # then shut down one DN which did not have the same block id as the 1 > decommission block; now we have 12 internal blocks (11 live blocks and 1 > decommission block) > # after waiting about 600s (before the heartbeat comes), recommission the > decommissioned DN; now we have 12 internal blocks (11 live blocks and 1 > duplicate block) > # Then EC does not reconstruct the missing block > We think this is a critical issue for using the EC function in a production > env. Could you help? Thanks a lot!
-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
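A toy illustration (plain Java, not NameNode code) of the quoted bitSet logic: the first replica seen for each block index counts as LIVE, and any further LIVE replica with the same index is reclassified as REDUNDANT, which matches the "11 live blocks and 1 duplicate block" state described above:
{code:java}
import java.util.BitSet;

class RedundantIndexDemo {
  public static void main(String[] args) {
    // Block indices reported by replicas; index 3 appears twice.
    int[] replicaIndices = {0, 1, 2, 3, 4, 5, 7, 8, 3};
    BitSet seen = new BitSet();
    int live = 0;
    int redundant = 0;
    for (int idx : replicaIndices) {
      if (!seen.get(idx)) {
        seen.set(idx);   // first replica for this index: counts as LIVE
        live++;
      } else {
        redundant++;     // duplicate index: LIVE is demoted to REDUNDANT
      }
    }
    System.out.println("live=" + live + " redundant=" + redundant);
    // Prints: live=8 redundant=1
  }
}
{code}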
[jira] [Updated] (HDFS-14838) RBF: Display RPC (instead of HTTP) Port Number in RBF web UI
[ https://issues.apache.org/jira/browse/HDFS-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xieming Li updated HDFS-14838: -- Attachment: HDFS-14838.patch Labels: RBF ui (was: ) Status: Patch Available (was: Open) > RBF: Display RPC (instead of HTTP) Port Number in RBF web UI > > > Key: HDFS-14838 > URL: https://issues.apache.org/jira/browse/HDFS-14838 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf, ui >Affects Versions: 3.1.2 >Reporter: Xieming Li >Assignee: Xieming Li >Priority: Minor > Labels: ui, RBF > Attachments: HDFS-14838.patch, router-ui.jpg > > > Currently, the web UI of RBF is using <hostname>:<HTTP port> in its heading. > It should be changed to <hostname>:<RPC port>, as the web UIs of the NameNode and > DataNode do. > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14811) RBF: TestRouterRpc#testErasureCoding is flaky
[ https://issues.apache.org/jira/browse/HDFS-14811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927187#comment-16927187 ] Chen Zhang commented on HDFS-14811: --- [~ayushtkn] [~elgoiri], any comments? Thanks > RBF: TestRouterRpc#testErasureCoding is flaky > - > > Key: HDFS-14811 > URL: https://issues.apache.org/jira/browse/HDFS-14811 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Chen Zhang >Assignee: Chen Zhang >Priority: Major > Attachments: HDFS-14811.001.patch, HDFS-14811.002.patch > > > The Failed reason: > {code:java} > 2019-09-01 18:19:20,940 [IPC Server handler 5 on default port 53140] INFO > blockmanagement.BlockPlacementPolicy > (BlockPlacementPolicyDefault.java:chooseRandom(838)) - [ > Node /default-rack/127.0.0.1:53148 [ > ] > Node /default-rack/127.0.0.1:53161 [ > ] > Node /default-rack/127.0.0.1:53157 [ > Datanode 127.0.0.1:53157 is not chosen since the node is too busy (load: 3 > > 2.6665). > Node /default-rack/127.0.0.1:53143 [ > ] > Node /default-rack/127.0.0.1:53165 [ > ] > 2019-09-01 18:19:20,940 [IPC Server handler 5 on default port 53140] INFO > blockmanagement.BlockPlacementPolicy > (BlockPlacementPolicyDefault.java:chooseRandom(846)) - Not enough replicas > was chosen. Reason: {NODE_TOO_BUSY=1} > 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN > blockmanagement.BlockPlacementPolicy > (BlockPlacementPolicyDefault.java:chooseTarget(449)) - Failed to place enough > replicas, still in need of 1 to reach 6 (unavailableStorages=[], > storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], > creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) > 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN > protocol.BlockStoragePolicy (BlockStoragePolicy.java:chooseStorageTypes(161)) > - Failed to place enough replicas: expected size is 1 but only 0 storage > types can be selected (replication=6, selected=[], unavailable=[DISK], > removed=[DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], > creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN > blockmanagement.BlockPlacementPolicy > (BlockPlacementPolicyDefault.java:chooseTarget(449)) - Failed to place enough > replicas, still in need of 1 to reach 6 (unavailableStorages=[DISK], > storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], > creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) All > required storage types are unavailable: unavailableStorages=[DISK], > storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], > creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] INFO > ipc.Server (Server.java:logException(2982)) - IPC Server handler 5 on default > port 53140, call Call#1270 Retry#0 > org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 127.0.0.1:53202 > java.io.IOException: File /testec/testfile2 could only be written to 5 of the > 6 required nodes for RS-6-3-1024k. There are 6 datanode(s) running and 6 > node(s) are excluded in this operation. 
> at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:) > at > org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2815) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:893) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:574) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1001) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:929) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2921) > 2019-09-01 18:19:20,942 [IPC Server handler 6 on default port 53197] INFO > ipc.Server (Server.java:logException(2975)) - IPC Server handler 6 on default > port 53197, call Call#1268 Retry#
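For context on the "too busy" line in that log: the default placement policy rejects a node whose active xceiver count exceeds a load factor times the cluster average. A hedged reconstruction of that check (the factor is dfs.namenode.redundancy.considerLoad.factor, default 2.0; the exact averaging inside the NameNode may differ slightly):
{code:java}
class TooBusyCheckSketch {
  static boolean isTooBusy(int nodeXceiverCount, double avgClusterLoad,
      double considerLoadFactor) {
    return nodeXceiverCount > considerLoadFactor * avgClusterLoad;
  }

  public static void main(String[] args) {
    // Roughly matches the flaky-test log "load: 3 > 2.6665":
    // an average load of ~1.333 with the default factor of 2.0.
    System.out.println(isTooBusy(3, 1.33325, 2.0)); // true -> node rejected
  }
}
{code}
With only six datanodes in the mini-cluster, a single busy node can exceed this bound, which is plausibly why the test flakes.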
[jira] [Commented] (HDFS-10943) rollEditLog expects empty EditsDoubleBuffer.bufCurrent which is not guaranteed
[ https://issues.apache.org/jira/browse/HDFS-10943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927184#comment-16927184 ] angerszhu commented on HDFS-10943: -- [~swingcong] See my explanation at https://issues.apache.org/jira/browse/HDFS-14437. [~hexiaoqiao] It seems to be the same reason. > rollEditLog expects empty EditsDoubleBuffer.bufCurrent which is not guaranteed > -- > > Key: HDFS-10943 > URL: https://issues.apache.org/jira/browse/HDFS-10943 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yongjun Zhang >Priority: Major > > Per the following trace stack: > {code} > FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: finalize log > segment 10562075963, 10562174157 failed for required journal > (JournalAndStream(mgr=QJM to [0.0.0.1:8485, 0.0.0.2:8485, 0.0.0.3:8485, > 0.0.0.4:8485, 0.0.0.5:8485], stream=QuorumOutputStream starting at txid > 10562075963)) > java.io.IOException: FSEditStream has 49708 bytes still to be flushed and > cannot be closed. > at > org.apache.hadoop.hdfs.server.namenode.EditsDoubleBuffer.close(EditsDoubleBuffer.java:66) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.close(QuorumOutputStream.java:65) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.closeStream(JournalSet.java:115) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet$4.apply(JournalSet.java:235) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet.finalizeLogSegment(JournalSet.java:231) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1243) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1172) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1243) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:6437) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1002) > at > org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:142) > at > org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12025) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080) > 2016-09-23 21:40:59,618 WARN > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting > QuorumOutputStream starting at txid 10562075963 > {code} > The exception is from EditsDoubleBuffer > {code} > public void close() throws IOException { > Preconditions.checkNotNull(bufCurrent); > Preconditions.checkNotNull(bufReady); > int bufSize = bufCurrent.size(); > if (bufSize != 0) { > throw new IOException("FSEditStream has " + bufSize > + " bytes still to be flushed and cannot be closed."); > } > IOUtils.cleanup(null, bufCurrent, bufReady); > bufCurrent = bufReady = null; > } > {code} > We can see that
FSNamesystem.rollEditLog expects > EditsDoubleBuffer.bufCurrent to be empty. > Edits are recorded via FSEditLog$logSync, which does: > {code} >* The data is double-buffered within each edit log implementation so that >* in-memory writing can occur in parallel with the on-disk writing. >* >* Each sync occurs in three steps: >* 1. synchronized, it swaps the double buffer and sets the isSyncRunning >* flag. >* 2. unsynchronized, it flushes the data to storage >* 3. synchronized, it resets the flag and notifies anyone waiting on the >* sync. >* >* The lack of synchronization on step 2 allows other threads to continue >* to write into the memory buffer while the sync is in progress. >* Because this step is unsynchronized, actions that need to avoid >* concurrency with sync() should be synchronized and also call >* waitForSyncToFinish() before assuming they are running alone. >*/ > {code} > We can see that step
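A minimal toy model of the double-buffer discipline the description walks through (illustrative, not the HDFS class): the swap happens under the lock, the flush happens outside it, and close() demands an empty current buffer. A write that lands between a swap and close() reproduces exactly the "bytes still to be flushed" failure:
{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;

class DoubleBufferSketch {
  private ByteArrayOutputStream bufCurrent = new ByteArrayOutputStream();
  private ByteArrayOutputStream bufReady = new ByteArrayOutputStream();

  /** Writers keep appending here, even while a flush is in progress. */
  synchronized void write(byte[] op) throws IOException {
    bufCurrent.write(op);
  }

  /** Step 1 (synchronized): swap so the flusher owns the old buffer. */
  synchronized void setReadyToFlush() {
    ByteArrayOutputStream tmp = bufReady;
    bufReady = bufCurrent;
    bufCurrent = tmp;
  }

  /** Step 2 (unsynchronized in HDFS): write bufReady out to storage. */
  void flushReady() {
    bufReady.reset(); // stand-in for the real on-disk/journal write
  }

  /** Step 3: close() fails if anything slipped into bufCurrent meanwhile. */
  synchronized void close() throws IOException {
    if (bufCurrent.size() != 0) {
      throw new IOException("FSEditStream has " + bufCurrent.size()
          + " bytes still to be flushed and cannot be closed.");
    }
  }
}
{code}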
[jira] [Commented] (HDFS-14609) RBF: Security should use common AuthenticationFilter
[ https://issues.apache.org/jira/browse/HDFS-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927185#comment-16927185 ] Chen Zhang commented on HDFS-14609: --- Ping [~crh] [~eyang]... Could you take a look? Thanks! > RBF: Security should use common AuthenticationFilter > > > Key: HDFS-14609 > URL: https://issues.apache.org/jira/browse/HDFS-14609 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: CR Hota >Assignee: Chen Zhang >Priority: Major > Attachments: HDFS-14609.001.patch, HDFS-14609.002.patch, > HDFS-14609.003.patch, HDFS-14609.004.patch, HDFS-14609.005.patch > > > We worked on router based federation security as part of HDFS-13532. We kept > it compatible with the way namenode works. However, with HADOOP-16314 and > HDFS-16354 in trunk, auth filters seem to have been changed, causing tests to > fail. > Changes are needed appropriately in RBF, mainly fixing broken tests. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14768) In some cases, erasure blocks are corrupted when they are reconstructed.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927183#comment-16927183 ] HuangTao commented on HDFS-14768: - [~zhaoyim] you can change the `incrementPendingReplicationWithoutTargets()` access from default to public > In some cases, erasure blocks are corrupted when they are reconstructed. > > > Key: HDFS-14768 > URL: https://issues.apache.org/jira/browse/HDFS-14768 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, erasure-coding, hdfs, namenode >Affects Versions: 3.0.2 >Reporter: guojh >Assignee: guojh >Priority: Major > Labels: patch > Fix For: 3.3.0 > > Attachments: HDFS-14768.000.patch > > > Policy is RS-6-3-1024K; the version is Hadoop 3.0.2. > Suppose a file's block indices are [0,1,2,3,4,5,6,7,8]. We decommission > indices [3,4] and increase the index-6 datanode's > pendingReplicationWithoutTargets so that it is larger than > replicationStreamsHardLimit (we set 14). Then, after BlockManager's method > chooseSourceDatanodes runs, liveBlockIndices is > [0,1,2,3,4,5,7,8], and the block counters are Live: 7, Decommission: 2. > In BlockManager's method scheduleReconstruction, additionalReplRequired > is 9 - 7 = 2. After the NameNode chooses two target datanodes, it assigns an > erasure-coding task to the target datanodes. > When a datanode gets the task, it builds targetIndices from liveBlockIndices > and the target length. The code is below. > {code:java} > targetIndices = new short[targets.length]; > private void initTargetIndices() { > BitSet bitset = reconstructor.getLiveBitSet(); > int m = 0; > hasValidTargets = false; > for (int i = 0; i < dataBlkNum + parityBlkNum; i++) { > if (!bitset.get(i)) { > if (reconstructor.getBlockLen(i) > 0) { > if (m < targets.length) { > targetIndices[m++] = (short) i; > hasValidTargets = true; > } > } > } > } > } > {code} > targetIndices[0] = 6, and targetIndices[1] is always 0 from its initial value. > The StripedReader always creates readers from the first 6 indexed blocks, > i.e. [0,1,2,3,4,5]. > Using indices [0,1,2,3,4,5] to build target indices [6,0] triggers the ISA-L > bug: block index 6's data is corrupted (all data is zero). > I wrote a unit test that reproduces this reliably.
> {code:java} > public void testFileDecommission() throws Exception { > LOG.info("Starting test testFileDecommission"); > final Path ecFile = new Path(ecDir, "testFileDecommission"); > int writeBytes = cellSize * dataBlocks; > writeStripedFile(dfs, ecFile, writeBytes); > Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks()); > FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes); > LocatedBlocks locatedBlocks = > StripedFileTestUtil.getLocatedBlocks(ecFile, dfs); > LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0) > .get(0); > DatanodeInfo[] dnLocs = lb.getLocations(); > LocatedStripedBlock lastBlock = > (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock(); > DatanodeInfo[] storageInfos = lastBlock.getLocations(); > DatanodeDescriptor datanodeDescriptor = > cluster.getNameNode().getNamesystem() > > .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid()); > for (int i = 0; i < 100; i++) { > datanodeDescriptor.incrementPendingReplicationWithoutTargets(); > } > assertEquals(dataBlocks + parityBlocks, dnLocs.length); > int[] decommNodeIndex = {3, 4}; > final List<DatanodeInfo> decommisionNodes = new ArrayList<>(); > // add the nodes which will be decommissioned > decommisionNodes.add(dnLocs[decommNodeIndex[0]]); > decommisionNodes.add(dnLocs[decommNodeIndex[1]]); > decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED); > assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes()); > //assertNull(checkFile(dfs, ecFile, 9, decommisionNodes, numDNs)); > // Ensure decommissioned datanode is not automatically shutdown > DFSClient client = getDfsClient(cluster.getNameNode(0), conf); > assertEquals("All datanodes must be alive", numDNs, > client.datanodeReport(DatanodeReportType.LIVE).length); > FileChecksum fileChecksum2 = dfs.getFileChecksum(ecFile, writeBytes); > Assert.assertTrue("Checksum mismatches!", > fileChecksum1.equals(fileChecksum2)); > StripedFileTestUtil.checkData(dfs, ecFile, writeBytes, decommisionNodes, > null, blockGroupSize); > } > {code} > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: h
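A toy demonstration of the zero-fill hazard described above: Java zero-fills a new short[], so any slot initTargetIndices never assigns silently aliases block index 0. One illustrative mitigation (not necessarily the committed fix) is to trim the array to the number of slots actually filled:
{code:java}
import java.util.Arrays;

class TargetIndicesDemo {
  public static void main(String[] args) {
    short[] targetIndices = new short[2]; // sized by targets.length
    int m = 0;
    targetIndices[m++] = 6;               // only one valid index was found
    // targetIndices[1] was never assigned and still holds 0, which
    // downstream code cannot distinguish from "block index 0".
    System.out.println(Arrays.toString(targetIndices)); // [6, 0]

    short[] trimmed = Arrays.copyOf(targetIndices, m);  // keep real entries only
    System.out.println(Arrays.toString(trimmed));       // [6]
  }
}
{code}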
[jira] [Commented] (HDFS-10943) rollEditLog expects empty EditsDoubleBuffer.bufCurrent which is not guaranteed
[ https://issues.apache.org/jira/browse/HDFS-10943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927181#comment-16927181 ] wangcong commented on HDFS-10943: - Sorry [~hexiaoqiao], the version we use is 2.6.0-cdh5.10.0. By looking at the logs added in HDFS-11292, we found the problem as follows: If the edit-log roll runs normally, the log shows: 2019-09-10 03:48:10,273 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: logSyncAll toSyncToTxId=5060982534 lastSyncedTxid=5060982511 mostRecentTxid=5060982534 2019-09-10 03:48:10,273 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Done logSyncAll lastWrittenTxId=5060982534 lastSyncedTxid=5060982534 mostRecentTxid=5060982534 toSyncToTxId in the first line is the txId of the EndLogSegmentOp, which is the last record of the edit log, and it is equal to lastWrittenTxId in the second line. This shows that after the EndLogSegmentOp, nothing else was written to the double buffer. But if the roll runs abnormally, the log shows: 2019-09-10 03:48:10,273 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: logSyncAll toSyncToTxId=5061382825 lastSyncedTxid=5061371306 mostRecentTxid=5061382825 2019-09-10 03:48:10,273 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Done logSyncAll lastWrittenTxId=5061382841 lastSyncedTxid=5061382840 mostRecentTxid=5061382841 Here toSyncToTxId in the first line is not equal to lastWrittenTxId in the second line, which shows that after the EndLogSegmentOp another handler wrote to the double buffer. And in the second line, lastWrittenTxId is not equal to lastSyncedTxid, which shows that the current buffer of the double buffer is not empty. > rollEditLog expects empty EditsDoubleBuffer.bufCurrent which is not guaranteed > -- > > Key: HDFS-10943 > URL: https://issues.apache.org/jira/browse/HDFS-10943 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yongjun Zhang >Priority: Major > > Per the following trace stack: > {code} > FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: finalize log > segment 10562075963, 10562174157 failed for required journal > (JournalAndStream(mgr=QJM to [0.0.0.1:8485, 0.0.0.2:8485, 0.0.0.3:8485, > 0.0.0.4:8485, 0.0.0.5:8485], stream=QuorumOutputStream starting at txid > 10562075963)) > java.io.IOException: FSEditStream has 49708 bytes still to be flushed and > cannot be closed.
> at > org.apache.hadoop.hdfs.server.namenode.EditsDoubleBuffer.close(EditsDoubleBuffer.java:66) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.close(QuorumOutputStream.java:65) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.closeStream(JournalSet.java:115) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet$4.apply(JournalSet.java:235) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet.finalizeLogSegment(JournalSet.java:231) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1243) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1172) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1243) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:6437) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1002) > at > org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:142) > at > org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12025) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080) > 2016-09-23 21:40:59,618 WARN > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting > QuorumOutputStream starting at txid 10562075963 > {code} > The exception is from EditsDoubleBuffer > {code} > public void close() throws IOException { > Preconditions.checkNotNull(bufCurrent); > Preconditions.checkNotNull(bufReady); > int bufSize = bufCurrent.
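A compact sketch of the invariants that log comparison is checking (illustrative code, not the HDFS-11306/HDFS-11292 patches themselves):
{code:java}
class RollInvariantSketch {
  /** After logSyncAll() during a roll, all three txids should be equal. */
  static void checkAfterLogSyncAll(long toSyncToTxId, long lastWrittenTxId,
      long lastSyncedTxId) {
    if (lastWrittenTxId != toSyncToTxId) {
      // Someone wrote past the EndLogSegmentOp while the roll was in flight.
      System.err.println("Handler wrote txids after EndLogSegmentOp: "
          + toSyncToTxId + " -> " + lastWrittenTxId);
    }
    if (lastWrittenTxId != lastSyncedTxId) {
      // Exactly the condition that later makes EditsDoubleBuffer.close() throw.
      System.err.println("bufCurrent not empty: "
          + (lastWrittenTxId - lastSyncedTxId) + " unsynced transaction(s)");
    }
  }
}
{code}
Plugging in the abnormal log quoted above: lastWrittenTxId 5061382841 differs from toSyncToTxId 5061382825, so a handler kept writing during the roll, and 5061382841 - 5061382840 = 1 transaction was still unsynced, i.e. bufCurrent was not empty when the segment was finalized.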
[jira] [Comment Edited] (HDFS-14699) Erasure Coding: Storage not considered in live replica when replication streams hard limit reached to threshold
[ https://issues.apache.org/jira/browse/HDFS-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927177#comment-16927177 ] HuangTao edited comment on HDFS-14699 at 9/11/19 1:40 AM: -- {quote}3. then shut down one DN which did not have the same block id as the 1 decommission block; now we have 12 internal blocks (11 live blocks and 1 decommission block) 4. after waiting about 600s (before the heartbeat comes), recommission the decommissioned DN; now we have 12 internal blocks (11 live blocks and 1 duplicate block) {quote} I think we should reconstruct the "1 decommission block" without the over-hard-limit srcNode, so I still have doubts about this fix. was (Author: marvelrock): I have a doubt about "after waiting about 600s (before the heartbeat comes), recommission the decommissioned DN; now we have 12 internal blocks (11 live blocks and 1 duplicate block)" > Erasure Coding: Storage not considered in live replica when replication > streams hard limit reached to threshold > --- > > Key: HDFS-14699 > URL: https://issues.apache.org/jira/browse/HDFS-14699 > Project: Hadoop HDFS > Issue Type: Bug > Components: ec >Affects Versions: 3.2.0, 3.1.1, 3.3.0 >Reporter: Zhao Yi Ming >Assignee: Zhao Yi Ming >Priority: Critical > Labels: patch > Attachments: HDFS-14699.00.patch, HDFS-14699.01.patch, > HDFS-14699.02.patch, HDFS-14699.03.patch, HDFS-14699.04.patch, > HDFS-14699.05.patch, image-2019-08-20-19-58-51-872.png, > image-2019-09-02-17-51-46-742.png > > > We tried the EC function on an 80-node cluster with Hadoop 3.1.1 and hit the > same scenario as described in https://issues.apache.org/jira/browse/HDFS-8881. > Following are our testing steps; hope they are helpful (the following DNs hold the > internal blocks under test): > # we customized a new 10-2-1024k policy and used it on a path; now we have 12 > internal blocks (12 live blocks) > # decommission one DN; after the decommission completes, we have 13 > internal blocks (12 live blocks and 1 decommission block) > # then shut down one DN which did not have the same block id as the 1 > decommission block; now we have 12 internal blocks (11 live blocks and 1 > decommission block) > # after waiting about 600s (before the heartbeat comes), recommission the > decommissioned DN; now we have 12 internal blocks (11 live blocks and 1 > duplicate block) > # Then EC does not reconstruct the missing block > We think this is a critical issue for using the EC function in a production > env. Could you help? Thanks a lot! -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14835) RBF: Secured Router should not run when it can't initialize DelegationTokenSecretManager
[ https://issues.apache.org/jira/browse/HDFS-14835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated HDFS-14835: Fix Version/s: 3.3.0 Resolution: Fixed Status: Resolved (was: Patch Available) Merged the PR to trunk. Thanks again for your reviews, [~crh] and [~elgoiri]. > RBF: Secured Router should not run when it can't initialize > DelegationTokenSecretManager > > > Key: HDFS-14835 > URL: https://issues.apache.org/jira/browse/HDFS-14835 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Labels: RBF > Fix For: 3.3.0 > > > Currently, even if a secured router fails to create the > DelegationTokenSecretManager, it can start and keep running. Such a router > can't handle requests with delegation tokens. > {noformat} > ERROR org.apache.hadoop.hdfs.server.federation.router.FederationUtil: Could > not instantiate: ZKDelegationTokenSecretManagerImpl > ... > INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port > {noformat} > In this case, I think the router should not start. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
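A hedged sketch of the fail-fast behavior this change introduces (names simplified; the real code lives in RouterSecurityManager): if the secret manager cannot be instantiated, startup aborts instead of continuing without delegation-token support:
{code:java}
import java.io.IOException;

class RouterSecurityManagerSketch {
  /** Stand-in for the delegation token secret manager type. */
  interface SecretManager { }

  private final SecretManager secretManager;

  RouterSecurityManagerSketch(SecretManager created) throws IOException {
    if (created == null) {
      // Previously the instantiation error was only logged and the router
      // kept running, then failed every request carrying a delegation token.
      throw new IOException(
          "Failed to instantiate DelegationTokenSecretManager; aborting startup");
    }
    this.secretManager = created;
  }
}
{code}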
[jira] [Commented] (HDFS-14835) RBF: Secured Router should not run when it can't initialize DelegationTokenSecretManager
[ https://issues.apache.org/jira/browse/HDFS-14835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927178#comment-16927178 ] Hudson commented on HDFS-14835: --- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17272 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17272/]) HDFS-14835. RBF: Secured Router should not run when it can't initialize (github: rev 524b553a5f1c10bf41c723302cf42b592ffa1631) * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/security/TestRouterSecurityManager.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/security/RouterSecurityManager.java > RBF: Secured Router should not run when it can't initialize > DelegationTokenSecretManager > > > Key: HDFS-14835 > URL: https://issues.apache.org/jira/browse/HDFS-14835 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Labels: RBF > > Currently, even if a secured router fails to create the > DelegationTokenSecretManager, it can start and keep running. Such a router > can't handle requests with delegation tokens. > {noformat} > ERROR org.apache.hadoop.hdfs.server.federation.router.FederationUtil: Could > not instantiate: ZKDelegationTokenSecretManagerImpl > ... > INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port > {noformat} > In this case, I think the router should not start. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14699) Erasure Coding: Storage not considered in live replica when replication streams hard limit reached to threshold
[ https://issues.apache.org/jira/browse/HDFS-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927177#comment-16927177 ] HuangTao commented on HDFS-14699: - I have a doubt about "after waiting about 600s (before the heartbeat comes), recommission the decommissioned DN; now we have 12 internal blocks (11 live blocks and 1 duplicate block)" > Erasure Coding: Storage not considered in live replica when replication > streams hard limit reached to threshold > --- > > Key: HDFS-14699 > URL: https://issues.apache.org/jira/browse/HDFS-14699 > Project: Hadoop HDFS > Issue Type: Bug > Components: ec >Affects Versions: 3.2.0, 3.1.1, 3.3.0 >Reporter: Zhao Yi Ming >Assignee: Zhao Yi Ming >Priority: Critical > Labels: patch > Attachments: HDFS-14699.00.patch, HDFS-14699.01.patch, > HDFS-14699.02.patch, HDFS-14699.03.patch, HDFS-14699.04.patch, > HDFS-14699.05.patch, image-2019-08-20-19-58-51-872.png, > image-2019-09-02-17-51-46-742.png > > > We tried the EC function on an 80-node cluster with Hadoop 3.1.1 and hit the > same scenario as described in https://issues.apache.org/jira/browse/HDFS-8881. > Following are our testing steps; hope they are helpful (the following DNs hold the > internal blocks under test): > # we customized a new 10-2-1024k policy and used it on a path; now we have 12 > internal blocks (12 live blocks) > # decommission one DN; after the decommission completes, we have 13 > internal blocks (12 live blocks and 1 decommission block) > # then shut down one DN which did not have the same block id as the 1 > decommission block; now we have 12 internal blocks (11 live blocks and 1 > decommission block) > # after waiting about 600s (before the heartbeat comes), recommission the > decommissioned DN; now we have 12 internal blocks (11 live blocks and 1 > duplicate block) > # Then EC does not reconstruct the missing block > We think this is a critical issue for using the EC function in a production > env. Could you help? Thanks a lot! -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2107) Datanodes should retry forever to connect to SCM in an unsecure environment
[ https://issues.apache.org/jira/browse/HDDS-2107?focusedWorklogId=310242&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310242 ] ASF GitHub Bot logged work on HDDS-2107: Author: ASF GitHub Bot Created on: 11/Sep/19 01:28 Start Date: 11/Sep/19 01:28 Worklog Time Spent: 10m Work Description: vivekratnavel commented on issue #1424: HDDS-2107. Datanodes should retry forever to connect to SCM in an… URL: https://github.com/apache/hadoop/pull/1424#issuecomment-530181099 @xiaoyuyao @hanishakoneru @anuengineer @elek @bharatviswa504 Please review This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 310242) Time Spent: 0.5h (was: 20m) > Datanodes should retry forever to connect to SCM in an unsecure environment > --- > > Key: HDDS-2107 > URL: https://issues.apache.org/jira/browse/HDDS-2107 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.4.1 >Reporter: Vivek Ratnavel Subramanian >Assignee: Vivek Ratnavel Subramanian >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > In an unsecure environment, the datanodes try upto 10 times after waiting for > 1000 milliseconds each time before throwing this error: > {code:java} > Unable to communicate to SCM server at scm:9861 for past 0 seconds. > java.net.ConnectException: Call From scm/10.65.36.118 to scm:9861 failed on > connection exception: java.net.ConnectException: Connection refused; For more > details see: http://wiki.apache.org/hadoop/ConnectionRefused > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831) > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515) > at org.apache.hadoop.ipc.Client.call(Client.java:1457) > at org.apache.hadoop.ipc.Client.call(Client.java:1367) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy33.getVersion(Unknown Source) > at > org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.getVersion(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:112) > at > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:70) > at > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: 
java.net.ConnectException: Connection refused > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > at > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) > at > org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531) > at > org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:690) > at > org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:794) > at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:411) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1572) > at org.apache.hadoop.ipc.Client.call(Client.java:1403) > ... 13 more > {code} > The datanodes should try forever to connect with SCM and not fail immediately > after 10 retries. -- This message was sent
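To make the requested behavior concrete, here is a minimal sketch of a retry-forever loop with the same fixed 1000 ms sleep described above. It is not the actual HDDS patch; the class and interface names are illustrative stand-ins for the endpoint client seen in the stack trace.

{code:java}
import java.io.IOException;

// Hypothetical sketch: retry the SCM version handshake indefinitely instead
// of aborting after 10 bounded attempts. Names are invented for illustration.
public class ScmConnectorSketch {
  private static final long RETRY_INTERVAL_MS = 1000; // interval from the report

  /** Stand-in for the protocol client seen in the stack trace above. */
  public interface VersionEndpoint {
    Object getVersion(Object request) throws IOException;
  }

  public void connectForever(VersionEndpoint endPoint) throws InterruptedException {
    while (true) { // no attempt counter: keep going until SCM is reachable
      try {
        endPoint.getVersion(null); // succeeds once SCM accepts connections
        return;
      } catch (IOException e) {
        // log and keep trying instead of surfacing the ConnectException
        System.err.println("Unable to reach SCM, retrying in "
            + RETRY_INTERVAL_MS + " ms: " + e);
        Thread.sleep(RETRY_INTERVAL_MS);
      }
    }
  }
}
{code}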
[jira] [Work logged] (HDDS-2107) Datanodes should retry forever to connect to SCM in an unsecure environment
[ https://issues.apache.org/jira/browse/HDDS-2107?focusedWorklogId=310240&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310240 ] ASF GitHub Bot logged work on HDDS-2107: Author: ASF GitHub Bot Created on: 11/Sep/19 01:28 Start Date: 11/Sep/19 01:28 Worklog Time Spent: 10m Work Description: vivekratnavel commented on issue #1424: HDDS-2107. Datanodes should retry forever to connect to SCM in an… URL: https://github.com/apache/hadoop/pull/1424#issuecomment-530180967 /label ozone This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 310240) Time Spent: 20m (was: 10m) > Datanodes should retry forever to connect to SCM in an unsecure environment > --- > > Key: HDDS-2107 > URL: https://issues.apache.org/jira/browse/HDDS-2107 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.4.1 >Reporter: Vivek Ratnavel Subramanian >Assignee: Vivek Ratnavel Subramanian >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > In an unsecure environment, the datanodes try upto 10 times after waiting for > 1000 milliseconds each time before throwing this error: > {code:java} > Unable to communicate to SCM server at scm:9861 for past 0 seconds. > java.net.ConnectException: Call From scm/10.65.36.118 to scm:9861 failed on > connection exception: java.net.ConnectException: Connection refused; For more > details see: http://wiki.apache.org/hadoop/ConnectionRefused > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831) > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515) > at org.apache.hadoop.ipc.Client.call(Client.java:1457) > at org.apache.hadoop.ipc.Client.call(Client.java:1367) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy33.getVersion(Unknown Source) > at > org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.getVersion(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:112) > at > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:70) > at > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.net.ConnectException: Connection refused > at 
sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > at > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) > at > org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531) > at > org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:690) > at > org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:794) > at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:411) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1572) > at org.apache.hadoop.ipc.Client.call(Client.java:1403) > ... 13 more > {code} > The datanodes should try forever to connect with SCM and not fail immediately > after 10 retries. -- This message was sent by Atlassian Jira (v8.3.2#803003) -
[jira] [Updated] (HDDS-2107) Datanodes should retry forever to connect to SCM in an unsecure environment
[ https://issues.apache.org/jira/browse/HDDS-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2107: - Labels: pull-request-available (was: ) > Datanodes should retry forever to connect to SCM in an unsecure environment > --- > > Key: HDDS-2107 > URL: https://issues.apache.org/jira/browse/HDDS-2107 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.4.1 >Reporter: Vivek Ratnavel Subramanian >Assignee: Vivek Ratnavel Subramanian >Priority: Major > Labels: pull-request-available > > In an unsecure environment, the datanodes try upto 10 times after waiting for > 1000 milliseconds each time before throwing this error: > {code:java} > Unable to communicate to SCM server at scm:9861 for past 0 seconds. > java.net.ConnectException: Call From scm/10.65.36.118 to scm:9861 failed on > connection exception: java.net.ConnectException: Connection refused; For more > details see: http://wiki.apache.org/hadoop/ConnectionRefused > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831) > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515) > at org.apache.hadoop.ipc.Client.call(Client.java:1457) > at org.apache.hadoop.ipc.Client.call(Client.java:1367) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy33.getVersion(Unknown Source) > at > org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.getVersion(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:112) > at > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:70) > at > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.net.ConnectException: Connection refused > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > at > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) > at > org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531) > at > org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:690) > at > org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:794) > at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:411) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1572) > at org.apache.hadoop.ipc.Client.call(Client.java:1403) > ... 
13 more > {code} > The datanodes should try forever to connect with SCM and not fail immediately > after 10 retries. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2107) Datanodes should retry forever to connect to SCM in an unsecure environment
[ https://issues.apache.org/jira/browse/HDDS-2107?focusedWorklogId=310239&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310239 ] ASF GitHub Bot logged work on HDDS-2107: Author: ASF GitHub Bot Created on: 11/Sep/19 01:27 Start Date: 11/Sep/19 01:27 Worklog Time Spent: 10m Work Description: vivekratnavel commented on pull request #1424: HDDS-2107. Datanodes should retry forever to connect to SCM in an… URL: https://github.com/apache/hadoop/pull/1424 … unsecure environment In an unsecure environment, the datanodes try upto 10 times after waiting for 1000 milliseconds each time before throwing this error: ```Unable to communicate to SCM server at scm:9861 for past 0 seconds. java.net.ConnectException: Call From scm:9861 failed on connection exception: java.net.ConnectException: Connection refused;``` This PR fixes that issue by having datanodes try forever to connect with SCM and not fail immediately after 10 retries. I have also increased timeouts on a unit test to improve its stability. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 310239) Remaining Estimate: 0h Time Spent: 10m > Datanodes should retry forever to connect to SCM in an unsecure environment > --- > > Key: HDDS-2107 > URL: https://issues.apache.org/jira/browse/HDDS-2107 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.4.1 >Reporter: Vivek Ratnavel Subramanian >Assignee: Vivek Ratnavel Subramanian >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In an unsecure environment, the datanodes try upto 10 times after waiting for > 1000 milliseconds each time before throwing this error: > {code:java} > Unable to communicate to SCM server at scm:9861 for past 0 seconds. 
> java.net.ConnectException: Call From scm/10.65.36.118 to scm:9861 failed on > connection exception: java.net.ConnectException: Connection refused; For more > details see: http://wiki.apache.org/hadoop/ConnectionRefused > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831) > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515) > at org.apache.hadoop.ipc.Client.call(Client.java:1457) > at org.apache.hadoop.ipc.Client.call(Client.java:1367) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy33.getVersion(Unknown Source) > at > org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.getVersion(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:112) > at > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:70) > at > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.net.ConnectException: Connection refused > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > at > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) > at > org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531) > at > org.apa
[jira] [Updated] (HDDS-2107) Datanodes should retry forever to connect to SCM in an unsecure environment
[ https://issues.apache.org/jira/browse/HDDS-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vivek Ratnavel Subramanian updated HDDS-2107: - Description: In an unsecure environment, the datanodes try upto 10 times after waiting for 1000 milliseconds each time before throwing this error: {code:java} Unable to communicate to SCM server at scm:9861 for past 0 seconds. java.net.ConnectException: Call From scm/10.65.36.118 to scm:9861 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515) at org.apache.hadoop.ipc.Client.call(Client.java:1457) at org.apache.hadoop.ipc.Client.call(Client.java:1367) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) at com.sun.proxy.$Proxy33.getVersion(Unknown Source) at org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.getVersion(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:112) at org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:70) at org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531) at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:690) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:794) at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:411) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1572) at org.apache.hadoop.ipc.Client.call(Client.java:1403) ... 13 more {code} The datanodes should try forever to connect with SCM and not fail immediately after 10 retries. was: In an unsecure environment, the datanodes try upto 10 times after waiting for 1000 milliseconds each time before throwing this error: {code:java} Unable to communicate to SCM server at jmccarthy-ozone-unsecure2-2.vpc.cloudera.com:9861 for past 0 seconds. 
java.net.ConnectException: Call From jmccarthy-ozone-unsecure2-4.vpc.cloudera.com/10.65.36.118 to jmccarthy-ozone-unsecure2-2.vpc.cloudera.com:9861 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515) at org.apache.hadoop.ipc.Client.call(Client.java:1457) at org.apache.hadoop.ipc.Client.call(Client.java:1367) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) at com.sun.proxy.$Proxy33.getVersion(Unknown Source) at org.apache.hadoop.ozone.protocolPB.StorageC
[jira] [Updated] (HDFS-14843) Double Synchronization in BlockReportLeaseManager
[ https://issues.apache.org/jira/browse/HDFS-14843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14843: -- Status: Patch Available (was: Open) > Double Synchronization in BlockReportLeaseManager > - > > Key: HDFS-14843 > URL: https://issues.apache.org/jira/browse/HDFS-14843 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14843.1.patch > > > {code:java|title=BlockReportLeaseManager.java} > private synchronized long getNextId() { > long id; > do { > id = nextId++; > } while (id == 0); > return id; > } > {code} > This is a private method and is synchronized; however, it is only accessed > from an already-synchronized method. No need to double-synchronize. > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java#L183-L189 > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java#L227 -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
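For readers following along, the fix implied by the snippet quoted above is simply dropping the redundant modifier; a sketch of the resulting fragment (the surrounding class and the nextId field are assumed from the quote):

{code:java}
// Sketch: every caller already holds the instance monitor, so the extra
// "synchronized" only re-acquires a lock the thread already owns.
private long getNextId() {
  long id;
  do {
    id = nextId++;
  } while (id == 0); // never hand out id 0
  return id;
}
{code}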
[jira] [Updated] (HDFS-14843) Double Synchronization in BlockReportLeaseManager
[ https://issues.apache.org/jira/browse/HDFS-14843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14843: -- Attachment: HDFS-14843.1.patch > Double Synchronization in BlockReportLeaseManager > - > > Key: HDFS-14843 > URL: https://issues.apache.org/jira/browse/HDFS-14843 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14843.1.patch > > > {code:java|title=BlockReportLeaseManager.java} > private synchronized long getNextId() { > long id; > do { > id = nextId++; > } while (id == 0); > return id; > } > {code} > This is a private method and is synchronized; however, it is only accessed > from an already-synchronized method. No need to double-synchronize. > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java#L183-L189 > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java#L227 -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-14843) Double Synchronization in BlockReportLeaseManager
[ https://issues.apache.org/jira/browse/HDFS-14843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor reassigned HDFS-14843: - Assignee: David Mollitor > Double Synchronization in BlockReportLeaseManager > - > > Key: HDFS-14843 > URL: https://issues.apache.org/jira/browse/HDFS-14843 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > > {code:java|title=BlockReportLeaseManager.java} > private synchronized long getNextId() { > long id; > do { > id = nextId++; > } while (id == 0); > return id; > } > {code} > This is a private method and is synchronized; however, it is only accessed > from an already-synchronized method. No need to double-synchronize. > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java#L183-L189 > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java#L227 -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14843) Double Synchronization in BlockReportLeaseManager
David Mollitor created HDFS-14843: - Summary: Double Synchronization in BlockReportLeaseManager Key: HDFS-14843 URL: https://issues.apache.org/jira/browse/HDFS-14843 Project: Hadoop HDFS Issue Type: Improvement Reporter: David Mollitor {code:java|title=BlockReportLeaseManager.java} private synchronized long getNextId() { long id; do { id = nextId++; } while (id == 0); return id; } {code} This is a private method and is synchronized; however, it is only accessed from an already-synchronized method. No need to double-synchronize. https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java#L183-L189 https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java#L227 -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14842) ByteArrayManager Reduce Synchronization
[ https://issues.apache.org/jira/browse/HDFS-14842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14842: -- Status: Patch Available (was: Open) > ByteArrayManager Reduce Synchronization > --- > > Key: HDFS-14842 > URL: https://issues.apache.org/jira/browse/HDFS-14842 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14842.1.patch > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14842) ByteArrayManager Reduce Synchronization
[ https://issues.apache.org/jira/browse/HDFS-14842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14842: -- Attachment: HDFS-14842.1.patch > ByteArrayManager Reduce Synchronization > --- > > Key: HDFS-14842 > URL: https://issues.apache.org/jira/browse/HDFS-14842 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14842.1.patch > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14838) RBF: Display RPC (instead of HTTP) Port Number in RBF web UI
[ https://issues.apache.org/jira/browse/HDFS-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927168#comment-16927168 ] Xieming Li commented on HDFS-14838: --- Hi, [~elgoiri], Thank you for the confirmation, I will submit a patch soon. > RBF: Display RPC (instead of HTTP) Port Number in RBF web UI > > > Key: HDFS-14838 > URL: https://issues.apache.org/jira/browse/HDFS-14838 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf, ui >Affects Versions: 3.1.2 >Reporter: Xieming Li >Assignee: Xieming Li >Priority: Minor > Attachments: router-ui.jpg > > > Currently the WebUI of RBF is using <hostname>:<HTTP port> in its heading. > It should be changed to <hostname>:<RPC port> as the WebUIs of NameNode and > DataNode do. > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14842) ByteArrayManager Reduce Synchronization
David Mollitor created HDFS-14842: - Summary: ByteArrayManager Reduce Synchronization Key: HDFS-14842 URL: https://issues.apache.org/jira/browse/HDFS-14842 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs Affects Versions: 3.2.0 Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14840) Use Java Conccurent Instead of Synchrnoization in BlockPoolTokenSecretManager
[ https://issues.apache.org/jira/browse/HDFS-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927144#comment-16927144 ] Hadoop QA commented on HDFS-14840: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 0s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 21s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 44s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 7 unchanged - 1 fixed = 7 total (was 8) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 33s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}131m 44s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 46s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}199m 56s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.0 Server=19.03.0 Image:yetus/hadoop:bdbca0e53b4 | | JIRA Issue | HDFS-14840 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12979993/HDFS-14840.1.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 0761209e5eb3 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 10144a5 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/27835/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/27835/testReport/ | | Max. process+thread count | 2626 (vs. ulimit of 5500) | | modules | C
[jira] [Commented] (HDFS-14841) Remove Class-Level Synchronization in LocalReplica
[ https://issues.apache.org/jira/browse/HDFS-14841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927138#comment-16927138 ] Hadoop QA commented on HDFS-14841: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 33s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 20m 31s{color} | {color:red} root in trunk failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 31s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 45s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 7 unchanged - 4 fixed = 9 total (was 11) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 45s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 86m 23s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 37s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}146m 31s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.tools.TestDFSZKFailoverController | | | hadoop.hdfs.TestMultipleNNPortQOP | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 | | JIRA Issue | HDFS-14841 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/1297/HDFS-14841.1.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 49c750562be7 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 10144a5 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | mvninstall | https://builds.apache.org/job/PreCommit-HDFS-Build/27836/artifact/out/branch-mvninstall-root.txt | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/27836/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.
[jira] [Work logged] (HDDS-2007) Make ozone fs shell command work with OM HA service ids
[ https://issues.apache.org/jira/browse/HDDS-2007?focusedWorklogId=310172&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310172 ] ASF GitHub Bot logged work on HDDS-2007: Author: ASF GitHub Bot Created on: 10/Sep/19 23:20 Start Date: 10/Sep/19 23:20 Worklog Time Spent: 10m Work Description: smengcl commented on issue #1360: HDDS-2007. Make ozone fs shell command work with OM HA service ids URL: https://github.com/apache/hadoop/pull/1360#issuecomment-530156703 @bharatviswa504 The previous commit passed all acceptance and unit tests. The latest commit shouldn't cause those failures. I'll trigger a retest to check. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 310172) Time Spent: 5h 20m (was: 5h 10m) > Make ozone fs shell command work with OM HA service ids > --- > > Key: HDDS-2007 > URL: https://issues.apache.org/jira/browse/HDDS-2007 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Client >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Time Spent: 5h 20m > Remaining Estimate: 0h > > Build an HDFS HA-like nameservice for OM HA so that the Ozone client can access > an Ozone HA cluster with ease. > The majority of the work is already done in HDDS-972. But the problem is that > the client would crash if there is more than one service id > (ozone.om.service.ids) configured in ozone-site.xml. This needs to be addressed > on the client side. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
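A hedged sketch of the client-side resolution described above. Only the ozone.om.service.ids key comes from the issue text and Configuration#getTrimmedStringCollection from Hadoop's Configuration API; everything else (class, method, and selection rules) is an illustrative assumption, not the actual patch.

{code:java}
import java.net.URI;
import java.util.Collection;
import org.apache.hadoop.conf.Configuration;

// Hypothetical sketch: resolve a single OM service id even when several are
// configured, instead of crashing as described in the issue.
public final class OmServiceIdResolverSketch {
  static String resolve(Configuration conf, URI fsUri) {
    Collection<String> ids = conf.getTrimmedStringCollection("ozone.om.service.ids");
    String authority = fsUri.getHost();
    if (ids.contains(authority)) {
      return authority;             // the URI names one of the HA service ids
    }
    if (ids.size() == 1) {
      return ids.iterator().next(); // unambiguous single id
    }
    throw new IllegalArgumentException(
        "URI must name one of the configured OM service ids: " + ids);
  }
}
{code}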
[jira] [Work logged] (HDDS-2007) Make ozone fs shell command work with OM HA service ids
[ https://issues.apache.org/jira/browse/HDDS-2007?focusedWorklogId=310173&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310173 ] ASF GitHub Bot logged work on HDDS-2007: Author: ASF GitHub Bot Created on: 10/Sep/19 23:20 Start Date: 10/Sep/19 23:20 Worklog Time Spent: 10m Work Description: smengcl commented on issue #1360: HDDS-2007. Make ozone fs shell command work with OM HA service ids URL: https://github.com/apache/hadoop/pull/1360#issuecomment-530156719 /retest This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 310173) Time Spent: 5.5h (was: 5h 20m) > Make ozone fs shell command work with OM HA service ids > --- > > Key: HDDS-2007 > URL: https://issues.apache.org/jira/browse/HDDS-2007 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Client >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Time Spent: 5.5h > Remaining Estimate: 0h > > Build an HDFS HA-like nameservice for OM HA so that the Ozone client can access > an Ozone HA cluster with ease. > The majority of the work is already done in HDDS-972. But the problem is that > the client would crash if there is more than one service id > (ozone.om.service.ids) configured in ozone-site.xml. This needs to be addressed > on the client side. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14837) Review of Block.java
[ https://issues.apache.org/jira/browse/HDFS-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927106#comment-16927106 ] Íñigo Goiri commented on HDFS-14837: Can we use EqualsBuilder and HashCodeBuilder? I would also complete the javadocs instead of removing them. For example for getBlockName() I would put an example of blk_X. I always want to know which one is the one with the timestamp or the one without; the javadoc would clarify that. > Review of Block.java > > > Key: HDFS-14837 > URL: https://issues.apache.org/jira/browse/HDFS-14837 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14837.1.patch > > > The {{Block}} class is such a core class in the project, I just wanted to > make sure it was super clean and documentation was correct. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
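To make the reviewer's suggestion concrete, here is a sketch using the commons-lang3 builders. The blockId field is taken from Block's well-known fields; the class body is otherwise illustrative, not the posted patch.

{code:java}
import org.apache.commons.lang3.builder.EqualsBuilder;
import org.apache.commons.lang3.builder.HashCodeBuilder;

// Illustrative only: equals()/hashCode() via the builders the review asks about.
class BlockSketch {
  private long blockId;

  @Override
  public boolean equals(Object o) {
    if (this == o) { return true; }
    if (!(o instanceof BlockSketch)) { return false; }
    return new EqualsBuilder()
        .append(blockId, ((BlockSketch) o).blockId)
        .isEquals();
  }

  @Override
  public int hashCode() {
    // the builder is seeded with two non-zero odd numbers
    return new HashCodeBuilder(17, 31).append(blockId).toHashCode();
  }
}
{code}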
[jira] [Commented] (HDFS-14283) DFSInputStream to prefer cached replica
[ https://issues.apache.org/jira/browse/HDFS-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927107#comment-16927107 ] Siyao Meng commented on HDFS-14283: --- [~leosun08] Thanks! > DFSInputStream to prefer cached replica > --- > > Key: HDFS-14283 > URL: https://issues.apache.org/jira/browse/HDFS-14283 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.6.0 > Environment: HDFS Caching >Reporter: Wei-Chiu Chuang >Assignee: Lisheng Sun >Priority: Major > > HDFS Caching offers performance benefits. However, the NameNode currently does > not treat cached replicas with higher priority, so HDFS caching is only useful > when cache replication = 3, that is to say, all replicas are cached in > memory, so that a client doesn't randomly pick an uncached replica. > HDFS-6846 proposed to let the NameNode give higher priority to cached replicas. > Changing logic in the NameNode is always tricky, so that didn't get much > traction. Here I propose a different approach: let the client (DFSInputStream) > prefer cached replicas. > A {{LocatedBlock}} object already contains cached replica locations, so a > client has the needed information. I think we can change > {{DFSInputStream#getBestNodeDNAddrPair()}} for this purpose. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
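An illustrative fragment of the approach proposed above, not the actual patch. LocatedBlock#getCachedLocations() and getLocations() are real API; the deadNodes map is an assumed piece of DFSInputStream client state, and the fragment would live inside a method like getBestNodeDNAddrPair().

{code:java}
// Sketch: try replicas the NameNode reports as cached before the usual order.
DatanodeInfo[] cachedLocs = block.getCachedLocations();
for (DatanodeInfo node : block.getLocations()) {
  for (DatanodeInfo cached : cachedLocs) {
    if (node.equals(cached) && !deadNodes.containsKey(node)) {
      return node; // a live cached replica wins the tie
    }
  }
}
// otherwise fall through to the existing (uncached) selection logic
{code}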
[jira] [Commented] (HDFS-14795) Add Throttler for writing block
[ https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927104#comment-16927104 ] Íñigo Goiri commented on HDFS-14795: Let's fix the checkstyle and make the new methods static. > Add Throttler for writing block > --- > > Key: HDFS-14795 > URL: https://issues.apache.org/jira/browse/HDFS-14795 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Minor > Attachments: HDFS-14795.001.patch, HDFS-14795.002.patch, > HDFS-14795.003.patch, HDFS-14795.004.patch, HDFS-14795.005.patch, > HDFS-14795.006.patch, HDFS-14795.007.patch, HDFS-14795.008.patch > > > DataXceiver#writeBlock > {code:java} > blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut, > mirrorAddr, null, targets, false); > {code} > As the code above shows, DataXceiver#writeBlock doesn't throttle. > I think it is necessary to throttle block writes, adding a throttler > in the PIPELINE_SETUP_APPEND_RECOVERY or > PIPELINE_SETUP_STREAMING_RECOVERY stages. > The default throttler value is still null. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
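A sketch of one way the quoted call could gain a throttler. DataTransferThrottler is the existing HDFS throttler class, and the null argument in the quoted call is the throttler slot; the configuration key below is purely hypothetical and not from the patch.

{code:java}
// Hypothetical wiring (config key invented for illustration): pass a
// DataTransferThrottler instead of the hard-coded null in receiveBlock().
long bytesPerSec = conf.getLong("dfs.datanode.block.write.bandwidthPerSec", 0);
DataTransferThrottler throttler =
    bytesPerSec > 0 ? new DataTransferThrottler(bytesPerSec) : null; // null keeps today's behavior
blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut,
    mirrorAddr, throttler, targets, false);
{code}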
[jira] [Assigned] (HDDS-1873) Recon should store last successful run timestamp for each task
[ https://issues.apache.org/jira/browse/HDDS-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Wagle reassigned HDDS-1873: - Assignee: Shweta (was: Aravindan Vijayan) > Recon should store last successful run timestamp for each task > -- > > Key: HDDS-1873 > URL: https://issues.apache.org/jira/browse/HDDS-1873 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Recon >Affects Versions: 0.4.1 >Reporter: Vivek Ratnavel Subramanian >Assignee: Shweta >Priority: Major > > Recon should store the timestamp of the last Ozone Manager snapshot received, along with > the timestamp of the last successful run of each task. > This is important to give users a sense of how fresh the data > they are looking at is. And we need this per task because some tasks might > fail to run or take much longer to run than others, and this > needs to be reflected in the UI for a better and more consistent user experience. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
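One possible shape of the bookkeeping described above, entirely illustrative; Recon's real task framework and storage layer are not shown here, and all names are invented.

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: one timestamp per Recon task plus the last OM
// snapshot timestamp, so the UI can show how fresh each view is.
public class TaskRunTimestampsSketch {
  private final ConcurrentMap<String, Long> lastSuccessfulRunMs =
      new ConcurrentHashMap<>();
  private volatile long lastOmSnapshotReceivedMs;

  public void recordTaskSuccess(String taskName, long epochMs) {
    lastSuccessfulRunMs.put(taskName, epochMs); // per-task freshness
  }

  public void recordOmSnapshot(long epochMs) {
    lastOmSnapshotReceivedMs = epochMs; // cluster-wide snapshot freshness
  }
}
{code}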
[jira] [Commented] (HDFS-6524) Choosing datanode retries times considering with block replica number
[ https://issues.apache.org/jira/browse/HDFS-6524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927103#comment-16927103 ] Íñigo Goiri commented on HDFS-6524: --- Is there any unit test we can add to cover the new case? BTW, that if is getting a little unreadable. > Choosing datanode retries times considering with block replica number > -- > > Key: HDFS-6524 > URL: https://issues.apache.org/jira/browse/HDFS-6524 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 3.0.0-alpha1 >Reporter: Liang Xie >Assignee: Lisheng Sun >Priority: Minor > Labels: BB2015-05-TBR > Attachments: HDFS-6524.001.patch, HDFS-6524.002.patch, HDFS-6524.txt > > > Currently the chooseDataNode() does retry with the setting: > dfsClientConf.maxBlockAcquireFailures, which by default is 3 > (DFS_CLIENT_MAX_BLOCK_ACQUIRE_FAILURES_DEFAULT = 3); it would be better > to have another option based on the block replication factor. A cluster may keep only two > block replicas, or use a Reed-Solomon encoding solution with a single > replica. This helps to reduce the long-tail latency. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
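To illustrate the idea (this is not the posted patch), the retry budget could be bounded by the replica count instead of the fixed client-wide constant; getMaxBlockAcquireFailures() is assumed here to be the accessor for the quoted setting.

{code:java}
// Hypothetical fragment: with only two replicas (or one, under erasure
// coding), a fixed 3-retry budget is wasteful; bound it by replica count.
int replicaCount = block.getLocations().length;
int maxRetries = Math.min(
    dfsClientConf.getMaxBlockAcquireFailures(), // default 3
    Math.max(replicaCount, 1));                 // always allow one attempt
{code}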
[jira] [Commented] (HDFS-14839) Use Java Concurrent BlockingQueue instead of Internal BlockQueue
[ https://issues.apache.org/jira/browse/HDFS-14839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927101#comment-16927101 ] Íñigo Goiri commented on HDFS-14839: Yetus does not look very happy. > Use Java Concurrent BlockingQueue instead of Internal BlockQueue > > > Key: HDFS-14839 > URL: https://issues.apache.org/jira/browse/HDFS-14839 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14839.1.patch > > > Replace... > https://github.com/apache/hadoop/blob/d8bac50e12d243ef8fd2c7e0ce5c9997131dee74/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L86 > With... > https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14839) Use Java Concurrent BlockingQueue instead of Internal BlockQueue
[ https://issues.apache.org/jira/browse/HDFS-14839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927099#comment-16927099 ] Hadoop QA commented on HDFS-14839: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 2m 25s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 18s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 50s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 25 unchanged - 6 fixed = 25 total (was 31) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 55s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 43s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}128m 39s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 46s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}202m 23s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs-project/hadoop-hdfs | | | Exceptional return value of java.util.concurrent.BlockingQueue.offer(Object) ignored in org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor.addBlockToBeErasureCoded(ExtendedBlock, DatanodeDescriptor[], DatanodeStorageInfo[], byte[], ErasureCodingPolicy) At DatanodeDescriptor.java:ignored in org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor.addBlockToBeErasureCoded(ExtendedBlock, DatanodeDescriptor[], DatanodeStorageInfo[], byte[], ErasureCodingPolicy) At DatanodeDescriptor.java:[line 624] | | | Exceptional return value of java.util.concurrent.BlockingQueue.offer(Object) ignored in org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor.addBlockToBeRecovered(BlockInfo) At DatanodeDescriptor.java:ignored in org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor.addBlockToBeRecovered(BlockInfo) At DatanodeDescriptor.java:[line 638] | | | Exceptional return value of java.util.concurrent
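For reference, the FindBugs complaints above stem from BlockingQueue#offer returning false on a full queue rather than throwing, so the boolean result must not be discarded. A hedged sketch of one resolution; the queue field, logger, and drop-on-full policy are assumptions, not the committed fix.

{code:java}
// Sketch only: check offer()'s result instead of ignoring it.
void addBlockToBeRecovered(BlockInfo block) {
  if (!recoverBlocks.offer(block)) { // false means the bounded queue is full
    LOG.warn("Recovery queue is full; dropping recovery work for {}", block);
  }
}
{code}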
[jira] [Commented] (HDFS-14841) Remove Class-Level Synchronization in LocalReplica
[ https://issues.apache.org/jira/browse/HDFS-14841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927098#comment-16927098 ] Íñigo Goiri commented on HDFS-14841: Do you have any performance result or pointer? It makes sense but it's better to have it for completeness. Can we avoid the parentheses around key? Not sure if the lambda makes them necessary. > Remove Class-Level Synchronization in LocalReplica > -- > > Key: HDFS-14841 > URL: https://issues.apache.org/jira/browse/HDFS-14841 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14841.1.patch > > > Use Java's Concurrent package instead. > https://github.com/apache/hadoop/blob/d8bac50e12d243ef8fd2c7e0ce5c9997131dee74/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/LocalReplica.java#L143 -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
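On the lambda question in the comment above: for a single parameter the parentheses are optional. Since the patch itself isn't quoted here, the following is a fully hypothetical sketch of what replacing class-level synchronization with java.util.concurrent primitives can look like.

{code:java}
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a ConcurrentHashMap is atomic per key, instead of
// serializing every LocalReplica instance on one class-level monitor.
class ReplicaStateSketch {
  private static final ConcurrentHashMap<String, Object> STATE =
      new ConcurrentHashMap<>();

  static Object stateFor(String key) {
    // "key ->" and "(key) ->" compile identically for one parameter
    return STATE.computeIfAbsent(key, k -> new Object());
  }
}
{code}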
[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927089#comment-16927089 ] Íñigo Goiri commented on HDFS-14090: Right now, the test catches the exception and then calls fail(), which triggers an assertion error. We can just let the exception surface without the fail(). Finally, when calling get() on the future, we can catch the exception and expose the actual cause. > RBF: Improved isolation for downstream name nodes. {Static} > --- > > Key: HDFS-14090 > URL: https://issues.apache.org/jira/browse/HDFS-14090 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > Attachments: HDFS-14090-HDFS-13891.001.patch, > HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, > HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, > HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, > HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, > HDFS-14090.012.patch, RBF_ Isolation design.pdf > > > Router is a gateway to underlying name nodes. Gateway architectures should > help minimize the impact of clients connecting to healthy clusters vs unhealthy > clusters. > For example - If there are 2 name nodes downstream, and one of them is > heavily loaded with calls spiking rpc queue times, due to back pressure the > same will start reflecting on the router. As a result of this, clients > connecting to healthy/faster name nodes will also slow down as the same rpc queue > is maintained for all calls at the router layer. Essentially the same IPC > thread pool is used by the router to connect to all name nodes. > Currently the router uses a single rpc queue for all calls. Let's discuss how we > can change the architecture and add some throttling logic for > unhealthy/slow/overloaded name nodes. > One way could be to read from the current call queue, immediately identify the > downstream name node and maintain a separate queue for each underlying name > node. Another, simpler way is to maintain some sort of rate limiter configured > for each name node and let routers drop/reject/send error requests after a > certain threshold. > This won’t be a simple change as the router’s ‘Server’ layer would need redesign > and implementation. Currently this layer is the same as the name node's. > Opening this ticket to discuss, design and implement this feature. > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
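To spell out the review suggestion, here is a sketch of the test pattern; the test class, executor setup, and invokeRouter() helper are invented for illustration and are not from the actual patch.

{code:java}
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import org.junit.Test;

// Hypothetical test shape: no catch + fail(); unwrap the Future's
// ExecutionException so the actual cause reaches JUnit directly.
public class TestRouterIsolationSketch {
  private ExecutorService executor; // assumed to be set up elsewhere

  @Test
  public void testOverloadedSubcluster() throws Exception {
    Future<?> result = executor.submit(() -> invokeRouter());
    try {
      result.get();
    } catch (ExecutionException e) {
      // cast is safe here only if the cause is an Exception, not an Error
      throw (Exception) e.getCause();
    }
  }

  private Object invokeRouter() { return null; } // stand-in for the real call
}
{code}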
[jira] [Commented] (HDFS-14838) RBF: Display RPC (instead of HTTP) Port Number in RBF web UI
[ https://issues.apache.org/jira/browse/HDFS-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927084#comment-16927084 ] Íñigo Goiri commented on HDFS-14838: Thanks [~risyomei], it makes sense to make it consistent with the NN and the DN. Do you mind submitting a patch? > RBF: Display RPC (instead of HTTP) Port Number in RBF web UI > > > Key: HDFS-14838 > URL: https://issues.apache.org/jira/browse/HDFS-14838 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf, ui >Affects Versions: 3.1.2 >Reporter: Xieming Li >Assignee: Xieming Li >Priority: Minor > Attachments: router-ui.jpg > > > Currently the RBF web UI is using hostname:HTTP-port in its heading. > It should be changed to hostname:RPC-port, as the web UIs of the NameNode and > DataNode do. > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14838) RBF: Display RPC (instead of HTTP) Port Number in RBF web UI
[ https://issues.apache.org/jira/browse/HDFS-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-14838: --- Summary: RBF: Display RPC (instead of HTTP) Port Number in RBF web UI (was: Display RPC (instead of HTTP) Port Number in RBF web UI.) > RBF: Display RPC (instead of HTTP) Port Number in RBF web UI > > > Key: HDFS-14838 > URL: https://issues.apache.org/jira/browse/HDFS-14838 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf, ui >Affects Versions: 3.1.2 >Reporter: Xieming Li >Assignee: Xieming Li >Priority: Minor > Attachments: router-ui.jpg > > > Currently the RBF web UI is using hostname:HTTP-port in its heading. > It should be changed to hostname:RPC-port, as the web UIs of the NameNode and > DataNode do. > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work started] (HDDS-2107) Datanodes should retry forever to connect to SCM in an unsecure environment
[ https://issues.apache.org/jira/browse/HDDS-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDDS-2107 started by Vivek Ratnavel Subramanian. > Datanodes should retry forever to connect to SCM in an unsecure environment > --- > > Key: HDDS-2107 > URL: https://issues.apache.org/jira/browse/HDDS-2107 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.4.1 >Reporter: Vivek Ratnavel Subramanian >Assignee: Vivek Ratnavel Subramanian >Priority: Major > > In an unsecure environment, the datanodes try up to 10 times after waiting for > 1000 milliseconds each time before throwing this error: > {code:java} > Unable to communicate to SCM server at > jmccarthy-ozone-unsecure2-2.vpc.cloudera.com:9861 for past 0 seconds. > java.net.ConnectException: Call From > jmccarthy-ozone-unsecure2-4.vpc.cloudera.com/10.65.36.118 to > jmccarthy-ozone-unsecure2-2.vpc.cloudera.com:9861 failed on connection > exception: java.net.ConnectException: Connection refused; For more details > see: http://wiki.apache.org/hadoop/ConnectionRefused > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831) > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515) > at org.apache.hadoop.ipc.Client.call(Client.java:1457) > at org.apache.hadoop.ipc.Client.call(Client.java:1367) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy33.getVersion(Unknown Source) > at > org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.getVersion(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:112) > at > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:70) > at > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.net.ConnectException: Connection refused > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > at > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) > at > org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531) > at > org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:690) > at > org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:794) > at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:411) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1572) > at org.apache.hadoop.ipc.Client.call(Client.java:1403)
> ... 13 more > {code} > The datanodes should try forever to connect to SCM and not fail immediately > after 10 retries. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2089) Add CLI createPipeline
[ https://issues.apache.org/jira/browse/HDDS-2089?focusedWorklogId=310155&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310155 ] ASF GitHub Bot logged work on HDDS-2089: Author: ASF GitHub Bot Created on: 10/Sep/19 22:20 Start Date: 10/Sep/19 22:20 Worklog Time Spent: 10m Work Description: xiaoyuyao commented on pull request #1418: HDDS-2089: Add createPipeline CLI. URL: https://github.com/apache/hadoop/pull/1418#discussion_r322987593 ## File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/SCMClientProtocolServer.java ## @@ -390,10 +390,9 @@ public void notifyObjectStageChange(StorageContainerLocationProtocolProtos public Pipeline createReplicationPipeline(HddsProtos.ReplicationType type, HddsProtos.ReplicationFactor factor, HddsProtos.NodePool nodePool) throws IOException { -// TODO: will be addressed in future patch. -// This is needed only for debugging purposes to make sure cluster is -// working correctly. -return null; +AUDIT.logReadSuccess( Review comment: Should we log this as a write success for pipeline creation? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 310155) Time Spent: 0.5h (was: 20m) > Add CLI createPipeline > -- > > Key: HDDS-2089 > URL: https://issues.apache.org/jira/browse/HDDS-2089 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone CLI >Affects Versions: 0.5.0 >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Add an SCMCLI command to create a pipeline for Ozone. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
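A sketch of the one-line change the review suggests; pipeline creation mutates cluster state, so it would be audited as a write rather than a read. The helper and enum names below are assumptions based on the audit pattern used elsewhere in SCM, not the committed code:
{code:java}
// Hypothetical replacement inside createReplicationPipeline():
AUDIT.logWriteSuccess(
    buildAuditMessageForSuccess(SCMAction.CREATE_PIPELINE, null));
{code}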
[jira] [Work logged] (HDDS-2007) Make ozone fs shell command work with OM HA service ids
[ https://issues.apache.org/jira/browse/HDDS-2007?focusedWorklogId=310154&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310154 ] ASF GitHub Bot logged work on HDDS-2007: Author: ASF GitHub Bot Created on: 10/Sep/19 22:16 Start Date: 10/Sep/19 22:16 Worklog Time Spent: 10m Work Description: bharatviswa504 commented on issue #1360: HDDS-2007. Make ozone fs shell command work with OM HA service ids URL: https://github.com/apache/hadoop/pull/1360#issuecomment-530141859 +1. Can you check if the failed acceptance tests are related? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 310154) Time Spent: 5h 10m (was: 5h) > Make ozone fs shell command work with OM HA service ids > --- > > Key: HDDS-2007 > URL: https://issues.apache.org/jira/browse/HDDS-2007 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Client >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Time Spent: 5h 10m > Remaining Estimate: 0h > > Build an HDFS HA-like nameservice for OM HA so that the Ozone client can access > an Ozone HA cluster with ease. > The majority of the work is already done in HDDS-972. But the problem is that > the client would crash if there is more than one service id > (ozone.om.service.ids) configured in ozone-site.xml. This needs to be addressed > on the client side. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1879) Support multiple excluded scopes when choosing datanodes in NetworkTopology
[ https://issues.apache.org/jira/browse/HDDS-1879?focusedWorklogId=310153&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310153 ] ASF GitHub Bot logged work on HDDS-1879: Author: ASF GitHub Bot Created on: 10/Sep/19 22:14 Start Date: 10/Sep/19 22:14 Worklog Time Spent: 10m Work Description: xiaoyuyao commented on issue #1194: HDDS-1879. Support multiple excluded scopes when choosing datanodes in NetworkTopology URL: https://github.com/apache/hadoop/pull/1194#issuecomment-530141215 Thanks @ChenSammi for updating the PR. The latest change LGTM. Can you fix the checkstyle and unit test failure that seems to be related? +1 after that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 310153) Time Spent: 3h 50m (was: 3h 40m) > Support multiple excluded scopes when choosing datanodes in NetworkTopology > --- > > Key: HDDS-1879 > URL: https://issues.apache.org/jira/browse/HDDS-1879 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Major > Labels: pull-request-available > Time Spent: 3h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14841) Remove Class-Level Synchronization in LocalReplica
[ https://issues.apache.org/jira/browse/HDFS-14841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14841: -- Attachment: HDFS-14841.1.patch > Remove Class-Level Synchronization in LocalReplica > -- > > Key: HDFS-14841 > URL: https://issues.apache.org/jira/browse/HDFS-14841 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14841.1.patch > > > Use Java's Concurrent package instead. > https://github.com/apache/hadoop/blob/d8bac50e12d243ef8fd2c7e0ce5c9997131dee74/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/LocalReplica.java#L143 -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14841) Remove Class-Level Synchronization in LocalReplica
[ https://issues.apache.org/jira/browse/HDFS-14841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14841: -- Status: Patch Available (was: Open) > Remove Class-Level Synchronization in LocalReplica > -- > > Key: HDFS-14841 > URL: https://issues.apache.org/jira/browse/HDFS-14841 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14841.1.patch > > > Use Java's Concurrent package instead. > https://github.com/apache/hadoop/blob/d8bac50e12d243ef8fd2c7e0ce5c9997131dee74/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/LocalReplica.java#L143 -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14841) Remove Class-Level Synchronization in LocalReplica
David Mollitor created HDFS-14841: - Summary: Remove Class-Level Synchronization in LocalReplica Key: HDFS-14841 URL: https://issues.apache.org/jira/browse/HDFS-14841 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 3.2.0 Reporter: David Mollitor Assignee: David Mollitor Use Java's Concurrent package instead. https://github.com/apache/hadoop/blob/d8bac50e12d243ef8fd2c7e0ce5c9997131dee74/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/LocalReplica.java#L143 -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
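A minimal sketch of the kind of change proposed, with illustrative names rather than the actual LocalReplica internals: a map guarded by class-level synchronized methods becomes a ConcurrentHashMap, so callers no longer serialize on a single lock:
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReplicaCacheSketch {
  // Before (sketch): a plain HashMap guarded by static synchronized methods,
  // forcing every caller through one class-level lock.
  private static final Map<String, Object> CACHE = new ConcurrentHashMap<>();

  // After: ConcurrentHashMap contends only on the touched bin, and
  // computeIfAbsent gives an atomic get-or-create without an explicit lock.
  static Object getOrCreate(String key) {
    return CACHE.computeIfAbsent(key, k -> new Object());
  }

  public static void main(String[] args) {
    System.out.println(getOrCreate("bp-1") == getOrCreate("bp-1")); // true
  }
}
{code}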
[jira] [Work started] (HDDS-2041) Don't depend on DFSUtil to check HTTP policy
[ https://issues.apache.org/jira/browse/HDDS-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDDS-2041 started by Vivek Ratnavel Subramanian. > Don't depend on DFSUtil to check HTTP policy > > > Key: HDDS-2041 > URL: https://issues.apache.org/jira/browse/HDDS-2041 > Project: Hadoop Distributed Data Store > Issue Type: Task > Components: website >Affects Versions: 0.4.1 >Reporter: Vivek Ratnavel Subramanian >Assignee: Vivek Ratnavel Subramanian >Priority: Major > > Currently, BaseHttpServer uses DFSUtil to get the HTTP policy. With this, when > the HTTP policy is set to HTTPS in hdfs-site.xml, Ozone HTTP servers try to come > up with HTTPS and fail if SSL certificates are not present in the required > location. > Ozone web UIs should not depend on the HDFS config to determine the HTTP policy. > Instead, they should have their own config to determine the policy. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
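A sketch of the proposed decoupling. The Ozone config key name here is an assumption for illustration (the patch may introduce a different key); HttpConfig.Policy is the existing Hadoop type that DFSUtil resolves today:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.http.HttpConfig;

public class OzoneHttpPolicyDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Read an Ozone-specific key (hypothetical name) instead of going through
    // DFSUtil, so dfs.http.policy in hdfs-site.xml no longer leaks into Ozone.
    String value = conf.get("ozone.http.policy",
        HttpConfig.Policy.HTTP_ONLY.name());
    HttpConfig.Policy policy = HttpConfig.Policy.fromString(value);
    System.out.println(policy); // HTTP_ONLY unless overridden
  }
}
{code}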
[jira] [Commented] (HDDS-2107) Datanodes should retry forever to connect to SCM in an unsecure environment
[ https://issues.apache.org/jira/browse/HDDS-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927049#comment-16927049 ] Vivek Ratnavel Subramanian commented on HDDS-2107: -- cc [~xyao] > Datanodes should retry forever to connect to SCM in an unsecure environment > --- > > Key: HDDS-2107 > URL: https://issues.apache.org/jira/browse/HDDS-2107 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.4.1 >Reporter: Vivek Ratnavel Subramanian >Assignee: Vivek Ratnavel Subramanian >Priority: Major > > In an unsecure environment, the datanodes try up to 10 times after waiting for > 1000 milliseconds each time before throwing this error: > {code:java} > Unable to communicate to SCM server at > jmccarthy-ozone-unsecure2-2.vpc.cloudera.com:9861 for past 0 seconds. > java.net.ConnectException: Call From > jmccarthy-ozone-unsecure2-4.vpc.cloudera.com/10.65.36.118 to > jmccarthy-ozone-unsecure2-2.vpc.cloudera.com:9861 failed on connection > exception: java.net.ConnectException: Connection refused; For more details > see: http://wiki.apache.org/hadoop/ConnectionRefused > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831) > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515) > at org.apache.hadoop.ipc.Client.call(Client.java:1457) > at org.apache.hadoop.ipc.Client.call(Client.java:1367) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy33.getVersion(Unknown Source) > at > org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.getVersion(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:112) > at > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:70) > at > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.net.ConnectException: Connection refused > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > at > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) > at > org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531) > at > org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:690) > at > org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:794) > at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:411) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1572) >
at org.apache.hadoop.ipc.Client.call(Client.java:1403) > ... 13 more > {code} > The datanodes should try forever to connect to SCM and not fail immediately > after 10 retries. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2107) Datanodes should retry forever to connect to SCM in an unsecure environment
Vivek Ratnavel Subramanian created HDDS-2107: Summary: Datanodes should retry forever to connect to SCM in an unsecure environment Key: HDDS-2107 URL: https://issues.apache.org/jira/browse/HDDS-2107 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Datanode Affects Versions: 0.4.1 Reporter: Vivek Ratnavel Subramanian Assignee: Vivek Ratnavel Subramanian In an unsecure environment, the datanodes try up to 10 times after waiting for 1000 milliseconds each time before throwing this error: {code:java} Unable to communicate to SCM server at jmccarthy-ozone-unsecure2-2.vpc.cloudera.com:9861 for past 0 seconds. java.net.ConnectException: Call From jmccarthy-ozone-unsecure2-4.vpc.cloudera.com/10.65.36.118 to jmccarthy-ozone-unsecure2-2.vpc.cloudera.com:9861 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515) at org.apache.hadoop.ipc.Client.call(Client.java:1457) at org.apache.hadoop.ipc.Client.call(Client.java:1367) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) at com.sun.proxy.$Proxy33.getVersion(Unknown Source) at org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.getVersion(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:112) at org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:70) at org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531) at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:690) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:794) at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:411) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1572) at org.apache.hadoop.ipc.Client.call(Client.java:1403) ... 13 more {code} The datanodes should try forever to connect to SCM and not fail immediately after 10 retries.
-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
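For illustration, Hadoop's stock retry policies can express both behaviors. A sketch of moving from the bounded policy to an unbounded one (this assumes the datanode's SCM connection is wired through org.apache.hadoop.io.retry, which may differ from the actual code path):
{code:java}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;

public class ScmRetrySketch {
  public static void main(String[] args) {
    // Roughly the reported behavior: 10 attempts, 1000 ms apart, then fail.
    RetryPolicy bounded = RetryPolicies
        .retryUpToMaximumCountWithFixedSleep(10, 1000, TimeUnit.MILLISECONDS);
    // Proposed: same fixed sleep, but never give up.
    RetryPolicy forever = RetryPolicies
        .retryForeverWithFixedSleep(1000, TimeUnit.MILLISECONDS);
    System.out.println(bounded + " / " + forever);
  }
}
{code}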
[jira] [Updated] (HDDS-2106) Avoid usage of hadoop projects as parent of hdds/ozone
[ https://issues.apache.org/jira/browse/HDDS-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2106: - Labels: pull-request-available (was: ) > Avoid usage of hadoop projects as parent of hdds/ozone > -- > > Key: HDDS-2106 > URL: https://issues.apache.org/jira/browse/HDDS-2106 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Elek, Marton >Assignee: Elek, Marton >Priority: Blocker > Labels: pull-request-available > > Ozone uses hadoop as a dependency. The dependency is defined on multiple levels: > 1. the hadoop artifacts are defined in the dependencyManagement sections > 2. both the hadoop-ozone and hadoop-hdds projects use "hadoop-project" as the > parent > As we already have a slightly different assembly process, it could be more > resilient to use a dedicated parent project instead of the hadoop one. With > this approach it will be easier to upgrade the versions, as we don't need to > be careful about the pom contents, only about the used dependencies. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2106) Avoid usage of hadoop projects as parent of hdds/ozone
[ https://issues.apache.org/jira/browse/HDDS-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elek, Marton updated HDDS-2106: --- Status: Patch Available (was: Open) > Avoid usage of hadoop projects as parent of hdds/ozone > -- > > Key: HDDS-2106 > URL: https://issues.apache.org/jira/browse/HDDS-2106 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Elek, Marton >Assignee: Elek, Marton >Priority: Blocker > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Ozone uses hadoop as a dependency. The dependency is defined on multiple levels: > 1. the hadoop artifacts are defined in the dependencyManagement sections > 2. both the hadoop-ozone and hadoop-hdds projects use "hadoop-project" as the > parent > As we already have a slightly different assembly process, it could be more > resilient to use a dedicated parent project instead of the hadoop one. With > this approach it will be easier to upgrade the versions, as we don't need to > be careful about the pom contents, only about the used dependencies. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2106) Avoid usage of hadoop projects as parent of hdds/ozone
[ https://issues.apache.org/jira/browse/HDDS-2106?focusedWorklogId=310124&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310124 ] ASF GitHub Bot logged work on HDDS-2106: Author: ASF GitHub Bot Created on: 10/Sep/19 21:15 Start Date: 10/Sep/19 21:15 Worklog Time Spent: 10m Work Description: elek commented on pull request #1423: HDDS-2106. Avoid usage of hadoop projects as parent of hdds/ozone URL: https://github.com/apache/hadoop/pull/1423 Ozone uses hadoop as a dependency. The dependency is defined on multiple levels: 1. the hadoop artifacts are defined in the dependencyManagement sections 2. both the hadoop-ozone and hadoop-hdds projects use "hadoop-project" as the parent As we already have a slightly different assembly process, it could be more resilient to use a dedicated parent project instead of the hadoop one. With this approach it will be easier to upgrade the versions, as we don't need to be careful about the pom contents, only about the used dependencies. See: https://issues.apache.org/jira/browse/HDDS-2106 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 310124) Remaining Estimate: 0h Time Spent: 10m > Avoid usage of hadoop projects as parent of hdds/ozone > -- > > Key: HDDS-2106 > URL: https://issues.apache.org/jira/browse/HDDS-2106 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Elek, Marton >Assignee: Elek, Marton >Priority: Blocker > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Ozone uses hadoop as a dependency. The dependency is defined on multiple levels: > 1. the hadoop artifacts are defined in the dependencyManagement sections > 2. both the hadoop-ozone and hadoop-hdds projects use "hadoop-project" as the > parent > As we already have a slightly different assembly process, it could be more > resilient to use a dedicated parent project instead of the hadoop one. With > this approach it will be easier to upgrade the versions, as we don't need to > be careful about the pom contents, only about the used dependencies. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14840) Use Java Concurrent Instead of Synchronization in BlockPoolTokenSecretManager
[ https://issues.apache.org/jira/browse/HDFS-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14840: -- Attachment: HDFS-14840.1.patch > Use Java Concurrent Instead of Synchronization in BlockPoolTokenSecretManager > - > > Key: HDFS-14840 > URL: https://issues.apache.org/jira/browse/HDFS-14840 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14840.1.patch > > > https://github.com/apache/hadoop/blob/d8bac50e12d243ef8fd2c7e0ce5c9997131dee74/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/block/BlockPoolTokenSecretManager.java#L40 > Instead of synchronizing the entire class, just synchronize the collection. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14840) Use Java Concurrent Instead of Synchronization in BlockPoolTokenSecretManager
[ https://issues.apache.org/jira/browse/HDFS-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14840: -- Status: Patch Available (was: Open) > Use Java Concurrent Instead of Synchronization in BlockPoolTokenSecretManager > - > > Key: HDFS-14840 > URL: https://issues.apache.org/jira/browse/HDFS-14840 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14840.1.patch > > > https://github.com/apache/hadoop/blob/d8bac50e12d243ef8fd2c7e0ce5c9997131dee74/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/block/BlockPoolTokenSecretManager.java#L40 > Instead of synchronizing the entire class, just synchronize the collection. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14840) Use Java Concurrent Instead of Synchronization in BlockPoolTokenSecretManager
David Mollitor created HDFS-14840: - Summary: Use Java Concurrent Instead of Synchronization in BlockPoolTokenSecretManager Key: HDFS-14840 URL: https://issues.apache.org/jira/browse/HDFS-14840 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs Affects Versions: 3.2.0 Reporter: David Mollitor Assignee: David Mollitor https://github.com/apache/hadoop/blob/d8bac50e12d243ef8fd2c7e0ce5c9997131dee74/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/block/BlockPoolTokenSecretManager.java#L40 Instead of synchronizing the entire class, just synchronize the collection. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
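A minimal sketch of the idea, with illustrative names rather than the actual BlockPoolTokenSecretManager fields: move the lock from every method of the class onto the collection itself, so only map access is serialized:
{code:java}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class TokenManagerMapSketch {
  // Before (sketch): public synchronized void addKeys(...) { map.put(...); }
  // which locks the whole manager for every operation.
  private final Map<String, Object> managers =
      Collections.synchronizedMap(new HashMap<>());

  void add(String bpid, Object secretManager) {
    managers.put(bpid, secretManager); // locks only the map, only briefly
  }

  Object get(String bpid) {
    return managers.get(bpid);
  }

  public static void main(String[] args) {
    TokenManagerMapSketch m = new TokenManagerMapSketch();
    m.add("bp-1", new Object());
    System.out.println(m.get("bp-1") != null); // true
  }
}
{code}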
[jira] [Work logged] (HDDS-2007) Make ozone fs shell command work with OM HA service ids
[ https://issues.apache.org/jira/browse/HDDS-2007?focusedWorklogId=310114&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310114 ] ASF GitHub Bot logged work on HDDS-2007: Author: ASF GitHub Bot Created on: 10/Sep/19 20:56 Start Date: 10/Sep/19 20:56 Worklog Time Spent: 10m Work Description: smengcl commented on issue #1360: HDDS-2007. Make ozone fs shell command work with OM HA service ids URL: https://github.com/apache/hadoop/pull/1360#issuecomment-530115882 /retest This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 310114) Time Spent: 5h (was: 4h 50m) > Make ozone fs shell command work with OM HA service ids > --- > > Key: HDDS-2007 > URL: https://issues.apache.org/jira/browse/HDDS-2007 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Client >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Time Spent: 5h > Remaining Estimate: 0h > > Build an HDFS HA-like nameservice for OM HA so that the Ozone client can access > an Ozone HA cluster with ease. > The majority of the work is already done in HDDS-972. But the problem is that > the client would crash if there is more than one service id > (ozone.om.service.ids) configured in ozone-site.xml. This needs to be addressed > on the client side. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2007) Make ozone fs shell command work with OM HA service ids
[ https://issues.apache.org/jira/browse/HDDS-2007?focusedWorklogId=310112&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310112 ] ASF GitHub Bot logged work on HDDS-2007: Author: ASF GitHub Bot Created on: 10/Sep/19 20:55 Start Date: 10/Sep/19 20:55 Worklog Time Spent: 10m Work Description: smengcl commented on pull request #1360: HDDS-2007. Make ozone fs shell command work with OM HA service ids URL: https://github.com/apache/hadoop/pull/1360#discussion_r322957816 ## File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/ha/OMFailoverProxyProvider.java ## @@ -70,26 +71,46 @@ private final UserGroupInformation ugi; private final Text delegationTokenService; + // TODO: Do we want this to be final? + private String omServiceId; + public OMFailoverProxyProvider(OzoneConfiguration configuration, - UserGroupInformation ugi) throws IOException { + UserGroupInformation ugi, String omServiceId) throws IOException { this.conf = configuration; this.omVersion = RPC.getProtocolVersion(OzoneManagerProtocolPB.class); this.ugi = ugi; -loadOMClientConfigs(conf); +this.omServiceId = omServiceId; +loadOMClientConfigs(conf, this.omServiceId); this.delegationTokenService = computeDelegationTokenService(); currentProxyIndex = 0; currentProxyOMNodeId = omNodeIDList.get(currentProxyIndex); } - private void loadOMClientConfigs(Configuration config) throws IOException { + public OMFailoverProxyProvider(OzoneConfiguration configuration, + UserGroupInformation ugi) throws IOException { +this(configuration, ugi, null); + } + + private void loadOMClientConfigs(Configuration config, String omSvcId) + throws IOException { this.omProxies = new HashMap<>(); this.omProxyInfos = new HashMap<>(); this.omNodeIDList = new ArrayList<>(); -Collection omServiceIds = config.getTrimmedStringCollection( -OZONE_OM_SERVICE_IDS_KEY); +Collection omServiceIds; +if (omSvcId == null) { Review comment: Filed https://issues.apache.org/jira/browse/HDDS-2104 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 310112) Time Spent: 4h 50m (was: 4h 40m) > Make ozone fs shell command work with OM HA service ids > --- > > Key: HDDS-2007 > URL: https://issues.apache.org/jira/browse/HDDS-2007 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Client >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Time Spent: 4h 50m > Remaining Estimate: 0h > > Build an HDFS HA-like nameservice for OM HA so that the Ozone client can access > an Ozone HA cluster with ease. > The majority of the work is already done in HDDS-972. But the problem is that > the client would crash if there is more than one service id > (ozone.om.service.ids) configured in ozone-site.xml. This needs to be addressed > on the client side. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2007) Make ozone fs shell command work with OM HA service ids
[ https://issues.apache.org/jira/browse/HDDS-2007?focusedWorklogId=310099&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310099 ] ASF GitHub Bot logged work on HDDS-2007: Author: ASF GitHub Bot Created on: 10/Sep/19 20:51 Start Date: 10/Sep/19 20:51 Worklog Time Spent: 10m Work Description: smengcl commented on pull request #1360: HDDS-2007. Make ozone fs shell command work with OM HA service ids URL: https://github.com/apache/hadoop/pull/1360#discussion_r322955933 ## File path: hadoop-ozone/ozonefs/src/main/java/org/apache/hadoop/fs/ozone/BasicOzoneFileSystem.java ## @@ -131,6 +142,13 @@ public void initialize(URI name, Configuration conf) throws IOException { // If port number is not specified, read it from config omPort = OmUtils.getOmRpcPort(conf); } +} else if (OmUtils.isServiceIdsDefined(conf)) { Review comment: Makes sense. Thanks! Fixing this in an upcoming commit. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 310099) Time Spent: 4h 40m (was: 4.5h) > Make ozone fs shell command work with OM HA service ids > --- > > Key: HDDS-2007 > URL: https://issues.apache.org/jira/browse/HDDS-2007 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Client >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Time Spent: 4h 40m > Remaining Estimate: 0h > > Build an HDFS HA-like nameservice for OM HA so that the Ozone client can access > an Ozone HA cluster with ease. > The majority of the work is already done in HDDS-972. But the problem is that > the client would crash if there is more than one service id > (ozone.om.service.ids) configured in ozone-site.xml. This needs to be addressed > on the client side. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2007) Make ozone fs shell command work with OM HA service ids
[ https://issues.apache.org/jira/browse/HDDS-2007?focusedWorklogId=310098&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310098 ] ASF GitHub Bot logged work on HDDS-2007: Author: ASF GitHub Bot Created on: 10/Sep/19 20:50 Start Date: 10/Sep/19 20:50 Worklog Time Spent: 10m Work Description: smengcl commented on pull request #1360: HDDS-2007. Make ozone fs shell command work with OM HA service ids URL: https://github.com/apache/hadoop/pull/1360#discussion_r322955607 ## File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/OzoneClientFactory.java ## @@ -136,6 +136,31 @@ public static OzoneClient getRpcClient(String omHost, Integer omRpcPort, return getRpcClient(config); } + /** + * Returns an OzoneClient which will use RPC protocol. + * + * @param omServiceId + *Service ID of OzoneManager HA cluster. + * + * @param config + *Configuration to be used for OzoneClient creation + * + * @return OzoneClient + * + * @throws IOException + */ + public static OzoneClient getRpcClient(String omServiceId, Review comment: Filed https://issues.apache.org/jira/browse/HDDS-2105 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 310098) Time Spent: 4.5h (was: 4h 20m) > Make ozone fs shell command work with OM HA service ids > --- > > Key: HDDS-2007 > URL: https://issues.apache.org/jira/browse/HDDS-2007 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Client >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Time Spent: 4.5h > Remaining Estimate: 0h > > Build an HDFS HA-like nameservice for OM HA so that the Ozone client can access > an Ozone HA cluster with ease. > The majority of the work is already done in HDDS-972. But the problem is that > the client would crash if there is more than one service id > (ozone.om.service.ids) configured in ozone-site.xml. This needs to be addressed > on the client side. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14528) Failover from Active to Standby Failed
[ https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926881#comment-16926881 ] Ayush Saxena commented on HDFS-14528: - On a cursory view, the patch looks good. Minor nit: correct the javadoc {code:java} + /** + * Test that manual failover is successful with Observernode. + */ {code} In the test you are not using an Observer, but a Standby... > Failover from Active to Standby Failed > > > Key: HDFS-14528 > URL: https://issues.apache.org/jira/browse/HDFS-14528 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Reporter: Ravuri Sushma sree >Assignee: Ravuri Sushma sree >Priority: Major > Attachments: HDFS-14528.003.patch, HDFS-14528.004.patch, > HDFS-14528.2.Patch, ZKFC_issue.patch > > > *In a cluster with more than one Standby NameNode, manual failover throws an > exception in some cases* > *When trying to execute the failover command from active to standby* > *._/hdfs haadmin -failover nn1 nn2, the below exception is thrown_* > Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on > connection exception: java.net.ConnectException: Connection refused > This is encountered in the following cases: > Scenario 1: > Namenodes - NN1 (Active), NN2 (Standby), NN3 (Standby) > When trying to manually fail over from NN1 to NN2, if NN3 is down, an exception is > thrown > Scenario 2: > Namenodes - NN1 (Active), NN2 (Standby), NN3 (Standby) > ZKFCs - ZKFC1, ZKFC2, ZKFC3 > When trying to manually fail over from NN1 to NN3, if NN3's ZKFC (ZKFC3) is > down, an exception is thrown -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
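Per the nit above, the corrected javadoc would read something like:
{code:java}
  /**
   * Test that manual failover is successful with a Standby NameNode.
   */
{code}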
[jira] [Work logged] (HDDS-1786) Datanodes takeSnapshot should delete previously created snapshots
[ https://issues.apache.org/jira/browse/HDDS-1786?focusedWorklogId=310067&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310067 ] ASF GitHub Bot logged work on HDDS-1786: Author: ASF GitHub Bot Created on: 10/Sep/19 19:54 Start Date: 10/Sep/19 19:54 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on issue #1163: HDDS-1786 : Datanodes takeSnapshot should delete previously created s… URL: https://github.com/apache/hadoop/pull/1163#issuecomment-530093799 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Comment | |::|--:|:|:| | 0 | reexec | 38 | Docker mode activated. | ||| _ Prechecks _ | | +1 | dupname | 0 | No case conflicting files found. | | +1 | @author | 0 | The patch does not contain any @author tags. | | +1 | test4tests | 0 | The patch appears to include 1 new or modified test files. | ||| _ trunk Compile Tests _ | | 0 | mvndep | 26 | Maven dependency ordering for branch | | +1 | mvninstall | 573 | trunk passed | | +1 | compile | 398 | trunk passed | | +1 | checkstyle | 78 | trunk passed | | +1 | mvnsite | 0 | trunk passed | | +1 | shadedclient | 874 | branch has no errors when building and testing our client artifacts. | | +1 | javadoc | 178 | trunk passed | | 0 | spotbugs | 446 | Used deprecated FindBugs config; considering switching to SpotBugs. | | +1 | findbugs | 653 | trunk passed | | -0 | patch | 500 | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. | ||| _ Patch Compile Tests _ | | 0 | mvndep | 40 | Maven dependency ordering for patch | | +1 | mvninstall | 562 | the patch passed | | +1 | compile | 400 | the patch passed | | +1 | javac | 400 | the patch passed | | +1 | checkstyle | 86 | the patch passed | | +1 | mvnsite | 0 | the patch passed | | +1 | whitespace | 0 | The patch has no whitespace issues. | | +1 | shadedclient | 686 | patch has no errors when building and testing our client artifacts. | | +1 | javadoc | 192 | the patch passed | | +1 | findbugs | 789 | the patch passed | ||| _ Other Tests _ | | +1 | unit | 314 | hadoop-hdds in the patch passed. | | -1 | unit | 3129 | hadoop-ozone in the patch failed. | | +1 | asflicense | 48 | The patch does not generate ASF License warnings. 
| | | | 9252 | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient | | | hadoop.ozone.container.common.statemachine.commandhandler.TestCloseContainerHandler | | | hadoop.ozone.container.common.statemachine.commandhandler.TestCloseContainerByPipeline | | | hadoop.ozone.client.rpc.TestDeleteWithSlowFollower | | | hadoop.ozone.TestMiniChaosOzoneCluster | | | hadoop.ozone.om.TestOzoneManagerHA | | | hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion | | | hadoop.ozone.TestSecureOzoneCluster | | | hadoop.ozone.client.rpc.TestCommitWatcher | | | hadoop.ozone.client.rpc.TestContainerStateMachineFailures | | | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption | | | hadoop.ozone.container.TestContainerReplication | | Subsystem | Report/Notes | |--:|:-| | Docker | Client=19.03.1 Server=19.03.1 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/9/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/1163 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux f632dd8cccfe 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / dc9abd2 | | Default Java | 1.8.0_222 | | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/9/artifact/out/patch-unit-hadoop-ozone.txt | | Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/9/testReport/ | | Max. process+thread count | 4107 (vs. ulimit of 5500) | | modules | C: hadoop-hdds/container-service hadoop-ozone/integration-test U: . | | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/9/console | | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 | | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org | This message was automatically generated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this
[jira] [Work logged] (HDDS-2007) Make ozone fs shell command work with OM HA service ids
[ https://issues.apache.org/jira/browse/HDDS-2007?focusedWorklogId=310065&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310065 ] ASF GitHub Bot logged work on HDDS-2007: Author: ASF GitHub Bot Created on: 10/Sep/19 19:51 Start Date: 10/Sep/19 19:51 Worklog Time Spent: 10m Work Description: bharatviswa504 commented on pull request #1360: HDDS-2007. Make ozone fs shell command work with OM HA service ids URL: https://github.com/apache/hadoop/pull/1360#discussion_r322930998 ## File path: hadoop-ozone/ozonefs/src/main/java/org/apache/hadoop/fs/ozone/BasicOzoneFileSystem.java ## @@ -131,6 +142,13 @@ public void initialize(URI name, Configuration conf) throws IOException { // If port number is not specified, read it from config omPort = OmUtils.getOmRpcPort(conf); } +} else if (OmUtils.isServiceIdsDefined(conf)) { Review comment: https://github.com/apache/hadoop/blob/trunk/hadoop-ozone/ozonefs/src/main/java/org/apache/hadoop/fs/ozone/BasicOzoneFileSystem.java#L154 This code line calls the creation of BasicOzoneClientAdapterImpl. And below is the code where it checks whether the passed conf is an instance of OzoneConfiguration, and converts it to an OzoneConfiguration object if not. https://github.com/apache/hadoop/blob/trunk/hadoop-ozone/ozonefs/src/main/java/org/apache/hadoop/fs/ozone/BasicOzoneClientAdapterImpl.java#L112 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 310065) Time Spent: 4h 20m (was: 4h 10m) > Make ozone fs shell command work with OM HA service ids > --- > > Key: HDDS-2007 > URL: https://issues.apache.org/jira/browse/HDDS-2007 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Client >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Time Spent: 4h 20m > Remaining Estimate: 0h > > Build an HDFS HA-like nameservice for OM HA so that the Ozone client can access > an Ozone HA cluster with ease. > The majority of the work is already done in HDDS-972. But the problem is that > the client would crash if there is more than one service id > (ozone.om.service.ids) configured in ozone-site.xml. This needs to be addressed > on the client side. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
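The pattern referenced by the two links, in sketch form (using the public OzoneConfiguration(Configuration) constructor; the actual adapter code may use a helper instead):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdds.conf.OzoneConfiguration;

public class ConfNormalizeSketch {
  static OzoneConfiguration toOzoneConf(Configuration conf) {
    // Reuse the object if it is already an OzoneConfiguration; otherwise
    // wrap it so the ozone-default resources are loaded.
    return (conf instanceof OzoneConfiguration)
        ? (OzoneConfiguration) conf
        : new OzoneConfiguration(conf);
  }

  public static void main(String[] args) {
    System.out.println(toOzoneConf(new Configuration()) != null); // true
  }
}
{code}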
[jira] [Work logged] (HDDS-1786) Datanodes takeSnapshot should delete previously created snapshots
[ https://issues.apache.org/jira/browse/HDDS-1786?focusedWorklogId=310059&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310059 ] ASF GitHub Bot logged work on HDDS-1786: Author: ASF GitHub Bot Created on: 10/Sep/19 19:41 Start Date: 10/Sep/19 19:41 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on issue #1163: HDDS-1786 : Datanodes takeSnapshot should delete previously created s… URL: https://github.com/apache/hadoop/pull/1163#issuecomment-530089343 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Comment | |::|--:|:|:| | 0 | reexec | 99 | Docker mode activated. | ||| _ Prechecks _ | | +1 | dupname | 0 | No case conflicting files found. | | +1 | @author | 0 | The patch does not contain any @author tags. | | +1 | test4tests | 0 | The patch appears to include 1 new or modified test files. | ||| _ trunk Compile Tests _ | | 0 | mvndep | 76 | Maven dependency ordering for branch | | +1 | mvninstall | 678 | trunk passed | | +1 | compile | 375 | trunk passed | | +1 | checkstyle | 74 | trunk passed | | +1 | mvnsite | 0 | trunk passed | | +1 | shadedclient | 972 | branch has no errors when building and testing our client artifacts. | | +1 | javadoc | 184 | trunk passed | | 0 | spotbugs | 440 | Used deprecated FindBugs config; considering switching to SpotBugs. | | +1 | findbugs | 653 | trunk passed | | -0 | patch | 483 | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. | ||| _ Patch Compile Tests _ | | 0 | mvndep | 30 | Maven dependency ordering for patch | | +1 | mvninstall | 546 | the patch passed | | +1 | compile | 377 | the patch passed | | +1 | javac | 377 | the patch passed | | +1 | checkstyle | 77 | the patch passed | | +1 | mvnsite | 0 | the patch passed | | +1 | whitespace | 0 | The patch has no whitespace issues. | | +1 | shadedclient | 728 | patch has no errors when building and testing our client artifacts. | | +1 | javadoc | 169 | the patch passed | | +1 | findbugs | 657 | the patch passed | ||| _ Other Tests _ | | +1 | unit | 311 | hadoop-hdds in the patch passed. | | -1 | unit | 2292 | hadoop-ozone in the patch failed. | | +1 | asflicense | 43 | The patch does not generate ASF License warnings. 
| | | | 8517 | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures | | | hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion | | | hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures | | | hadoop.ozone.scm.node.TestQueryNode | | | hadoop.ozone.TestSecureOzoneCluster | | | hadoop.ozone.container.TestContainerReplication | | | hadoop.ozone.client.rpc.TestBlockOutputStream | | | hadoop.ozone.client.rpc.TestContainerStateMachine | | Subsystem | Report/Notes | |--:|:-| | Docker | Client=19.03.0 Server=19.03.0 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/8/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/1163 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 8d855eed86b8 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / dc9abd2 | | Default Java | 1.8.0_222 | | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/8/artifact/out/patch-unit-hadoop-ozone.txt | | Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/8/testReport/ | | Max. process+thread count | 5342 (vs. ulimit of 5500) | | modules | C: hadoop-hdds/container-service hadoop-ozone/integration-test U: . | | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/8/console | | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 | | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org | This message was automatically generated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 310059) Time Spent: 3.5h (was: 3h 20m) > Datanodes takeSnapshot should delete previously created snapshots > --
[jira] [Updated] (HDFS-14839) Use Java Concurrent BlockingQueue instead of Internal BlockQueue
[ https://issues.apache.org/jira/browse/HDFS-14839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14839: -- Attachment: HDFS-14839.1.patch > Use Java Concurrent BlockingQueue instead of Internal BlockQueue > > > Key: HDFS-14839 > URL: https://issues.apache.org/jira/browse/HDFS-14839 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14839.1.patch > > > Replace... > https://github.com/apache/hadoop/blob/d8bac50e12d243ef8fd2c7e0ce5c9997131dee74/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L86 > With... > https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14839) Use Java Concurrent BlockingQueue instead of Internal BlockQueue
[ https://issues.apache.org/jira/browse/HDFS-14839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14839: -- Status: Patch Available (was: Open) > Use Java Concurrent BlockingQueue instead of Internal BlockQueue > > > Key: HDFS-14839 > URL: https://issues.apache.org/jira/browse/HDFS-14839 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14839.1.patch > > > Replace... > https://github.com/apache/hadoop/blob/d8bac50e12d243ef8fd2c7e0ce5c9997131dee74/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L86 > With... > https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14839) Use Java Concurrent BlockingQueue instead of Internal BlockQueue
David Mollitor created HDFS-14839: - Summary: Use Java Concurrent BlockingQueue instead of Internal BlockQueue Key: HDFS-14839 URL: https://issues.apache.org/jira/browse/HDFS-14839 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs Affects Versions: 3.2.0 Reporter: David Mollitor Assignee: David Mollitor Replace... https://github.com/apache/hadoop/blob/d8bac50e12d243ef8fd2c7e0ce5c9997131dee74/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L86 With... https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
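For context, a minimal sketch of the substitution HDFS-14839 proposes, assuming the internal BlockQueue at the linked line only needs offer/poll/size semantics; the class shape below mirrors that assumption and is not copied from the actual patch:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical replacement for DatanodeDescriptor's hand-rolled queue:
// LinkedBlockingQueue already provides thread-safe offer/poll, so the
// explicit synchronization of an internal implementation can be dropped.
class BlockQueue<E> {
  private final BlockingQueue<E> queue = new LinkedBlockingQueue<>();

  boolean offer(E e) {
    return queue.offer(e); // never blocks on an unbounded queue
  }

  List<E> poll(int numBlocks) {
    List<E> results = new ArrayList<>(numBlocks);
    queue.drainTo(results, numBlocks); // bulk removal without explicit locking
    return results;
  }

  int size() {
    return queue.size();
  }
}
{code}

The main design question would be whether any callers rely on compound operations the internal class performed atomically under its own lock; if so, those call sites would still need external coordination after the swap.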
[jira] [Work logged] (HDDS-1982) Extend SCMNodeManager to support decommission and maintenance states
[ https://issues.apache.org/jira/browse/HDDS-1982?focusedWorklogId=310050&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310050 ] ASF GitHub Bot logged work on HDDS-1982: Author: ASF GitHub Bot Created on: 10/Sep/19 19:20 Start Date: 10/Sep/19 19:20 Worklog Time Spent: 10m Work Description: swagle commented on pull request #1344: HDDS-1982 Extend SCMNodeManager to support decommission and maintenance states URL: https://github.com/apache/hadoop/pull/1344#discussion_r322918560 ## File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/states/NodeStateMap.java ## @@ -309,4 +381,61 @@ private void checkIfNodeExist(UUID uuid) throws NodeNotFoundException { throw new NodeNotFoundException("Node UUID: " + uuid); } } + + /** + * Create a list of datanodeInfo for all nodes matching the passed states. + * Passing null for one of the states acts like a wildcard for that state. + * + * @param opState + * @param health + * @return List of DatanodeInfo objects matching the passed state + */ + private List<DatanodeInfo> filterNodes( + NodeOperationalState opState, NodeState health) { +if (opState != null && health != null) { Review comment: Can we write lines 395-440 with one simple stream().filter()? Nothing wrong with the code itself, just a thought. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 310050) Time Spent: 4h 50m (was: 4h 40m) > Extend SCMNodeManager to support decommission and maintenance states > > > Key: HDDS-1982 > URL: https://issues.apache.org/jira/browse/HDDS-1982 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Labels: pull-request-available > Time Spent: 4h 50m > Remaining Estimate: 0h > > Currently, within SCM a node can have the following states: > HEALTHY > STALE > DEAD > DECOMMISSIONING > DECOMMISSIONED > The last two are not currently used. > In order to support decommissioning and maintenance mode, we need to extend > the set of states a node can have to include decommission and maintenance > states. > It is also important to note that a node that is decommissioning or entering > maintenance can also be HEALTHY, STALE, or go DEAD. > Therefore in this Jira I propose we model a node's state with two > different sets of values. The first is effectively the liveness of the > node, with the following states; this is largely what is in place now: > HEALTHY > STALE > DEAD > The second is the node operational state: > IN_SERVICE > DECOMMISSIONING > DECOMMISSIONED > ENTERING_MAINTENANCE > IN_MAINTENANCE > That means the total number of states for a node is the cross-product > of the two lists above; however, it probably makes sense to keep the two > states separate internally. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
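For what it's worth, the stream-based rewrite the reviewer suggests might look like the sketch below. It assumes NodeStateMap keeps its nodes in a map named nodeMap whose values are DatanodeInfo, and that each node exposes its two-dimensional status through a getNodeStatus() accessor; those names are assumptions for illustration, not necessarily the actual Ozone API.

{code:java}
// Sketch only; assumes java.util.stream.Collectors is imported and that
// nodeMap, getNodeStatus(), getOperationalState() and getHealth() exist
// with these names.
private List<DatanodeInfo> filterNodes(
    NodeOperationalState opState, NodeState health) {
  return nodeMap.values().stream()
      // null for either argument acts as a wildcard for that dimension
      .filter(dn -> opState == null
          || dn.getNodeStatus().getOperationalState() == opState)
      .filter(dn -> health == null
          || dn.getNodeStatus().getHealth() == health)
      .collect(Collectors.toList());
}
{code}

This collapses the four null/non-null branches into a single pipeline, which is presumably the simplification the review comment is after.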
[jira] [Work logged] (HDDS-1982) Extend SCMNodeManager to support decommission and maintenance states
[ https://issues.apache.org/jira/browse/HDDS-1982?focusedWorklogId=310038&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310038 ] ASF GitHub Bot logged work on HDDS-1982: Author: ASF GitHub Bot Created on: 10/Sep/19 19:02 Start Date: 10/Sep/19 19:02 Worklog Time Spent: 10m Work Description: swagle commented on pull request #1344: HDDS-1982 Extend SCMNodeManager to support decommission and maintenance states URL: https://github.com/apache/hadoop/pull/1344#discussion_r322911485 ## File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/SCMNodeManager.java ## @@ -417,9 +451,12 @@ private SCMNodeStat getNodeStatInternal(DatanodeDetails datanodeDetails) { @Override public Map getNodeCount() { +// TODO - This does not consider decom, maint etc. Map nodeCountMap = new HashMap(); Review comment: Why not? It makes it easier for the caller to consume, in my opinion. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 310038) Time Spent: 4h 40m (was: 4.5h) > Extend SCMNodeManager to support decommission and maintenance states > > > Key: HDDS-1982 > URL: https://issues.apache.org/jira/browse/HDDS-1982 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Labels: pull-request-available > Time Spent: 4h 40m > Remaining Estimate: 0h > > Currently, within SCM a node can have the following states: > HEALTHY > STALE > DEAD > DECOMMISSIONING > DECOMMISSIONED > The last two are not currently used. > In order to support decommissioning and maintenance mode, we need to extend > the set of states a node can have to include decommission and maintenance > states. > It is also important to note that a node that is decommissioning or entering > maintenance can also be HEALTHY, STALE, or go DEAD. > Therefore in this Jira I propose we model a node's state with two > different sets of values. The first is effectively the liveness of the > node, with the following states; this is largely what is in place now: > HEALTHY > STALE > DEAD > The second is the node operational state: > IN_SERVICE > DECOMMISSIONING > DECOMMISSIONED > ENTERING_MAINTENANCE > IN_MAINTENANCE > That means the total number of states for a node is the cross-product > of the two lists above; however, it probably makes sense to keep the two > states separate internally. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
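To make the two-dimensional model in the issue description concrete, here is one way the separation could be expressed. The enum values are taken verbatim from the description; NodeStatus and its accessors are illustrative names, not necessarily what the eventual patch uses.

{code:java}
/** Liveness of the node; largely what is in place today. */
enum NodeState { HEALTHY, STALE, DEAD }

/** Operational state, tracked independently of liveness. */
enum NodeOperationalState {
  IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED,
  ENTERING_MAINTENANCE, IN_MAINTENANCE
}

/**
 * Composite status: the 3 health values and 5 operational values give a
 * cross-product of 15 combinations, but the two dimensions stay separate
 * internally, as the description suggests.
 */
final class NodeStatus {
  private final NodeOperationalState operationalState;
  private final NodeState health;

  NodeStatus(NodeOperationalState operationalState, NodeState health) {
    this.operationalState = operationalState;
    this.health = health;
  }

  NodeOperationalState getOperationalState() { return operationalState; }
  NodeState getHealth() { return health; }
}
{code}

Keeping the dimensions in one small value object rather than a flat 15-value enum means queries like "all DECOMMISSIONING nodes, regardless of health" stay a single field comparison.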