[jira] [Resolved] (HDDS-1015) Cleanup snapshot repository settings
[ https://issues.apache.org/jira/browse/HDDS-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham resolved HDDS-1015. -- Resolution: Won't Fix > Cleanup snapshot repository settings > > > Key: HDDS-1015 > URL: https://issues.apache.org/jira/browse/HDDS-1015 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HDDS-1015.00.patch > > > Now we can clean up snapshot repository settings from hadoop-hdds/pom.xml and > hadoop-ozone/pom.xml > As now we have moved our dependencies from Hadoop 3.2.1-SNAPSHOT to 3.2.0 as > part of HDDS-993, we don't require them. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work started] (HDDS-2365) TestRatisPipelineProvider#testCreatePipelinesDnExclude is flaky
[ https://issues.apache.org/jira/browse/HDDS-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HDDS-2365 started by Attila Doroszlai.

> TestRatisPipelineProvider#testCreatePipelinesDnExclude is flaky
>
> Key: HDDS-2365
> URL: https://issues.apache.org/jira/browse/HDDS-2365
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: test
> Reporter: Attila Doroszlai
> Assignee: Attila Doroszlai
> Priority: Minor
>
> TestRatisPipelineProvider#testCreatePipelinesDnExclude is flaky, failing in CI intermittently:
> * https://github.com/elek/ozone-ci-03/blob/master/pr/pr-hdds-2360-9pxww/integration/hadoop-ozone/integration-test/org.apache.hadoop.hdds.scm.pipeline.TestRatisPipelineProvider.txt
> * https://github.com/elek/ozone-ci-03/blob/master/pr/pr-hdds-2352-cxhw9/integration/hadoop-ozone/integration-test/org.apache.hadoop.hdds.scm.pipeline.TestRatisPipelineProvider.txt
[jira] [Created] (HDDS-2365) TestRatisPipelineProvider#testCreatePipelinesDnExclude is flaky
Attila Doroszlai created HDDS-2365:

Summary: TestRatisPipelineProvider#testCreatePipelinesDnExclude is flaky
Key: HDDS-2365
URL: https://issues.apache.org/jira/browse/HDDS-2365
Project: Hadoop Distributed Data Store
Issue Type: Bug
Components: test
Reporter: Attila Doroszlai
Assignee: Attila Doroszlai

TestRatisPipelineProvider#testCreatePipelinesDnExclude is flaky, failing in CI intermittently:
* https://github.com/elek/ozone-ci-03/blob/master/pr/pr-hdds-2360-9pxww/integration/hadoop-ozone/integration-test/org.apache.hadoop.hdds.scm.pipeline.TestRatisPipelineProvider.txt
* https://github.com/elek/ozone-ci-03/blob/master/pr/pr-hdds-2352-cxhw9/integration/hadoop-ozone/integration-test/org.apache.hadoop.hdds.scm.pipeline.TestRatisPipelineProvider.txt
[jira] [Commented] (HDFS-14730) Remove unused configuration dfs.web.authentication.filter
[ https://issues.apache.org/jira/browse/HDFS-14730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959434#comment-16959434 ]

Hadoop QA commented on HDFS-14730:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 34s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 50s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 450 unchanged - 7 fixed = 450 total (was 457) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 25s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}104m 43s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}177m 19s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeLifeline |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14730 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12983981/HDFS-14730.002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux 6ad3d5d935c6 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 0db0f1e |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/28174/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/28174/testReport/ |
| Max. process+thread count | 2836 (vs. ulimit of 5500) |
[jira] [Updated] (HDFS-14933) Fixing a typo in documentaion of Observer NameNode
[ https://issues.apache.org/jira/browse/HDFS-14933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xieming Li updated HDFS-14933:

Attachment: HDFS-14933.001.patch
Status: Patch Available (was: Open)

> Fixing a typo in documentaion of Observer NameNode
>
> Key: HDFS-14933
> URL: https://issues.apache.org/jira/browse/HDFS-14933
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: documentation
> Reporter: Xieming Li
> Assignee: Xieming Li
> Priority: Trivial
> Attachments: HDFS-14933.001.patch
>
> Fix a typo in documentation Observer NameNode
> https://aajisaka.github.io/hadoop-document/hadoop-project/hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html
> This
> {code}
> <property>
>   <name>dfs.ha.tail-edits.period</name>
>   <value>10s</value>
> </property>
> {code}
> should be changed to
> {code}
> <property>
>   <name>dfs.ha.tail-edits.period.backoff-max</name>
>   <value>10s</value>
> </property>
> {code}
[jira] [Updated] (HDFS-14933) Fixing a typo in documentaion of Observer NameNode
[ https://issues.apache.org/jira/browse/HDFS-14933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xieming Li updated HDFS-14933:

Description:
Fix a typo in documentation Observer NameNode
https://aajisaka.github.io/hadoop-document/hadoop-project/hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html
This
{code}
<property>
  <name>dfs.ha.tail-edits.period</name>
  <value>10s</value>
</property>
{code}
should be changed to
{code}
<property>
  <name>dfs.ha.tail-edits.period.backoff-max</name>
  <value>10s</value>
</property>
{code}

was:
Fix a typo in documentation Observer NameNode
This
{code}
<property>
  <name>dfs.ha.tail-edits.period</name>
  <value>10s</value>
</property>
{code}
should be changed to
{code}
<property>
  <name>dfs.ha.tail-edits.period.backoff-max</name>
  <value>10s</value>
</property>
{code}

> Fixing a typo in documentaion of Observer NameNode
>
> Key: HDFS-14933
> URL: https://issues.apache.org/jira/browse/HDFS-14933
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: documentation
> Reporter: Xieming Li
> Assignee: Xieming Li
> Priority: Trivial
>
> Fix a typo in documentation Observer NameNode
> https://aajisaka.github.io/hadoop-document/hadoop-project/hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html
> This
> {code}
> <property>
>   <name>dfs.ha.tail-edits.period</name>
>   <value>10s</value>
> </property>
> {code}
> should be changed to
> {code}
> <property>
>   <name>dfs.ha.tail-edits.period.backoff-max</name>
>   <value>10s</value>
> </property>
> {code}
[jira] [Created] (HDFS-14933) Fixing a typo in documentaion of Observer NameNode
Xieming Li created HDFS-14933:

Summary: Fixing a typo in documentaion of Observer NameNode
Key: HDFS-14933
URL: https://issues.apache.org/jira/browse/HDFS-14933
Project: Hadoop HDFS
Issue Type: Bug
Components: documentation
Reporter: Xieming Li
Assignee: Xieming Li

Fix a typo in documentation Observer NameNode
This
{code}
<property>
  <name>dfs.ha.tail-edits.period</name>
  <value>10s</value>
</property>
{code}
should be changed to
{code}
<property>
  <name>dfs.ha.tail-edits.period.backoff-max</name>
  <value>10s</value>
</property>
{code}
[jira] [Created] (HDDS-2364) Add a OM metrics to find the false positive rate for the keyMayExist
Mukul Kumar Singh created HDDS-2364:

Summary: Add a OM metrics to find the false positive rate for the keyMayExist
Key: HDDS-2364
URL: https://issues.apache.org/jira/browse/HDDS-2364
Project: Hadoop Distributed Data Store
Issue Type: Bug
Components: Ozone Manager
Affects Versions: 0.5.0
Reporter: Mukul Kumar Singh

Add an OM metric to find the false positive rate for keyMayExist.
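One way such a metric could be computed is sketched below. This is only an illustration, not the Ozone implementation: the class `KeyMayExistStats` and its methods are hypothetical, and it assumes the caller knows, for each lookup, both what keyMayExist answered and whether the key was really found afterwards. The rate is taken over absent keys only, as false positives / (false positives + true negatives).

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical counters for illustration; not part of the Ozone code base. */
final class KeyMayExistStats {
  private final AtomicLong falsePositives = new AtomicLong();
  private final AtomicLong trueNegatives = new AtomicLong();

  /** Record one lookup: the keyMayExist answer vs. whether the key was really found. */
  void record(boolean mayExist, boolean actuallyExists) {
    if (actuallyExists) {
      return; // only lookups of absent keys contribute to the false positive rate
    }
    if (mayExist) {
      falsePositives.incrementAndGet(); // filter said "maybe", key was absent
    } else {
      trueNegatives.incrementAndGet();  // filter correctly said "no"
    }
  }

  /** FP / (FP + TN), or 0.0 when no absent key has been looked up yet. */
  double falsePositiveRate() {
    long fp = falsePositives.get();
    long tn = trueNegatives.get();
    long total = fp + tn;
    return total == 0 ? 0.0 : (double) fp / total;
  }
}
```

Atomic counters are used so that several handler threads could record lookups without extra locking; the two counters are read separately, so the reported rate is approximate under concurrent updates.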
[jira] [Updated] (HDDS-2355) Om double buffer flush termination with rocksdb error
[ https://issues.apache.org/jira/browse/HDDS-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDDS-2355:

Priority: Blocker (was: Critical)

> Om double buffer flush termination with rocksdb error
>
> Key: HDDS-2355
> URL: https://issues.apache.org/jira/browse/HDDS-2355
> Project: Hadoop Distributed Data Store
> Issue Type: Task
> Reporter: Bharat Viswanadham
> Assignee: Aravindan Vijayan
> Priority: Blocker
>
> om_1 | java.io.IOException: Unable to write the batch.
> om_1 |   at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:48)
> om_1 |   at org.apache.hadoop.hdds.utils.db.RDBStore.commitBatchOperation(RDBStore.java:240)
> om_1 |   at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:146)
> om_1 |   at java.base/java.lang.Thread.run(Thread.java:834)
> om_1 | Caused by: org.rocksdb.RocksDBException: WritePrepared/WriteUnprepared txn tag when write_after_commit_ is enabled (in default WriteCommitted mode). If it is not due to corruption, the WAL must be emptied before changing the WritePolicy.
> om_1 |   at org.rocksdb.RocksDB.write0(Native Method)
> om_1 |   at org.rocksdb.RocksDB.write(RocksDB.java:1421)
> om_1 |   at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:46)
>
> In a few of my test runs I see this error, and OM is terminated.
[jira] [Resolved] (HDFS-3807) hadoop.http.filter.initializers not applied to webhdfs urls
[ https://issues.apache.org/jira/browse/HDFS-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Clay B. resolved HDFS-3807.

Target Version/s: (was: )
Resolution: Duplicate

> hadoop.http.filter.initializers not applied to webhdfs urls
>
> Key: HDFS-3807
> URL: https://issues.apache.org/jira/browse/HDFS-3807
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: webhdfs
> Affects Versions: 0.23.3
> Reporter: Thomas Graves
> Priority: Major
>
> I was messing with the http filters and noticed that they don't get applied when going to the webhdfs URIs. This might also apply to the other internal namenode servlets.
[jira] [Updated] (HDDS-2355) Om double buffer flush termination with rocksdb error
[ https://issues.apache.org/jira/browse/HDDS-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDDS-2355:

Fix Version/s: 0.5.0

> Om double buffer flush termination with rocksdb error
>
> Key: HDDS-2355
> URL: https://issues.apache.org/jira/browse/HDDS-2355
> Project: Hadoop Distributed Data Store
> Issue Type: Task
> Reporter: Bharat Viswanadham
> Assignee: Aravindan Vijayan
> Priority: Blocker
> Fix For: 0.5.0
>
> om_1 | java.io.IOException: Unable to write the batch.
> om_1 |   at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:48)
> om_1 |   at org.apache.hadoop.hdds.utils.db.RDBStore.commitBatchOperation(RDBStore.java:240)
> om_1 |   at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:146)
> om_1 |   at java.base/java.lang.Thread.run(Thread.java:834)
> om_1 | Caused by: org.rocksdb.RocksDBException: WritePrepared/WriteUnprepared txn tag when write_after_commit_ is enabled (in default WriteCommitted mode). If it is not due to corruption, the WAL must be emptied before changing the WritePolicy.
> om_1 |   at org.rocksdb.RocksDB.write0(Native Method)
> om_1 |   at org.rocksdb.RocksDB.write(RocksDB.java:1421)
> om_1 |   at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:46)
>
> In a few of my test runs I see this error, and OM is terminated.
[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959419#comment-16959419 ]

Bharat Viswanadham edited comment on HDDS-2356 at 10/25/19 4:25 AM:

The above is an issue in OM that can happen randomly when another handler thread in OM updates the partInfo map while the flush thread commits those entries. (During commit we convert OmMultipartInfo to proto, and that is where the above error is thrown.)

The configs above are not related to OM; they are for the SCM end.
{quote}However, writing fails due to no more blocks allocated. I guess my cluster cannot keep up with the writing.
{quote}
The SCM logs should show why no more blocks are being allocated. This exception will also be received by OM.

was (Author: bharatviswa):
The above is an issue in OM that can happen randomly when another handler thread in OM updates the partInfo map while the flush thread commits those entries. (During commit we convert OmMultipartInfo to proto, and that is where the above error is thrown.)

The configs above are not related to OM; they are for the SCM end.
{quote}However, writing fails due to no more blocks allocated. I guess my cluster cannot keep up with the writing.
{quote}
The SCM logs should show why no more blocks are being allocated. This exception will also be received by OM.

> Multipart upload report errors while writing to ozone Ratis pipeline
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Manager
> Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM
> Reporter: Li Cheng
> Assignee: Bharat Viswanadham
> Priority: Blocker
> Fix For: 0.5.0
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say it's VM0.
> I use goofys as a FUSE client and enable the Ozone S3 gateway to mount Ozone to a path on VM0, reading data from VM0's local disk and writing it to the mount path. The dataset contains ~50,000 files ranging in size from 0 bytes to GB-level.
> Writing is slow (1 GB in ~10 minutes) and stops after around 4 GB. Looking at the hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors related to multipart upload. This error eventually causes the write to terminate and OM to shut down.
>
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with exit status 2: OMDoubleBuffer flush threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
>   at java.util.TreeMap.forEach(TreeMap.java:1004)
>   at org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>   at org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>   at org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>   at org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>   at org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>   at org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>   at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>   at java.util.Iterator.forEachRemaining(Iterator.java:116)
>   at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>   at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959419#comment-16959419 ]

Bharat Viswanadham commented on HDDS-2356:

The above is an issue in OM that can happen randomly when another handler thread in OM updates the partInfo map while the flush thread commits those entries. (During commit we convert OmMultipartInfo to proto, and that is where the above error is thrown.)

The configs above are not related to OM; they are for the SCM end.
{quote}However, writing fails due to no more blocks allocated. I guess my cluster cannot keep up with the writing.
{quote}
The SCM logs should show why no more blocks are being allocated. This exception will also be received by OM.

> Multipart upload report errors while writing to ozone Ratis pipeline
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Manager
> Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM
> Reporter: Li Cheng
> Assignee: Bharat Viswanadham
> Priority: Blocker
> Fix For: 0.5.0
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say it's VM0.
> I use goofys as a FUSE client and enable the Ozone S3 gateway to mount Ozone to a path on VM0, reading data from VM0's local disk and writing it to the mount path. The dataset contains ~50,000 files ranging in size from 0 bytes to GB-level.
> Writing is slow (1 GB in ~10 minutes) and stops after around 4 GB. Looking at the hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors related to multipart upload. This error eventually causes the write to terminate and OM to shut down.
>
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with exit status 2: OMDoubleBuffer flush threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
>   at java.util.TreeMap.forEach(TreeMap.java:1004)
>   at org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>   at org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>   at org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>   at org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>   at org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>   at org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>   at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>   at java.util.Iterator.forEachRemaining(Iterator.java:116)
>   at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>   at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959407#comment-16959407 ]

Li Cheng commented on HDDS-2356:

Quick update: I configured Ozone with more handlers (10+ times more) and no longer see this error. See the attached properties. However, writing fails due to no more blocks allocated. I guess my cluster cannot keep up with the writing.

<property>
  <name>ozone.scm.handler.count.key</name>
  <value>128</value>
  <tag>OZONE, MANAGEMENT, PERFORMANCE</tag>
  <description>
    The number of RPC handler threads for each SCM service endpoint.
    The default is appropriate for small clusters (tens of nodes).
    Set a value that is appropriate for the cluster size. Generally, HDFS
    recommends RPC handler count is set to 20 * log2(Cluster Size) with an
    upper limit of 200. However, SCM will not have the same amount of
    traffic as Namenode, so a value much smaller than that will work well too.
  </description>
</property>
<property>
  <name>ozone.om.handler.count.key</name>
  <value>256</value>
  <tag>OM, PERFORMANCE</tag>
  <description>
    The number of RPC handler threads for OM service endpoints.
  </description>
</property>
<property>
  <name>dfs.container.ratis.num.container.op.executors</name>
  <value>128</value>
  <tag>OZONE, RATIS, PERFORMANCE</tag>
  <description>
    Number of executors that will be used by Ratis to execute
    container ops.(10 by default).
  </description>
</property>
<property>
  <name>dfs.container.ratis.num.write.chunk.threads</name>
  <value>512</value>
  <tag>OZONE, RATIS, PERFORMANCE</tag>
  <description>
    Maximum number of threads in the thread pool that Ratis
    will use for writing chunks (60 by default).
  </description>
</property>

> Multipart upload report errors while writing to ozone Ratis pipeline
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Manager
> Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM
> Reporter: Li Cheng
> Assignee: Bharat Viswanadham
> Priority: Blocker
> Fix For: 0.5.0
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say it's VM0.
> I use goofys as a FUSE client and enable the Ozone S3 gateway to mount Ozone to a path on VM0, reading data from VM0's local disk and writing it to the mount path. The dataset contains ~50,000 files ranging in size from 0 bytes to GB-level.
> Writing is slow (1 GB in ~10 minutes) and stops after around 4 GB. Looking at the hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors related to multipart upload. This error eventually causes the write to terminate and OM to shut down.
>
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with exit status 2: OMDoubleBuffer flush threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
>   at java.util.TreeMap.forEach(TreeMap.java:1004)
>   at org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>   at org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>   at org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>   at org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>   at org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>   at org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>   at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>   at java.util.Iterator.forEachRemaining(Iterator.java:116)
>   at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>   at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:
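The sizing rule quoted in the ozone.scm.handler.count.key description, 20 * log2(Cluster Size) with an upper limit of 200, can be worked through with a small sketch. The class and method names are hypothetical, and two details are assumptions not found in the description: the result is rounded to the nearest integer, and it is floored at 1 so that a one-node cluster does not yield zero handlers.

```java
/** Hypothetical helper illustrating the "20 * log2(cluster size), capped at 200" rule. */
final class HandlerCountRule {
  private static final int UPPER_LIMIT = 200;

  static int recommendedHandlerCount(int clusterSize) {
    if (clusterSize < 1) {
      throw new IllegalArgumentException("clusterSize must be >= 1");
    }
    double log2 = Math.log(clusterSize) / Math.log(2);
    long count = Math.round(20 * log2);   // rounding is an assumption
    count = Math.max(1, count);           // floor of 1 is an assumption
    return (int) Math.min(UPPER_LIMIT, count);
  }

  public static void main(String[] args) {
    for (int nodes : new int[] {3, 32, 1024, 4096}) {
      System.out.println(nodes + " nodes: " + recommendedHandlerCount(nodes));
    }
  }
}
```

Under this rule a 32-node cluster gets 100 handlers and the cap is reached at 1024 nodes, which suggests the values 128 and 256 used above are well beyond what the rule would give for a 3-Datanode setup.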
[jira] [Updated] (HDDS-2344) Add immutable entries in to the DoubleBuffer for Volume requests.
[ https://issues.apache.org/jira/browse/HDDS-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dinesh Chitlangia updated HDDS-2344:

Fix Version/s: 0.5.0
Resolution: Fixed
Status: Resolved (was: Patch Available)

Thanks [~xyao] for the review and [~bharat] for the contribution. This has been committed to master.

> Add immutable entries in to the DoubleBuffer for Volume requests.
>
> Key: HDDS-2344
> URL: https://issues.apache.org/jira/browse/HDDS-2344
> Project: Hadoop Distributed Data Store
> Issue Type: Task
> Reporter: Bharat Viswanadham
> Assignee: Bharat Viswanadham
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> OMVolumeCreateRequest.java L159:
> {code:java}
> omClientResponse =
>     new OMVolumeCreateResponse(omVolumeArgs, volumeList,
>         omResponse.build());{code}
>
> We add this response to the double buffer. When the background flush thread picks it up, it converts the entry to protobuf, then to a byte array, and writes it to the RocksDB tables. This conversion is done without acquiring any lock, so if another request changes the internal structure of OmVolumeArgs (for example the ACL list) in the meantime, we might get a ConcurrentModificationException.
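The idea of putting immutable entries into the double buffer can be sketched as follows. This is a simplified illustration under stated assumptions, not the committed patch: `VolumeArgsSnapshot` is a hypothetical stand-in for OmVolumeArgs, ACLs are plain strings, and `serialize()` stands in for the protobuf conversion done by the flush thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical immutable snapshot handed to the flush thread. */
final class VolumeArgsSnapshot {
  private final String volume;
  private final List<String> acls;

  /**
   * Copies the ACL list. The copy must be taken by the request handler
   * while it still holds the lock that guards the live list.
   */
  VolumeArgsSnapshot(String volume, List<String> liveAcls) {
    this.volume = volume;
    this.acls = Collections.unmodifiableList(new ArrayList<>(liveAcls));
  }

  /** Stand-in for the lock-free conversion done later by the flush thread. */
  String serialize() {
    StringBuilder sb = new StringBuilder(volume);
    for (String acl : acls) { // iterates the private copy, never the live list
      sb.append('|').append(acl);
    }
    return sb.toString();
  }
}
```

Because the snapshot owns its own list, a later request that adds or removes ACLs on the live object cannot disturb the iteration in `serialize()`; the trade-off is one extra copy per buffered entry.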
[jira] [Updated] (HDDS-2363) Improve datanode write failure log
[ https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sammi Chen updated HDDS-2363:

Description:
The following logs didn't reveal the true cause of the write failure.
2019-10-24 17:43:53,460 [pool-7-thread-1] INFO - Operation: CreateContainer : Trace ID: : Message: Container creation failed. : Result: CONTAINER_INTERNAL_ERROR
2019-10-24 17:43:53,478 [pool-7-thread-1] INFO - Operation: WriteChunk : Trace ID: : Message: ContainerID 402 creation failed : Result: CONTAINER_INTERNAL_ERROR

was:
Logs as following haven't reveal the true failure of write failure.
2019-10-24 17:43:53,460 [pool-7-thread-1] INFO - Operation: CreateContainer : Trace ID: : Message: Container creation failed. : Result: CONTAINER_INTERNAL_ERROR
2019-10-24 17:43:53,478 [pool-7-thread-1] INFO - Operation: WriteChunk : Trace ID: : Message: ContainerID 402 creation failed : Result: CONTAINER_INTERNAL_ERROR

> Improve datanode write failure log
>
> Key: HDDS-2363
> URL: https://issues.apache.org/jira/browse/HDDS-2363
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: Ozone Datanode
> Reporter: Sammi Chen
> Assignee: Sammi Chen
> Priority: Major
>
> The following logs didn't reveal the true cause of the write failure.
> 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO - Operation: CreateContainer : Trace ID: : Message: Container creation failed. : Result: CONTAINER_INTERNAL_ERROR
> 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO - Operation: WriteChunk : Trace ID: : Message: ContainerID 402 creation failed : Result: CONTAINER_INTERNAL_ERROR
[jira] [Created] (HDDS-2363) Improve datanode write failure log
Sammi Chen created HDDS-2363: Summary: Improve datanode write failure log Key: HDDS-2363 URL: https://issues.apache.org/jira/browse/HDDS-2363 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: Ozone Datanode Reporter: Sammi Chen Assignee: Sammi Chen Logs as following haven't reveal the true failure of write failure. 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO - Operation: CreateContainer : Trace ID: : Message: Container creation failed. : Result: CONTAINER_INTERNAL_ERROR 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO - Operation: WriteChunk : Trace ID: : Message: ContainerID 402 creation failed : Result: CONTAINER_INTERNAL_ERROR -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14908) LeaseManager should check parent-child relationship when filter open files.
[ https://issues.apache.org/jira/browse/HDFS-14908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959390#comment-16959390 ] Hadoop QA commented on HDFS-14908: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 9s{color} | {color:red} HDFS-14908 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-14908 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/28175/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > LeaseManager should check parent-child relationship when filter open files. > --- > > Key: HDFS-14908 > URL: https://issues.apache.org/jira/browse/HDFS-14908 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.1.0, 3.0.1 >Reporter: Jinglun >Assignee: Jinglun >Priority: Minor > Attachments: HDFS-14908.001.patch, HDFS-14908.002.patch, > HDFS-14908.003.patch, Test.java, TestV2.java, TestV3.java > > > Now when doing listOpenFiles(), LeaseManager only checks whether the filter > path is the prefix of the open files. We should check whether the filter path > is the parent/ancestor of the open files. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-14908) LeaseManager should check parent-child relationship when filter open files.
[ https://issues.apache.org/jira/browse/HDFS-14908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959388#comment-16959388 ] Jinglun edited comment on HDFS-14908 at 10/25/19 3:29 AM: -- Hi [~hexiaoqiao], thanks your nice comments ! And sorry for my late response. Your demo is much simpler and it needs a minor change. When super user calls listOpenFiles, the parameter path might end with a slash, like '/user/hdfs_admin/demo/'. So before calling the code below we must normalize the path first. The normalize part would introduce some overhead. {code:java} (fullPathName.startsWith(path) && (fullPathName.equals(path) || fullPathName.charAt(path.length() - 1) == Path.SEPARATOR_CHAR)){code} When doing the random tests, I added a new method startsWithAndCharAt() with the normalize part to monitor the demo. {code:java} public static boolean startsWithAndCharAt(String path, String parent) { if (path.length() > 1 && path.charAt(path.length() - 1) == '/') { path = path.substring(0, path.length() - 1); } return path.startsWith(parent) && (path.equals(parent) || path.charAt(parent.length() - 1) == '/'); } {code} And here is the result: *Case 1:* path starts with parent and neither path nor parent end with '/' ||Time||100,000,000|| |isParent|7,888ms| |startsWithAndCharAt|8,850ms| |startsWith|7,877ms| *Case 2:* path doesn't start with parent and neither path nor parent end with '/' ||Time||10,000,000,000|| |isParent|2,391ms| |startsWithAndCharAt|2,362ms| |startsWith|2,384ms| *Case 4:* path starts with parent and both path and parent end with '/' ||Time||100,000,000|| |isParent|7,882ms| |startsWithAndCharAt|11,118ms| |startsWith|7,803ms| Test commands are: {quote}java -Xmx512m Test 1 case1 java -Xmx512m Test 100 case2 java -Xmx512m Test 1 case4 {quote} Test file is TestV3.java was (Author: lijinglun): Hi [~hexiaoqiao], thanks your nice comments ! And sorry for my late response. Your demo is much simpler and it needs a minor change. 
When super user calls listOpenFiles, the parameter path might end with a slash, like '/user/hdfs_admin/demo/'. So before calling the code below we must normalize the path first. The normalize part would introduce some overhead. (fullPathName.startsWith(path) && (fullPathName.equals(path) || fullPathName.charAt(path.length() - 1) == Path.SEPARATOR_CHAR)) When doing the random tests, I added a new method startsWithAndCharAt() with the normalize part to monitor the demo. {code:java} public static boolean startsWithAndCharAt(String path, String parent) { if (path.length() > 1 && path.charAt(path.length() - 1) == '/') { path = path.substring(0, path.length() - 1); } return path.startsWith(parent) && (path.equals(parent) || path.charAt(parent.length() - 1) == '/'); } {code} And here is the result: *Case 1:* path starts with parent and neither path nor parent end with '/' ||Time||100,000,000|| |isParent|7,888ms| |startsWithAndCharAt|8,850ms| |startsWith|7,877ms| *Case 2:* path doesn't start with parent and neither path nor parent end with '/' ||Time||10,000,000,000|| |isParent|2,391ms| |startsWithAndCharAt|2,362ms| |startsWith|2,384ms| *Case 4:* path starts with parent and both path and parent end with '/' ||Time||100,000,000|| |isParent|7,882ms| |startsWithAndCharAt|11,118ms| |startsWith|7,803ms| Test commands are: {quote}java -Xmx512m Test 1 case1 java -Xmx512m Test 100 case2 java -Xmx512m Test 1 case4 {quote} Test file is TestV3.java > LeaseManager should check parent-child relationship when filter open files. > --- > > Key: HDFS-14908 > URL: https://issues.apache.org/jira/browse/HDFS-14908 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.1.0, 3.0.1 >Reporter: Jinglun >Assignee: Jinglun >Priority: Minor > Attachments: HDFS-14908.001.patch, HDFS-14908.002.patch, > HDFS-14908.003.patch, Test.java, TestV2.java, TestV3.java > > > Now when doing listOpenFiles(), LeaseManager only checks whether the filter > path is the prefix of the open files. 
We should check whether the filter path > is the parent/ancestor of the open files. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
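The parent/ancestor check discussed in HDFS-14908 can be sketched as follows. This is only a minimal illustration on plain strings, not the code from the attached patches: the class name PathFilterSketch and the helpers stripTrailingSlash() and isAncestorOrSelf() are hypothetical. It normalizes a trailing slash on the filter path, then looks at the character at index parent.length(), that is, the first character after the matched prefix, so that '/user/a' is not treated as an ancestor of '/user/ab/f'.

```java
class PathFilterSketch {
    // Hypothetical helper: drop one trailing '/' unless the path is the root "/".
    static String stripTrailingSlash(String p) {
        return (p.length() > 1 && p.charAt(p.length() - 1) == '/')
            ? p.substring(0, p.length() - 1)
            : p;
    }

    // Hypothetical check: true if 'filter' equals 'file' or is one of its ancestors.
    static boolean isAncestorOrSelf(String filter, String file) {
        String parent = stripTrailingSlash(filter);
        if (!file.startsWith(parent)) {
            return false;                 // not even a string prefix
        }
        if (file.length() == parent.length()) {
            return true;                  // same path
        }
        if (parent.equals("/")) {
            return true;                  // root is an ancestor of every absolute path
        }
        // The prefix must end exactly at a path component boundary.
        return file.charAt(parent.length()) == '/';
    }

    public static void main(String[] args) {
        // Trailing slash on the filter, as in '/user/hdfs_admin/demo/'.
        System.out.println(isAncestorOrSelf("/user/hdfs_admin/demo/", "/user/hdfs_admin/demo/f1")); // true
        // Plain string prefix that is not a parent directory.
        System.out.println(isAncestorOrSelf("/user/a", "/user/ab/f")); // false
    }
}
```

A plain startsWith() would accept the second case, which is the behaviour the issue description wants to avoid.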
[jira] [Updated] (HDFS-14908) LeaseManager should check parent-child relationship when filter open files.
[ https://issues.apache.org/jira/browse/HDFS-14908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinglun updated HDFS-14908: --- Attachment: TestV3.java > LeaseManager should check parent-child relationship when filter open files. > --- > > Key: HDFS-14908 > URL: https://issues.apache.org/jira/browse/HDFS-14908 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.1.0, 3.0.1 >Reporter: Jinglun >Assignee: Jinglun >Priority: Minor > Attachments: HDFS-14908.001.patch, HDFS-14908.002.patch, > HDFS-14908.003.patch, Test.java, TestV2.java, TestV3.java > > > Now when doing listOpenFiles(), LeaseManager only checks whether the filter > path is the prefix of the open files. We should check whether the filter path > is the parent/ancestor of the open files. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2344) Add immutable entries in to the DoubleBuffer for Volume requests.
[ https://issues.apache.org/jira/browse/HDDS-2344?focusedWorklogId=333851=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333851 ] ASF GitHub Bot logged work on HDDS-2344: Author: ASF GitHub Bot Created on: 25/Oct/19 03:28 Start Date: 25/Oct/19 03:28 Worklog Time Spent: 10m Work Description: dineshchitlangia commented on pull request #71: HDDS-2344. Add immutable entries in to the DoubleBuffer for Volume requests. URL: https://github.com/apache/hadoop-ozone/pull/71 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 333851) Time Spent: 40m (was: 0.5h) > Add immutable entries in to the DoubleBuffer for Volume requests. > - > > Key: HDDS-2344 > URL: https://issues.apache.org/jira/browse/HDDS-2344 > Project: Hadoop Distributed Data Store > Issue Type: Task >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > OMVolumeCreateRequest.java L159: > {code:java} > omClientResponse = > new OMVolumeCreateResponse(omVolumeArgs,volumeList, > omResponse.build());{code} > > We add this response to the double buffer. When the double-buffer flushThread, > which runs in the background, picks it up, it converts the entry to protobuf > and then to a byte array and writes it to the RocksDB tables. This conversion > is done without acquiring any lock, so if another request changes the internal > structure of OmVolumeArgs (for example, the acls list) in the meantime, we > might get a ConcurrentModificationException. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14908) LeaseManager should check parent-child relationship when filter open files.
[ https://issues.apache.org/jira/browse/HDFS-14908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959388#comment-16959388 ] Jinglun commented on HDFS-14908: Hi [~hexiaoqiao], thanks your nice comments ! And sorry for my late response. Your demo is much simpler and it needs a minor change. When super user calls listOpenFiles, the parameter path might end with a slash, like '/user/hdfs_admin/demo/'. So before calling the code below we must normalize the path first. The normalize part would introduce some overhead. (fullPathName.startsWith(path) && (fullPathName.equals(path) || fullPathName.charAt(path.length() - 1) == Path.SEPARATOR_CHAR)) When doing the random tests, I added a new method startsWithAndCharAt() with the normalize part to monitor the demo. {code:java} public static boolean startsWithAndCharAt(String path, String parent) { if (path.length() > 1 && path.charAt(path.length() - 1) == '/') { path = path.substring(0, path.length() - 1); } return path.startsWith(parent) && (path.equals(parent) || path.charAt(parent.length() - 1) == '/'); } {code} And here is the result: *Case 1:* path starts with parent and neither path nor parent end with '/' ||Time||100,000,000|| |isParent|7,888ms| |startsWithAndCharAt|8,850ms| |startsWith|7,877ms| *Case 2:* path doesn't start with parent and neither path nor parent end with '/' ||Time||10,000,000,000|| |isParent|2,391ms| |startsWithAndCharAt|2,362ms| |startsWith|2,384ms| *Case 4:* path starts with parent and both path and parent end with '/' ||Time||100,000,000|| |isParent|7,882ms| |startsWithAndCharAt|11,118ms| |startsWith|7,803ms| Test commands are: {quote}java -Xmx512m Test 1 case1 java -Xmx512m Test 100 case2 java -Xmx512m Test 1 case4 {quote} Test file is TestV3.java > LeaseManager should check parent-child relationship when filter open files. 
> --- > > Key: HDFS-14908 > URL: https://issues.apache.org/jira/browse/HDFS-14908 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.1.0, 3.0.1 >Reporter: Jinglun >Assignee: Jinglun >Priority: Minor > Attachments: HDFS-14908.001.patch, HDFS-14908.002.patch, > HDFS-14908.003.patch, Test.java, TestV2.java > > > Now when doing listOpenFiles(), LeaseManager only checks whether the filter > path is the prefix of the open files. We should check whether the filter path > is the parent/ancestor of the open files. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-2360) Update Ratis snapshot to d6d58d0
[ https://issues.apache.org/jira/browse/HDDS-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham resolved HDDS-2360. -- Fix Version/s: 0.5.0 Resolution: Fixed > Update Ratis snapshot to d6d58d0 > > > Key: HDDS-2360 > URL: https://issues.apache.org/jira/browse/HDDS-2360 > Project: Hadoop Distributed Data Store > Issue Type: Task > Components: Ozone Client, Ozone Datanode >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Update Ratis dependency version to snapshot > [d6d58d0|https://github.com/apache/incubator-ratis/commit/d6d58d0], to fix > memory issues (RATIS-726, RATIS-728). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2360) Update Ratis snapshot to d6d58d0
[ https://issues.apache.org/jira/browse/HDDS-2360?focusedWorklogId=333846=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333846 ] ASF GitHub Bot logged work on HDDS-2360: Author: ASF GitHub Bot Created on: 25/Oct/19 03:13 Start Date: 25/Oct/19 03:13 Worklog Time Spent: 10m Work Description: bharatviswa504 commented on pull request #83: HDDS-2360. Update Ratis snapshot to d6d58d0 URL: https://github.com/apache/hadoop-ozone/pull/83 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 333846) Time Spent: 20m (was: 10m) > Update Ratis snapshot to d6d58d0 > > > Key: HDDS-2360 > URL: https://issues.apache.org/jira/browse/HDDS-2360 > Project: Hadoop Distributed Data Store > Issue Type: Task > Components: Ozone Client, Ozone Datanode >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Update Ratis dependency version to snapshot > [d6d58d0|https://github.com/apache/incubator-ratis/commit/d6d58d0], to fix > memory issues (RATIS-726, RATIS-728). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14917) Change the ICON of "Decommissioned & dead" datanode on "dfshealth.html"
[ https://issues.apache.org/jira/browse/HDFS-14917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959385#comment-16959385 ] Xieming Li commented on HDFS-14917: --- [~elgoiri][~tasanuma] Thank you for your review. > Change the ICON of "Decommissioned & dead" datanode on "dfshealth.html" > --- > > Key: HDFS-14917 > URL: https://issues.apache.org/jira/browse/HDFS-14917 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ui >Reporter: Xieming Li >Assignee: Xieming Li >Priority: Trivial > Fix For: 3.3.0 > > Attachments: HDFS-14917.patch, image-2019-10-21-17-49-10-635.png, > image-2019-10-21-17-49-58-759.png, image-2019-10-21-18-03-53-914.png, > image-2019-10-21-18-04-52-405.png, image-2019-10-21-18-05-19-160.png, > image-2019-10-21-18-13-01-884.png, image-2019-10-21-18-13-54-427.png > > > This is a really simple UI change proposal: > The icon for a "Decommissioned & dead" datanode could be improved. It can be > changed from !image-2019-10-21-18-05-19-160.png|width=31,height=28! to > !image-2019-10-21-18-04-52-405.png|width=32,height=29! so that: > # icon " !image-2019-10-21-18-13-01-884.png|width=26,height=25! " can be > used for all statuses that start with "decommission" on dfshealth.html, > # icon " !image-2019-10-21-18-13-01-884.png|width=26,height=25! " can be > distinguished from icon " !image-2019-10-21-18-13-54-427.png! " on > federationhealth.html > |*DataNode Information Legend (now)* > dfshealth.html#tab-datanode > |!image-2019-10-21-17-49-10-635.png|width=516,height=55!| > |*DataNode* *Information* *Legend (proposed)* > dfshealth.html#tab-datanode > |!image-2019-10-21-18-03-53-914.png|width=589,height=60!| > |*NameService Legend* > > federationhealth.html#tab-namenode|!image-2019-10-21-17-49-58-759.png|width=445,height=43!| -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-2361) Ozone Manager init & start command prints out unnecessary line in the beginning.
[ https://issues.apache.org/jira/browse/HDDS-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YiSheng Lien reassigned HDDS-2361: -- Assignee: YiSheng Lien > Ozone Manager init & start command prints out unnecessary line in the > beginning. > > > Key: HDDS-2361 > URL: https://issues.apache.org/jira/browse/HDDS-2361 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Aravindan Vijayan >Assignee: YiSheng Lien >Priority: Major > > {code} > [root@avijayan-om-1 ozone-0.5.0-SNAPSHOT]# bin/ozone --daemon start om > Ozone Manager classpath extended by > {code} > We could probably print this line only when extra elements are added to OM > classpath or skip printing this line altogether. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14730) Remove unused configuration dfs.web.authentication.filter
[ https://issues.apache.org/jira/browse/HDFS-14730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Zhang updated HDFS-14730: -- Attachment: (was: HDFS-14730.002.patch) > Remove unused configuration dfs.web.authentication.filter > -- > > Key: HDFS-14730 > URL: https://issues.apache.org/jira/browse/HDFS-14730 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Chen Zhang >Assignee: Chen Zhang >Priority: Major > Attachments: HDFS-14730.001.patch, HDFS-14730.002.patch > > > After HADOOP-16314, this configuration is not used anywhere, so I propose to > deprecate it to avoid misuse. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14730) Remove unused configuration dfs.web.authentication.filter
[ https://issues.apache.org/jira/browse/HDFS-14730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Zhang updated HDFS-14730: -- Attachment: HDFS-14730.002.patch > Remove unused configuration dfs.web.authentication.filter > -- > > Key: HDFS-14730 > URL: https://issues.apache.org/jira/browse/HDFS-14730 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Chen Zhang >Assignee: Chen Zhang >Priority: Major > Attachments: HDFS-14730.001.patch, HDFS-14730.002.patch > > > After HADOOP-16314, this configuration is not used anywhere, so I propose to > deprecate it to avoid misuse. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14927) RBF: Add metrics for async callers thread pool
[ https://issues.apache.org/jira/browse/HDFS-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959358#comment-16959358 ] Hadoop QA commented on HDFS-14927: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 28m 54s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 45s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 29s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 6m 44s{color} | {color:green} hadoop-hdfs-rbf in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 87m 22s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | HDFS-14927 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12983977/HDFS-14927.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux e53f74ffb535 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / b41394e | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/28173/testReport/ | | Max. process+thread count | 3599 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/28173/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > RBF: Add metrics for async callers thread
[jira] [Commented] (HDFS-14917) Change the ICON of "Decommissioned & dead" datanode on "dfshealth.html"
[ https://issues.apache.org/jira/browse/HDFS-14917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959356#comment-16959356 ] Hudson commented on HDFS-14917: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17572 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17572/]) HDFS-14917. Change the ICON of "Decommissioned & dead" datanode on (tasanuma: rev 0db0f1e3990c4bf93ca8db41858860da6537a9bf) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/hadoop.css > Change the ICON of "Decommissioned & dead" datanode on "dfshealth.html" > --- > > Key: HDFS-14917 > URL: https://issues.apache.org/jira/browse/HDFS-14917 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ui >Reporter: Xieming Li >Assignee: Xieming Li >Priority: Trivial > Fix For: 3.3.0 > > Attachments: HDFS-14917.patch, image-2019-10-21-17-49-10-635.png, > image-2019-10-21-17-49-58-759.png, image-2019-10-21-18-03-53-914.png, > image-2019-10-21-18-04-52-405.png, image-2019-10-21-18-05-19-160.png, > image-2019-10-21-18-13-01-884.png, image-2019-10-21-18-13-54-427.png > > > This is a really simple UI change proposal: > The icon for a "Decommissioned & dead" datanode could be improved. It can be > changed from !image-2019-10-21-18-05-19-160.png|width=31,height=28! to > !image-2019-10-21-18-04-52-405.png|width=32,height=29! so that: > # icon " !image-2019-10-21-18-13-01-884.png|width=26,height=25! " can be > used for all statuses that start with "decommission" on dfshealth.html, > # icon " !image-2019-10-21-18-13-01-884.png|width=26,height=25! " can be > distinguished from icon " !image-2019-10-21-18-13-54-427.png! 
" on > federationhealth.html > |*DataNode Information Legend (now)* > dfshealth.html#tab-datanode > |!image-2019-10-21-17-49-10-635.png|width=516,height=55!| > |*DataNode* *Information* *Legend (proposed)* > dfshealth.html#tab-datanode > |!image-2019-10-21-18-03-53-914.png|width=589,height=60!| > |*NameService Legend* > > federationhealth.html#tab-namenode|!image-2019-10-21-17-49-58-759.png|width=445,height=43!| -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14902) RBF: NullPointer When Misconfigured
[ https://issues.apache.org/jira/browse/HDFS-14902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated HDFS-14902: Parent: HDFS-14603 Issue Type: Sub-task (was: Improvement) > RBF: NullPointer When Misconfigured > --- > > Key: HDFS-14902 > URL: https://issues.apache.org/jira/browse/HDFS-14902 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: Takanobu Asanuma >Priority: Minor > Attachments: HDFS-14902.001.patch > > > Admittedly the server was mis-configured, but this should be a bit more > elegant. > {code:none} > 2019-10-08 11:19:52,505 ERROR router.NamenodeHeartbeatService: Unhandled > exception updating NN registration for null:null > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.federation.protocol.proto.HdfsServerFederationProtos$NamenodeMembershipRecordProto$Builder.setServiceAddress(HdfsServerFederationProtos.java:3831) > at > org.apache.hadoop.hdfs.server.federation.store.records.impl.pb.MembershipStatePBImpl.setServiceAddress(MembershipStatePBImpl.java:119) > at > org.apache.hadoop.hdfs.server.federation.store.records.MembershipState.newInstance(MembershipState.java:108) > at > org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.registerNamenode(MembershipNamenodeResolver.java:259) > at > org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:223) > at > org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159) > at > org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) 
> at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
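The stack trace above shows a null service address reaching a protobuf builder setter, which rejects null. One possible way to make this "a bit more elegant" is to validate the value before building the record and report a readable message instead. The sketch below is only an assumption about the general approach, not the content of HDFS-14902.001.patch; the class HeartbeatGuardSketch and the method validateRegistration() are hypothetical and do not exist in the Router code.

```java
import java.util.Optional;

class HeartbeatGuardSketch {
    // Hypothetical pre-check: returns a readable error if the registration
    // cannot be built, or an empty Optional if the inputs look usable.
    static Optional<String> validateRegistration(String nameserviceId, String serviceAddress) {
        if (serviceAddress == null || serviceAddress.isEmpty()) {
            return Optional.of("Cannot register namenode for nameservice "
                + nameserviceId + ": service address is not configured");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Misconfigured case: nothing was resolved for the namenode.
        validateRegistration(null, null).ifPresent(System.out::println);
        // Well-formed case: no message is produced. The address is a made-up example.
        System.out.println(validateRegistration("ns0", "nn1.example.com:8020").isPresent());
    }
}
```

With such a guard the heartbeat could log one clear line and skip the update, rather than surfacing a NullPointerException from generated protobuf code.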
[jira] [Work logged] (HDDS-2344) Add immutable entries in to the DoubleBuffer for Volume requests.
[ https://issues.apache.org/jira/browse/HDDS-2344?focusedWorklogId=333819=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333819 ] ASF GitHub Bot logged work on HDDS-2344: Author: ASF GitHub Bot Created on: 25/Oct/19 01:32 Start Date: 25/Oct/19 01:32 Worklog Time Spent: 10m Work Description: bharatviswa504 commented on pull request #71: HDDS-2344. Add immutable entries in to the DoubleBuffer for Volume requests. URL: https://github.com/apache/hadoop-ozone/pull/71 ## What changes were proposed in this pull request? OMVolumeCreateRequest.java L159: omClientResponse = new OMVolumeCreateResponse(omVolumeArgs, volumeList, omResponse.build()); We add this response to the double buffer. When the double buffer's background flush thread picks it up, it converts the entry to protobuf and then to a byte array, and writes it to the RocksDB tables. This conversion is done without acquiring any lock, so if another request changes the internal structure of OmVolumeArgs (such as the ACL list) in the meantime, we might get a ConcurrentModificationException. ## What is the link to the Apache JIRA https://issues.apache.org/jira/browse/HDDS-2344 ## How was this patch tested? Ran TestOzoneRpcClient, which tests this code path, and also added a new test for the clone method. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 333819) Time Spent: 0.5h (was: 20m) > Add immutable entries in to the DoubleBuffer for Volume requests.
> - > > Key: HDDS-2344 > URL: https://issues.apache.org/jira/browse/HDDS-2344 > Project: Hadoop Distributed Data Store > Issue Type: Task >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > OMVolumeCreateRequest.java L159: > {code:java} > omClientResponse = > new OMVolumeCreateResponse(omVolumeArgs, volumeList, > omResponse.build());{code} > > We add this response to the double buffer. When the double buffer's background flush thread picks it up, it converts the entry to protobuf and then to a byte array, and writes it to the RocksDB tables. This conversion is done without acquiring any lock, so if another request changes the internal structure of OmVolumeArgs (such as the ACL list) in the meantime, we might get a ConcurrentModificationException.
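The race described in HDDS-2344 can be illustrated with a small, self-contained sketch. It is not the Ozone code: `VolumeArgsSketch`, `copyObject()`, `addAcl()` and `serialize()` are hypothetical stand-ins for `OmVolumeArgs`, the clone method mentioned in the pull request, an ACL update, and the protobuf conversion done by the flush thread. The idea shown is only that the buffer receives a private copy, so the lock-free conversion never iterates a list that a later request can still modify.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a mutable request object such as OmVolumeArgs. */
final class VolumeArgsSketch {
  private final String volume;
  private final List<String> acls;

  VolumeArgsSketch(String volume, List<String> acls) {
    this.volume = volume;
    this.acls = new ArrayList<>(acls); // always keep a private list
  }

  /** A later request mutating the live object (assumed to hold a lock). */
  synchronized void addAcl(String acl) {
    acls.add(acl);
  }

  /** Snapshot taken under the same lock; the result is owned by the caller. */
  synchronized VolumeArgsSketch copyObject() {
    return new VolumeArgsSketch(volume, acls);
  }

  /**
   * Stand-in for the protobuf/byte-array conversion. It iterates the ACL list
   * without a lock, which is safe only when called on a private copy.
   */
  String serialize() {
    return volume + ":" + String.join(",", acls);
  }

  public static void main(String[] args) {
    VolumeArgsSketch live =
        new VolumeArgsSketch("vol1", Arrays.asList("user:alice:rw"));
    VolumeArgsSketch queued = live.copyObject(); // what goes into the buffer
    live.addAcl("user:bob:r");                   // concurrent-style mutation
    System.out.println(queued.serialize());      // unaffected by the mutation
  }
}
```

The copy costs one extra allocation per request, which is the trade-off for letting the flush thread run without taking the request-side lock.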
[jira] [Work logged] (HDDS-2344) Add immutable entries in to the DoubleBuffer for Volume requests.
[ https://issues.apache.org/jira/browse/HDDS-2344?focusedWorklogId=333818=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333818 ] ASF GitHub Bot logged work on HDDS-2344: Author: ASF GitHub Bot Created on: 25/Oct/19 01:31 Start Date: 25/Oct/19 01:31 Worklog Time Spent: 10m Work Description: bharatviswa504 commented on pull request #71: HDDS-2344. Add immutable entries in to the DoubleBuffer for Volume requests. URL: https://github.com/apache/hadoop-ozone/pull/71 Issue Time Tracking --- Worklog Id: (was: 333818) Time Spent: 20m (was: 10m) > Add immutable entries in to the DoubleBuffer for Volume requests. > - > > Key: HDDS-2344 > URL: https://issues.apache.org/jira/browse/HDDS-2344 > Project: Hadoop Distributed Data Store > Issue Type: Task >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > OMVolumeCreateRequest.java L159: > {code:java} > omClientResponse = > new OMVolumeCreateResponse(omVolumeArgs, volumeList, > omResponse.build());{code} > > We add this response to the double buffer. When the double buffer's background flush thread picks it up, it converts the entry to protobuf and then to a byte array, and writes it to the RocksDB tables. This conversion is done without acquiring any lock, so if another request changes the internal structure of OmVolumeArgs (such as the ACL list) in the meantime, we might get a ConcurrentModificationException.
[jira] [Updated] (HDFS-14917) Change the ICON of "Decommissioned & dead" datanode on "dfshealth.html"
[ https://issues.apache.org/jira/browse/HDFS-14917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated HDFS-14917: Fix Version/s: 3.3.0 Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk. Thanks for your contribution, [~risyomei], and thanks for your review, [~elgoiri]. > Change the ICON of "Decommissioned & dead" datanode on "dfshealth.html" > --- > > Key: HDFS-14917 > URL: https://issues.apache.org/jira/browse/HDFS-14917 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ui >Reporter: Xieming Li >Assignee: Xieming Li >Priority: Trivial > Fix For: 3.3.0 > > Attachments: HDFS-14917.patch, image-2019-10-21-17-49-10-635.png, > image-2019-10-21-17-49-58-759.png, image-2019-10-21-18-03-53-914.png, > image-2019-10-21-18-04-52-405.png, image-2019-10-21-18-05-19-160.png, > image-2019-10-21-18-13-01-884.png, image-2019-10-21-18-13-54-427.png > > > This is a really simple UI change proposal: > The icon of the "Decommissioned & dead" datanode could be improved. It can be changed from !image-2019-10-21-18-05-19-160.png|width=31,height=28! to !image-2019-10-21-18-04-52-405.png|width=32,height=29! so that: > # icon " !image-2019-10-21-18-13-01-884.png|width=26,height=25! " can be used for all statuses that start with "decommission" on dfshealth.html, > # icon " !image-2019-10-21-18-13-01-884.png|width=26,height=25! " can be differentiated from icon " !image-2019-10-21-18-13-54-427.png! " on federationhealth.html
> |*DataNode Information Legend (now)* > dfshealth.html#tab-datanode > |!image-2019-10-21-17-49-10-635.png|width=516,height=55!| > |*DataNode* *Information* *Legend (proposed)* > dfshealth.html#tab-datanode > |!image-2019-10-21-18-03-53-914.png|width=589,height=60!| > |*NameService Legend* > > federationhealth.html#tab-namenode|!image-2019-10-21-17-49-58-759.png|width=445,height=43!|
[jira] [Commented] (HDFS-14927) RBF: Add metrics for async callers thread pool
[ https://issues.apache.org/jira/browse/HDFS-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959348#comment-16959348 ] Íñigo Goiri commented on HDFS-14927: The name {{getAsyncCallerServiceThreadPoolJson()}} seems a little specific. Isn't there anything a little more general and descriptive? In any case, please go ahead with the unit test. > RBF: Add metrics for async callers thread pool > -- > > Key: HDFS-14927 > URL: https://issues.apache.org/jira/browse/HDFS-14927 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Reporter: Leon Gao >Assignee: Leon Gao >Priority: Minor > Attachments: HDFS-14927.001.patch > > > It is good to add some monitoring on the async caller thread pool to handle > fan-out RPC client requests, so we know the utilization and when to bump up > dfs.federation.router.client.thread-size -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14638) [Dynamometer] Fix scripts to refer to current build structure
[ https://issues.apache.org/jira/browse/HDFS-14638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated HDFS-14638: Fix Version/s: 3.3.0 Resolution: Fixed Status: Resolved (was: Patch Available) > [Dynamometer] Fix scripts to refer to current build structure > - > > Key: HDFS-14638 > URL: https://issues.apache.org/jira/browse/HDFS-14638 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode, test >Reporter: Erik Krogen >Assignee: Takanobu Asanuma >Priority: Major > Fix For: 3.3.0 > > > The scripts within the Dynamometer build dirs all refer to the old > distribution structure with a single {{bin}} directory and a single {{lib}} > directory. We need to update them to refer to the Hadoop-standard layout. > Also as pointed out by [~pingsutw]: > {quote} > Due to dynamometer rename to hadoop-dynamometer in hadoop-tools > but we still use old name of jar inside the scripts > {code} > "$hadoop_cmd" jar "${script_pwd}"/lib/dynamometer-infra-*.jar > org.apache.hadoop.tools.dynamometer.Client "$@" > {code} > We should rename these jar inside the scripts > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14927) RBF: Add metrics for async callers thread pool
[ https://issues.apache.org/jira/browse/HDFS-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959331#comment-16959331 ] Leon Gao commented on HDFS-14927: - [~elgoiri] Please let me know if the change makes sense to you; then I will add some unit tests. > RBF: Add metrics for async callers thread pool > -- > > Key: HDFS-14927 > URL: https://issues.apache.org/jira/browse/HDFS-14927 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Reporter: Leon Gao >Assignee: Leon Gao >Priority: Minor > Attachments: HDFS-14927.001.patch > > > It is good to add some monitoring on the async caller thread pool to handle > fan-out RPC client requests, so we know the utilization and when to bump up > dfs.federation.router.client.thread-size
[jira] [Commented] (HDFS-14927) RBF: Add metrics for async callers thread pool
[ https://issues.apache.org/jira/browse/HDFS-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959320#comment-16959320 ] Leon Gao commented on HDFS-14927: - Submitting the patch and updating the ticket name, since executorService only handles async fan-out calls, unlike the RPC client connection pool. > RBF: Add metrics for async callers thread pool > -- > > Key: HDFS-14927 > URL: https://issues.apache.org/jira/browse/HDFS-14927 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Reporter: Leon Gao >Assignee: Leon Gao >Priority: Minor > Attachments: HDFS-14927.001.patch > > > It is good to add some monitoring on the async caller thread pool to handle > fan-out RPC client requests, so we know the utilization and when to bump up > dfs.federation.router.client.thread-size
[jira] [Commented] (HDFS-14927) RBF: Add metrics for async callers thread pool
[ https://issues.apache.org/jira/browse/HDFS-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959321#comment-16959321 ] Leon Gao commented on HDFS-14927: - Example metrics: "AsyncCallerServiceThreadPool" : "{\"active\":0,\"total\":218,\"max\":19191}" > RBF: Add metrics for async callers thread pool > -- > > Key: HDFS-14927 > URL: https://issues.apache.org/jira/browse/HDFS-14927 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Reporter: Leon Gao >Assignee: Leon Gao >Priority: Minor > Attachments: HDFS-14927.001.patch > > > It is good to add some monitoring on the async caller thread pool to handle > fan-out RPC client requests, so we know the utilization and when to bump up > dfs.federation.router.client.thread-size
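A JSON value of the shape shown in the example metric can be produced from a plain `java.util.concurrent.ThreadPoolExecutor`. The sketch below is not taken from the attached patch: the class name `ThreadPoolMetricsSketch` and the helper `toJson` are hypothetical, and the mapping of `active`, `total` and `max` to `getActiveCount()`, `getPoolSize()` and `getMaximumPoolSize()` is an assumption about what those fields mean.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper that renders thread pool utilization as JSON. */
final class ThreadPoolMetricsSketch {

  /**
   * Builds a JSON object with three counters. The getters return approximate
   * values because the pool state may change while they are being read.
   */
  static String toJson(ThreadPoolExecutor pool) {
    return "{\"active\":" + pool.getActiveCount()
        + ",\"total\":" + pool.getPoolSize()
        + ",\"max\":" + pool.getMaximumPoolSize() + "}";
  }

  public static void main(String[] args) {
    // Pool sizes are arbitrary illustration values.
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        2, 8, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    System.out.println(toJson(pool));
    pool.shutdown();
  }
}
```

Comparing `active` against `max` over time is what would indicate whether dfs.federation.router.client.thread-size needs to be raised.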
[jira] [Updated] (HDFS-14927) RBF: Add metrics for async callers thread pool
[ https://issues.apache.org/jira/browse/HDFS-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leon Gao updated HDFS-14927: Attachment: HDFS-14927.001.patch Status: Patch Available (was: Reopened) > RBF: Add metrics for async callers thread pool > -- > > Key: HDFS-14927 > URL: https://issues.apache.org/jira/browse/HDFS-14927 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Reporter: Leon Gao >Assignee: Leon Gao >Priority: Minor > Attachments: HDFS-14927.001.patch > > > It is good to add some monitoring on the async caller thread pool to handle > fan-out RPC client requests, so we know the utilization and when to bump up > dfs.federation.router.client.thread-size -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14923) Remove dead code from HealthMonitor
[ https://issues.apache.org/jira/browse/HDFS-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959319#comment-16959319 ] Wei-Chiu Chuang commented on HDFS-14923: +1 > Remove dead code from HealthMonitor > --- > > Key: HDFS-14923 > URL: https://issues.apache.org/jira/browse/HDFS-14923 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.3.0, 3.2.1, 3.1.3 >Reporter: Fei Hui >Assignee: Fei Hui >Priority: Minor > Attachments: HDFS-14923.001.patch > > > While digging through the ZKFC source code, I found the following dead code: > {code} > public void removeCallback(Callback cb) { >callbacks.remove(cb); > } > public synchronized void removeServiceStateCallback(ServiceStateCallback cb) { >serviceStateCallbacks.remove(cb); > } > synchronized HAServiceStatus getLastServiceStatus() { >return lastServiceState; > } > {code} > These methods are unused and should be deleted.
[jira] [Updated] (HDFS-14927) RBF: Add metrics for async callers thread pool
[ https://issues.apache.org/jira/browse/HDFS-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leon Gao updated HDFS-14927: Summary: RBF: Add metrics for async callers thread pool (was: RBF: Add metrics for active RPC client threads for async calls) > RBF: Add metrics for async callers thread pool > -- > > Key: HDFS-14927 > URL: https://issues.apache.org/jira/browse/HDFS-14927 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Reporter: Leon Gao >Assignee: Leon Gao >Priority: Minor > > It is good to add some monitoring on the active RPC client threads to handle > fan-out RPC client requests, so we know the utilization and when to bump up > dfs.federation.router.client.thread-size -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14927) RBF: Add metrics for async callers thread pool
[ https://issues.apache.org/jira/browse/HDFS-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leon Gao updated HDFS-14927: Description: It is good to add some monitoring on the async caller thread pool to handle fan-out RPC client requests, so we know the utilization and when to bump up dfs.federation.router.client.thread-size (was: It is good to add some monitoring on the active RPC client threads to handle fan-out RPC client requests, so we know the utilization and when to bump up dfs.federation.router.client.thread-size) > RBF: Add metrics for async callers thread pool > -- > > Key: HDFS-14927 > URL: https://issues.apache.org/jira/browse/HDFS-14927 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Reporter: Leon Gao >Assignee: Leon Gao >Priority: Minor > > It is good to add some monitoring on the async caller thread pool to handle > fan-out RPC client requests, so we know the utilization and when to bump up > dfs.federation.router.client.thread-size -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14638) [Dynamometer] Fix scripts to refer to current build structure
[ https://issues.apache.org/jira/browse/HDFS-14638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959303#comment-16959303 ] Hudson commented on HDFS-14638: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17571 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17571/]) HDFS-14638. [Dynamometer] Fix scripts to refer to current build (weichiu: rev b41394eec8552f419aefe452b3fdb8ff2506b9d1) * (edit) hadoop-tools/hadoop-dynamometer/hadoop-dynamometer-blockgen/src/main/bash/generate-block-lists.sh * (edit) hadoop-tools/hadoop-dynamometer/hadoop-dynamometer-workload/src/main/bash/start-workload.sh * (edit) hadoop-tools/hadoop-dynamometer/hadoop-dynamometer-infra/src/main/bash/start-dynamometer-cluster.sh > [Dynamometer] Fix scripts to refer to current build structure > - > > Key: HDFS-14638 > URL: https://issues.apache.org/jira/browse/HDFS-14638 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode, test >Reporter: Erik Krogen >Assignee: Takanobu Asanuma >Priority: Major > > The scripts within the Dynamometer build dirs all refer to the old > distribution structure with a single {{bin}} directory and a single {{lib}} > directory. We need to update them to refer to the Hadoop-standard layout. > Also as pointed out by [~pingsutw]: > {quote} > Due to dynamometer rename to hadoop-dynamometer in hadoop-tools > but we still use old name of jar inside the scripts > {code} > "$hadoop_cmd" jar "${script_pwd}"/lib/dynamometer-infra-*.jar > org.apache.hadoop.tools.dynamometer.Client "$@" > {code} > We should rename these jar inside the scripts > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-2355) Om double buffer flush termination with rocksdb error
[ https://issues.apache.org/jira/browse/HDDS-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aravindan Vijayan reassigned HDDS-2355: --- Assignee: Aravindan Vijayan > Om double buffer flush termination with rocksdb error > - > > Key: HDDS-2355 > URL: https://issues.apache.org/jira/browse/HDDS-2355 > Project: Hadoop Distributed Data Store > Issue Type: Task >Reporter: Bharat Viswanadham >Assignee: Aravindan Vijayan >Priority: Critical > > om_1 |java.io.IOException: Unable to write the batch. > om_1 | at > org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:48) > om_1 | at > org.apache.hadoop.hdds.utils.db.RDBStore.commitBatchOperation(RDBStore.java:240) > om_1 |at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:146) > om_1 |at java.base/java.lang.Thread.run(Thread.java:834) > om_1 |Caused by: org.rocksdb.RocksDBException: > WritePrepared/WriteUnprepared txn tag when write_after_commit_ is enabled (in > default WriteCommitted mode). If it is not due to corruption, the WAL must be > emptied before changing the WritePolicy. > om_1 |at org.rocksdb.RocksDB.write0(Native Method) > om_1 |at org.rocksdb.RocksDB.write(RocksDB.java:1421) > om_1 | at > org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:46) > > In a few of my test runs I see this error, and OM is terminated.
[jira] [Updated] (HDDS-2355) Om double buffer flush termination with rocksdb error
[ https://issues.apache.org/jira/browse/HDDS-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aravindan Vijayan updated HDDS-2355: Priority: Critical (was: Major) > Om double buffer flush termination with rocksdb error > - > > Key: HDDS-2355 > URL: https://issues.apache.org/jira/browse/HDDS-2355 > Project: Hadoop Distributed Data Store > Issue Type: Task >Reporter: Bharat Viswanadham >Priority: Critical > > om_1 |java.io.IOException: Unable to write the batch. > om_1 | at > org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:48) > om_1 | at > org.apache.hadoop.hdds.utils.db.RDBStore.commitBatchOperation(RDBStore.java:240) > om_1 |at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:146) > om_1 |at java.base/java.lang.Thread.run(Thread.java:834) > om_1 |Caused by: org.rocksdb.RocksDBException: > WritePrepared/WriteUnprepared txn tag when write_after_commit_ is enabled (in > default WriteCommitted mode). If it is not due to corruption, the WAL must be > emptied before changing the WritePolicy. > om_1 |at org.rocksdb.RocksDB.write0(Native Method) > om_1 |at org.rocksdb.RocksDB.write(RocksDB.java:1421) > om_1 | at > org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:46) > > In a few of my test runs I see this error, and OM is terminated.
[jira] [Commented] (HDFS-14931) hdfs crypto commands limit column width
[ https://issues.apache.org/jira/browse/HDFS-14931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959277#comment-16959277 ] Hadoop QA commented on HDFS-14931: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 46s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 29s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 13s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 38s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 1 unchanged - 1 fixed = 2 total (was 2) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 27s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 11s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 98m 3s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}159m 43s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestDistributedFileSystem | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | HDFS-14931 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12983959/HDFS-14931.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux c9d33fd85c1e 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / a1b4eeb | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/28172/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/28172/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results |
[jira] [Assigned] (HDDS-2273) Avoid buffer copying in GrpcReplicationService
[ https://issues.apache.org/jira/browse/HDDS-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal reassigned HDDS-2273: --- Assignee: Attila Doroszlai (was: Tsz-wo Sze) > Avoid buffer copying in GrpcReplicationService > -- > > Key: HDDS-2273 > URL: https://issues.apache.org/jira/browse/HDDS-2273 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Tsz-wo Sze >Assignee: Attila Doroszlai >Priority: Major > > In GrpcOutputStream, it writes data to a ByteArrayOutputStream and copies > them to a ByteString. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-2272) Avoid buffer copying in GrpcReplicationClient
[ https://issues.apache.org/jira/browse/HDDS-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal reassigned HDDS-2272: --- Assignee: Attila Doroszlai (was: Tsz-wo Sze) > Avoid buffer copying in GrpcReplicationClient > - > > Key: HDDS-2272 > URL: https://issues.apache.org/jira/browse/HDDS-2272 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Tsz-wo Sze >Assignee: Attila Doroszlai >Priority: Major > > In StreamDownloader.onNext, CopyContainerResponseProto is copied to a byte[] > and then it is written out to the stream. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2071) Support filters in ozone insight point
[ https://issues.apache.org/jira/browse/HDDS-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDDS-2071: - Fix Version/s: 0.5.0 Resolution: Fixed Status: Resolved (was: Patch Available) Thanks [~elek] for the contribution. I've merged the fix to master. > Support filters in ozone insight point > -- > > Key: HDDS-2071 > URL: https://issues.apache.org/jira/browse/HDDS-2071 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Marton Elek >Assignee: Marton Elek >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > With Ozone insight we can print out all the logs / metrics of one specific component (e.g. scm.node-manager). > It would be great to support additional filtering capabilities where the output is filtered based on specific keys, > for example to print out all of the logs related to one datanode or to one type of RPC request. > The filter should be a key-value map (e.g. --filter datanode=sjdhfhf,rpc=createChunk) that can be defined in the ozone insight CLI. > As we have no option to add additional tags to the logs (this may be supported by log4j2 but not by slf4j), the first implementation can use pattern matching. > For example, SCMNodeManager.processNodeReport contains trace/debug logs that include the " [datanode={}]" part. This formatting convention can be used to print out only the related information.
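The pattern-matching approach proposed in HDDS-2071 could look roughly like the sketch below. It is not the merged implementation: `InsightFilterSketch`, `parseFilter` and `matches` are hypothetical names, and the rule that a log line is kept only when it contains `[key=value]` for every filter entry is an assumption about the intended semantics.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of key=value log filtering by pattern matching. */
final class InsightFilterSketch {

  /** Parses a spec such as "datanode=abc,rpc=createChunk" into a map. */
  static Map<String, String> parseFilter(String spec) {
    Map<String, String> filters = new LinkedHashMap<>();
    if (spec == null || spec.trim().isEmpty()) {
      return filters; // no filter: every line is accepted
    }
    for (String pair : spec.split(",")) {
      String[] kv = pair.split("=", 2);
      if (kv.length != 2 || kv[0].trim().isEmpty()) {
        throw new IllegalArgumentException("Expected key=value but got: " + pair);
      }
      filters.put(kv[0].trim(), kv[1].trim());
    }
    return filters;
  }

  /** Keeps a line only if it carries a "[key=value]" tag for each entry. */
  static boolean matches(String logLine, Map<String, String> filters) {
    for (Map.Entry<String, String> e : filters.entrySet()) {
      String tag = "[" + e.getKey() + "=" + e.getValue() + "]";
      if (!logLine.contains(tag)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Map<String, String> filters = parseFilter("datanode=sjdhfhf");
    System.out.println(matches("Processing node report [datanode=sjdhfhf]", filters));
    System.out.println(matches("Processing node report [datanode=other]", filters));
  }
}
```

Plain substring matching keeps the filter independent of the logging backend, at the cost of relying on every log statement following the `[key=value]` convention.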
[jira] [Work logged] (HDDS-2071) Support filters in ozone insight point
[ https://issues.apache.org/jira/browse/HDDS-2071?focusedWorklogId=333731=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333731 ] ASF GitHub Bot logged work on HDDS-2071: Author: ASF GitHub Bot Created on: 24/Oct/19 21:55 Start Date: 24/Oct/19 21:55 Worklog Time Spent: 10m Work Description: xiaoyuyao commented on pull request #4: HDDS-2071. Support filters in ozone insight point URL: https://github.com/apache/hadoop-ozone/pull/4 Issue Time Tracking --- Worklog Id: (was: 333731) Time Spent: 1h 50m (was: 1h 40m) > Support filters in ozone insight point > -- > > Key: HDDS-2071 > URL: https://issues.apache.org/jira/browse/HDDS-2071 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Marton Elek >Assignee: Marton Elek >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > With Ozone insight we can print out all the logs / metrics of one specific component (e.g. scm.node-manager). > It would be great to support additional filtering capabilities where the output is filtered based on specific keys, > for example to print out all of the logs related to one datanode or to one type of RPC request. > The filter should be a key-value map (e.g. --filter datanode=sjdhfhf,rpc=createChunk) that can be defined in the ozone insight CLI. > As we have no option to add additional tags to the logs (this may be supported by log4j2 but not by slf4j), the first implementation can use pattern matching. > For example, SCMNodeManager.processNodeReport contains trace/debug logs that include the " [datanode={}]" part. This formatting convention can be used to print out only the related information.
[jira] [Work started] (HDDS-2322) DoubleBuffer flush termination and OM shutdown's after that.
[ https://issues.apache.org/jira/browse/HDDS-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDDS-2322 started by Bharat Viswanadham.
> DoubleBuffer flush termination and OM shutdown's after that.
>
> Key: HDDS-2322
> URL: https://issues.apache.org/jira/browse/HDDS-2322
> Project: Hadoop Distributed Data Store
> Issue Type: Task
> Reporter: Bharat Viswanadham
> Assignee: Bharat Viswanadham
> Priority: Major
>
> om1_1 | 2019-10-18 00:34:45,317 [OMDoubleBufferFlushThread] ERROR - Terminating with exit status 2: OMDoubleBuffer flush threadOMDoubleBufferFlushThreadencountered Throwable error
> om1_1 | java.util.ConcurrentModificationException
> om1_1 | at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1660)
> om1_1 | at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
> om1_1 | at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
> om1_1 | at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
> om1_1 | at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> om1_1 | at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
> om1_1 | at org.apache.hadoop.ozone.om.helpers.OmKeyLocationInfoGroup.getProtobuf(OmKeyLocationInfoGroup.java:65)
> om1_1 | at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
> om1_1 | at java.base/java.util.Collections$2.tryAdvance(Collections.java:4745)
> om1_1 | at java.base/java.util.Collections$2.forEachRemaining(Collections.java:4753)
> om1_1 | at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
> om1_1 | at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
> om1_1 | at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
> om1_1 | at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> om1_1 | at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
> om1_1 | at org.apache.hadoop.ozone.om.helpers.OmKeyInfo.getProtobuf(OmKeyInfo.java:362)
> om1_1 | at org.apache.hadoop.ozone.om.codec.OmKeyInfoCodec.toPersistedFormat(OmKeyInfoCodec.java:37)
> om1_1 | at org.apache.hadoop.ozone.om.codec.OmKeyInfoCodec.toPersistedFormat(OmKeyInfoCodec.java:31)
> om1_1 | at org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
> om1_1 | at org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
> om1_1 | at org.apache.hadoop.ozone.om.response.key.OMKeyCreateResponse.addToDBBatch(OMKeyCreateResponse.java:58)
> om1_1 | at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:139)
> om1_1 | at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
> om1_1 | at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:137)
> om1_1 | at java.base/java.lang.Thread.run(Thread.java:834)
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
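The trace in HDDS-2322 above shows a list being modified while the flush thread is still iterating it during serialization. The sketch below reproduces that failure mode deterministically in a single thread and shows one common remedy, handing the flusher an independent copy. It is an assumption-laden illustration: `KeyEntry`, `copy()` and `serialize()` are hypothetical stand-ins, not the Ozone Manager classes, and the actual fix chosen for HDDS-2322 may differ.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class DoubleBufferCopyDemo {

  /** Hypothetical stand-in for a cached key entry with a mutable location list. */
  static final class KeyEntry {
    private final List<String> locations = new ArrayList<>();

    void addLocation(String location) {
      locations.add(location);
    }

    /** Returns an independent copy; later addLocation calls on this entry do not affect it. */
    KeyEntry copy() {
      KeyEntry c = new KeyEntry();
      c.locations.addAll(locations);
      return c;
    }

    /** Serializes by iterating; 'duringIteration' simulates a writer racing with the flush. */
    String serialize(Runnable duringIteration) {
      StringBuilder sb = new StringBuilder();
      for (String location : locations) {
        sb.append(location).append(';');
        duringIteration.run();
      }
      return sb.toString();
    }
  }

  /** Shares the mutable entry with the "flusher": the iteration fails fast. */
  static boolean failsWhenSharingMutableEntry() {
    KeyEntry entry = new KeyEntry();
    entry.addLocation("block-1");
    entry.addLocation("block-2");
    try {
      entry.serialize(() -> entry.addLocation("block-3"));
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  /** Gives the "flusher" a copy: writes to the original no longer disturb it. */
  static String succeedsWithCopy() {
    KeyEntry entry = new KeyEntry();
    entry.addLocation("block-1");
    entry.addLocation("block-2");
    KeyEntry snapshot = entry.copy();
    return snapshot.serialize(() -> entry.addLocation("block-3"));
  }

  public static void main(String[] args) {
    System.out.println(failsWhenSharingMutableEntry());
    System.out.println(succeedsWithCopy());
  }
}
```

Copying costs an allocation per buffered entry; making the entries immutable would avoid both the copy and the race, at the price of a larger refactoring.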
[jira] [Resolved] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham resolved HDDS-2356. -- Resolution: Fixed This will be fixed as part of HDDS-2322. > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. 
> > 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with > exit status 2: OMDoubleBuffer flush > threadOMDoubleBufferFlushThreadencountered Throwable error > java.util.ConcurrentModificationException > at java.util.TreeMap.forEach(TreeMap.java:1004) > at > org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31) > at > org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68) > at > org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125) > at > org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135) > at java.lang.Thread.run(Thread.java:745) > 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG: -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959238#comment-16959238 ] Bharat Viswanadham edited comment on HDDS-2356 at 10/24/19 9:44 PM: This will be fixed as part of HDDS-2322. Thank You [~timmylicheng] for reporting this issue. was (Author: bharatviswa): This will be fixed as part of HDDS-2322. > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. 
> > 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with > exit status 2: OMDoubleBuffer flush > threadOMDoubleBufferFlushThreadencountered Throwable error > java.util.ConcurrentModificationException > at java.util.TreeMap.forEach(TreeMap.java:1004) > at > org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31) > at > org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68) > at > org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125) > at > org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135) > at java.lang.Thread.run(Thread.java:745) > 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG: -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-2356: - Fix Version/s: 0.5.0 > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. 
> > 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with > exit status 2: OMDoubleBuffer flush > threadOMDoubleBufferFlushThreadencountered Throwable error > java.util.ConcurrentModificationException > at java.util.TreeMap.forEach(TreeMap.java:1004) > at > org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31) > at > org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68) > at > org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125) > at > org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135) > at java.lang.Thread.run(Thread.java:745) > 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG: -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-2362) XCeiverClientManager issues
[ https://issues.apache.org/jira/browse/HDDS-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Istvan Fajth reassigned HDDS-2362: -- Assignee: Istvan Fajth > XCeiverClientManager issues > --- > > Key: HDDS-2362 > URL: https://issues.apache.org/jira/browse/HDDS-2362 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Istvan Fajth >Assignee: Istvan Fajth >Priority: Major > > These issues were revealed while reviewing the XCeiverClientManager, and the > clients. > - secure clients are not released properly, so the reference counting does > not work with secure clients > - even though we have reference counting for the clients, the cache can evict > and remove client instances with active references, as it is not connected > with the reference counts > - isUseRatis, getFactor, getType is not really belonging to this class > - acquireClient and acquireClientForRead and release methods of the same > kind, seems to be a bad smell, we might separate the two things, especially > because reads are using the grpc client while writes are using the ratis > client as I know > - pipelines are leaking from the clients themselves, the pipelines are not > modified in these code paths, but it should be better if we can hide the > pipeline, and don't serve it for the clients, or if we can serve an immutable > version > - ContainerProtocolCalls seems to be something that is extracted to a utility > class but it may be placed into the client itself, as in all the cases, the > client is gathered from the XCeiverClientManager, then given to one of > ContainerProtocolCalls' method, which calls the sendCommandAsync on the > client which does not seem to be necessary, we can encapsulate all the > protobuf message creation, and provide response data from the client. 
> -ContainerOperationClient acquires the client twice from the > XCeiverClientManager in the createContainer call, but releases it only once > - we can try to get rid of some of the synchronization by trying to eliminate > some of the states in the clients, and the manager, and replace them with > some polymorphism. > I will go through this list one by one and will create JIRAs one by one using > this one as an umbrella JIRA, so that we can create PRs one by one, or if > needed, we can consolidate the whole thing into one PR at the end but review > one by one. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2362) XCeiverClientManager issues
Istvan Fajth created HDDS-2362:
Summary: XCeiverClientManager issues
Key: HDDS-2362
URL: https://issues.apache.org/jira/browse/HDDS-2362
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Reporter: Istvan Fajth

These issues were revealed while reviewing the XCeiverClientManager, and the clients.
- secure clients are not released properly, so the reference counting does not work with secure clients
- even though we have reference counting for the clients, the cache can evict and remove client instances with active references, as it is not connected with the reference counts
- isUseRatis, getFactor, getType is not really belonging to this class
- acquireClient and acquireClientForRead and release methods of the same kind, seems to be a bad smell, we might separate the two things, especially because reads are using the grpc client while writes are using the ratis client as I know
- pipelines are leaking from the clients themselves, the pipelines are not modified in these code paths, but it should be better if we can hide the pipeline, and don't serve it for the clients, or if we can serve an immutable version
- ContainerProtocolCalls seems to be something that is extracted to a utility class but it may be placed into the client itself, as in all the cases, the client is gathered from the XCeiverClientManager, then given to one of ContainerProtocolCalls' method, which calls the sendCommandAsync on the client which does not seem to be necessary, we can encapsulate all the protobuf message creation, and provide response data from the client.
- ContainerOperationClient acquires the client twice from the XCeiverClientManager in the createContainer call, but releases it only once
- we can try to get rid of some of the synchronization by trying to eliminate some of the states in the clients, and the manager, and replace them with some polymorphism.

I will go through this list one by one and will create JIRAs one by one using this one as an umbrella JIRA, so that we can create PRs one by one, or if needed, we can consolidate the whole thing into one PR at the end but review one by one.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
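The second point in the HDDS-2362 list above (the cache evicting clients that still have active references) could be addressed by letting eviction only mark a client and deferring the close until the last reference is released. The following is a minimal sketch of that idea, assuming a plain map instead of the real cache; `RefCountedCacheDemo`, `Client` and their methods are hypothetical and do not mirror the XCeiverClientManager API.

```java
import java.util.HashMap;
import java.util.Map;

public class RefCountedCacheDemo {

  /** Hypothetical client handle; 'closed' stands in for releasing real resources. */
  static final class Client {
    final String pipelineId;
    private int refCount;
    private boolean evicted;
    private boolean closed;

    Client(String pipelineId) {
      this.pipelineId = pipelineId;
    }

    boolean isClosed() {
      return closed;
    }
  }

  private final Map<String, Client> cache = new HashMap<>();

  /** Returns the cached client for the pipeline (creating it if absent) and counts the reference. */
  synchronized Client acquire(String pipelineId) {
    Client c = cache.computeIfAbsent(pipelineId, Client::new);
    c.refCount++;
    return c;
  }

  /** Drops one reference; closes the client only if it was already evicted. */
  synchronized void release(Client c) {
    if (c.refCount <= 0) {
      throw new IllegalStateException("release without matching acquire");
    }
    c.refCount--;
    closeIfUnused(c);
  }

  /** Removes the client from the cache but keeps it open while references remain. */
  synchronized void evict(String pipelineId) {
    Client c = cache.remove(pipelineId);
    if (c != null) {
      c.evicted = true;
      closeIfUnused(c);
    }
  }

  private void closeIfUnused(Client c) {
    if (c.evicted && c.refCount == 0) {
      c.closed = true;
    }
  }

  public static void main(String[] args) {
    RefCountedCacheDemo manager = new RefCountedCacheDemo();
    Client c = manager.acquire("pipeline-1");
    manager.evict("pipeline-1");
    System.out.println(c.isClosed()); // still referenced, so still open
    manager.release(c);
    System.out.println(c.isClosed()); // last reference gone, now closed
  }
}
```

In this sketch a client that is never evicted stays open in the cache even at zero references, which matches the usual purpose of such a cache: reuse across calls.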
[jira] [Resolved] (HDFS-14932) XCeiverClientManager issues
[ https://issues.apache.org/jira/browse/HDFS-14932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Istvan Fajth resolved HDFS-14932. - Resolution: Invalid Sorry I meant to create this one in HDDS project, closing this one, as I can not move it. > XCeiverClientManager issues > --- > > Key: HDFS-14932 > URL: https://issues.apache.org/jira/browse/HDFS-14932 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Istvan Fajth >Priority: Major > > These issues were revealed while reviewing the XCeiverClientManager, and the > clients. > - secure clients are not released properly, so the reference counting does > not work with secure clients > - even though we have reference counting for the clients, the cache can evict > and remove client instances with active references, as it is not connected > with the reference counts > - isUseRatis, getFactor, getType is not really belonging to this class > - acquireClient and acquireClientForRead and release methods of the same > kind, seems to be a bad smell, we might separate the two things, especially > because reads are using the grpc client while writes are using the ratis > client as I know > - pipelines are leaking from the clients themselves, the pipelines are not > modified in these code paths, but it should be better if we can hide the > pipeline, and don't serve it for the clients, or if we can serve an immutable > version > - ContainerProtocolCalls seems to be something that is extracted to a utility > class but it may be placed into the client itself, as in all the cases, the > client is gathered from the XCeiverClientManager, then given to one of > ContainerProtocolCalls' method, which calls the sendCommandAsync on the > client which does not seem to be necessary, we can encapsulate all the > protobuf message creation, and provide response data from the client. 
> -ContainerOperationClient acquires the client twice from the > XCeiverClientManager in the createContainer call, but releases it only once > - we can try to get rid of some of the synchronization by trying to eliminate > some of the states in the clients, and the manager, and replace them with > some polymorphism. > I will go through this list one by one and will create JIRAs one by one using > this one as an umbrella JIRA, so that we can create PRs one by one, or if > needed, we can consolidate the whole thing into one PR at the end but review > one by one. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14932) XCeiverClientManager issues
Istvan Fajth created HDFS-14932: --- Summary: XCeiverClientManager issues Key: HDFS-14932 URL: https://issues.apache.org/jira/browse/HDFS-14932 Project: Hadoop HDFS Issue Type: Bug Reporter: Istvan Fajth These issues were revealed while reviewing the XCeiverClientManager, and the clients. - secure clients are not released properly, so the reference counting does not work with secure clients - even though we have reference counting for the clients, the cache can evict and remove client instances with active references, as it is not connected with the reference counts - isUseRatis, getFactor, getType is not really belonging to this class - acquireClient and acquireClientForRead and release methods of the same kind, seems to be a bad smell, we might separate the two things, especially because reads are using the grpc client while writes are using the ratis client as I know - pipelines are leaking from the clients themselves, the pipelines are not modified in these code paths, but it should be better if we can hide the pipeline, and don't serve it for the clients, or if we can serve an immutable version - ContainerProtocolCalls seems to be something that is extracted to a utility class but it may be placed into the client itself, as in all the cases, the client is gathered from the XCeiverClientManager, then given to one of ContainerProtocolCalls' method, which calls the sendCommandAsync on the client which does not seem to be necessary, we can encapsulate all the protobuf message creation, and provide response data from the client. -ContainerOperationClient acquires the client twice from the XCeiverClientManager in the createContainer call, but releases it only once - we can try to get rid of some of the synchronization by trying to eliminate some of the states in the clients, and the manager, and replace them with some polymorphism. 
I will go through this list one by one and will create JIRAs one by one using this one as an umbrella JIRA, so that we can create PRs one by one, or if needed, we can consolidate the whole thing into one PR at the end but review one by one. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14775) Add Timestamp for longest FSN write/read lock held log
[ https://issues.apache.org/jira/browse/HDFS-14775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959206#comment-16959206 ] Erik Krogen commented on HDFS-14775:
I only took a quick look, but it seems like a good change. Two minor things:
* In {{FSNamesystemLock}} L167 we fetch and store the current time; we should re-use this below at L169 rather than re-fetching it.
* The indentation on {{FSNamesystemLock}} L179 seems off?
> Add Timestamp for longest FSN write/read lock held log
>
> Key: HDFS-14775
> URL: https://issues.apache.org/jira/browse/HDFS-14775
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Chen Zhang
> Assignee: Chen Zhang
> Priority: Major
> Attachments: HDFS-14775.001.patch, HDFS-14775.002.patch, HDFS-14775.003.patch, HDFS-14775.004.patch
>
> HDFS-13946 improved the log for the longest read/write lock held time, which is a very useful improvement.
> In some conditions we need to locate the detailed call information (user, IP, path, etc.) for the longest lock holder, but the default throttle interval (10s) is too long to find the corresponding audit log. I think we should add the timestamp for the {{longestWriteLockHeldStackTrace}}.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
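The review remark above (read the clock once and reuse the value) and the request to record a timestamp for the longest hold can be illustrated together. This is only a sketch under assumptions: `LongestLockHoldTracker` and its fields are hypothetical names and do not reproduce the `FSNamesystemLock` code or the attached patches.

```java
import java.util.function.LongSupplier;

public class LongestLockHoldTracker {

  private final LongSupplier clockMillis; // injectable clock, so the sketch is testable
  private long acquiredAtMillis;
  private long longestHeldMillis;
  private long longestHeldReleasedAtMillis;

  LongestLockHoldTracker(LongSupplier clockMillis) {
    this.clockMillis = clockMillis;
  }

  /** Call right after the lock is acquired. */
  void onLock() {
    acquiredAtMillis = clockMillis.getAsLong();
  }

  /** Call right before the lock is released. */
  void onUnlock() {
    long now = clockMillis.getAsLong(); // fetched once, reused for both values below
    long held = now - acquiredAtMillis;
    if (held > longestHeldMillis) {
      longestHeldMillis = held;
      longestHeldReleasedAtMillis = now;
    }
  }

  /** Summary line carrying the timestamp needed to find the matching audit log entry. */
  String report() {
    return "longest hold " + longestHeldMillis + " ms, released at " + longestHeldReleasedAtMillis;
  }

  public static void main(String[] args) {
    LongestLockHoldTracker tracker = new LongestLockHoldTracker(System::currentTimeMillis);
    tracker.onLock();
    tracker.onUnlock();
    System.out.println(tracker.report());
  }
}
```

Reusing `now` keeps the reported duration and timestamp consistent with each other, which two separate clock reads would not guarantee.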
[jira] [Created] (HDDS-2361) Ozone Manager init & start command prints out unnecessary line in the beginning.
Aravindan Vijayan created HDDS-2361: --- Summary: Ozone Manager init & start command prints out unnecessary line in the beginning. Key: HDDS-2361 URL: https://issues.apache.org/jira/browse/HDDS-2361 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Aravindan Vijayan
{code}
[root@avijayan-om-1 ozone-0.5.0-SNAPSHOT]# bin/ozone --daemon start om
Ozone Manager classpath extended by
{code}
We could probably print this line only when extra elements are added to the OM classpath, or skip printing this line altogether.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2360) Update Ratis snapshot to d6d58d0
[ https://issues.apache.org/jira/browse/HDDS-2360?focusedWorklogId=333704=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333704 ] ASF GitHub Bot logged work on HDDS-2360: Author: ASF GitHub Bot Created on: 24/Oct/19 20:31 Start Date: 24/Oct/19 20:31 Worklog Time Spent: 10m Work Description: adoroszlai commented on pull request #83: HDDS-2360. Update Ratis snapshot to d6d58d0 URL: https://github.com/apache/hadoop-ozone/pull/83 ## What changes were proposed in this pull request? Update Ratis dependency version to snapshot [d6d58d0](https://github.com/apache/incubator-ratis/commit/d6d58d0), to fix memory issues ([RATIS-726](https://issues.apache.org/jira/browse/RATIS-726), [RATIS-728](https://issues.apache.org/jira/browse/RATIS-728)). Thanks @szetszwo and @bshashikant for the fixes, and @mukul1987 for creating the snapshot release. https://issues.apache.org/jira/browse/HDDS-2360 ## How was this patch tested? Tested with Freon using 1MB and 16MB keys. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 333704) Remaining Estimate: 0h Time Spent: 10m > Update Ratis snapshot to d6d58d0 > > > Key: HDDS-2360 > URL: https://issues.apache.org/jira/browse/HDDS-2360 > Project: Hadoop Distributed Data Store > Issue Type: Task > Components: Ozone Client, Ozone Datanode >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Update Ratis dependency version to snapshot > [d6d58d0|https://github.com/apache/incubator-ratis/commit/d6d58d0], to fix > memory issues (RATIS-726, RATIS-728). 
[jira] [Updated] (HDDS-2360) Update Ratis snapshot to d6d58d0
[ https://issues.apache.org/jira/browse/HDDS-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2360: - Labels: pull-request-available (was: ) > Update Ratis snapshot to d6d58d0 > > > Key: HDDS-2360 > URL: https://issues.apache.org/jira/browse/HDDS-2360 > Project: Hadoop Distributed Data Store > Issue Type: Task > Components: Ozone Client, Ozone Datanode >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > > Update Ratis dependency version to snapshot > [d6d58d0|https://github.com/apache/incubator-ratis/commit/d6d58d0], to fix > memory issues (RATIS-726, RATIS-728). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959200#comment-16959200 ] Bharat Viswanadham commented on HDDS-2356: -- The issue is right now entries added are mutable, For Volume/Bucket this is fixed as part of HDDS-2344 and HDDS-2343. There is already a Jira HDDS-2322 which is seen with similar stack trace error when doing Key creation. > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. 
> > 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with > exit status 2: OMDoubleBuffer flush > threadOMDoubleBufferFlushThreadencountered Throwable error > java.util.ConcurrentModificationException > at java.util.TreeMap.forEach(TreeMap.java:1004) > at > org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31) > at > org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68) > at > org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125) > at > org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135) > at java.lang.Thread.run(Thread.java:745) > 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG: -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14910) Rename Snapshot with Pre Descendants Fail With IllegalArgumentException.
[ https://issues.apache.org/jira/browse/HDFS-14910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959196#comment-16959196 ] Hudson commented on HDFS-14910: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17570 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17570/]) HDFS-14910. Rename Snapshot with Pre Descendants Fail With (github: rev a1b4eebcc92976a9fb78ad5d3ab70c52cc0a5fa7) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeDirectory.java > Rename Snapshot with Pre Descendants Fail With IllegalArgumentException. > > > Key: HDFS-14910 > URL: https://issues.apache.org/jira/browse/HDFS-14910 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Íñigo Goiri >Assignee: Wei-Chiu Chuang >Priority: Blocker > Fix For: 3.3.0, 3.1.4, 3.2.2 > > > TestRenameWithSnapshots#testRename2PreDescendant has been failing > consistently. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14492) Snapshot memory leak
[ https://issues.apache.org/jira/browse/HDFS-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-14492: --- Fix Version/s: 3.2.2 3.1.4 > Snapshot memory leak > > > Key: HDFS-14492 > URL: https://issues.apache.org/jira/browse/HDFS-14492 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Affects Versions: 2.6.0 > Environment: CDH5.14.4 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2 > > > We recently examined the NameNode heap dump of a big, heavy snapshot user, > trying to trim some fat, and surely enough we found memory leak in it: when > snapshots are removed, the corresponding data structures are not removed. > This cluster has 586 million file system objects (286 million files, 287 > million blocks, 13 million directories), using around 132gb of heap. > While only 44.5 million files have snapshotted copies, > (INodeFileAttributes$SnapshotCopy), most inodes (nearly 212 million) have > FileWithSnapshotFeature and FileDiffList. Those inodes had snapshotted copies > at some point in the past, but after snapshots are removed, those data > structured are still kept in the heap. > INode$Feature = 32.5 byte on average, FileWithSnapshotFeature = 32 bytes, > FileDiffList = 24 bytes. It may not sound a lot, but they add up quickly in > large clusters like this. In this cluster, a whopping 13.8gb of memory could > have been saved: ((32.5 + 32 + 24) bytes * (211997769 - 44572380) =~ > 13.8gb) if not for this bug. That is more than 10% of savings in heap size. 
> Heap histogram for reference: > {noformat} > num #instances #bytes class name > -- > 1: 286418254 27496152384 org.apache.hadoop.hdfs.server.namenode.INodeFile > 2: 28737 18388622528 > org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo > 3: 227899550 17144816120 [B > 4: 287324031 13769408616 > [Lorg.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo; > 5: 71352116 12353841568 [Ljava.lang.Object; > 6: 286322650 9170335840 > [Lorg.apache.hadoop.hdfs.server.blockmanagement.BlockInfo; > 7: 235632329 7658462416 > [Lorg.apache.hadoop.hdfs.server.namenode.INode$Feature; > 8: 4 7046430816 [Lorg.apache.hadoop.util.LightWeightGSet$LinkedElement; > 9: 211997769 6783928608 > org.apache.hadoop.hdfs.server.namenode.snapshot.FileWithSnapshotFeature > 10: 211997769 5087946456 > org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiffList > 11: 76586261 3780468856 [I > 12: 44572380 3209211360 > org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy > 13: 58634517 2345380680 java.util.ArrayList > 14: 44572380 2139474240 > org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiff > 15: 76582416 1837977984 org.apache.hadoop.hdfs.server.namenode.AclFeature > 16: 12907668 1135874784 > org.apache.hadoop.hdfs.server.namenode.INodeDirectory{noformat} > [~szetszwo] [~arpaga] [~smeng] [~shashikant] any thoughts? > I am thinking that inside > AbstractINodeDiffList#deleteSnapshotDiff() , in addition to cleaning up file > diffs, it should also remove FileWithSnapshotFeature. I am not familiar with > the snapshot implementation, so any guidance is greatly appreciated. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
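The 13.8gb figure in the description can be re-derived from the histogram. A minimal sketch, assuming "gb" means GiB (2^30 bytes); the class and method names are made up for illustration and are not part of HDFS:

```java
import java.util.Locale;

// Worked check of the heap-savings estimate quoted in HDFS-14492.
final class SnapshotLeakEstimate {
    // Average per-object sizes in bytes, as stated in the issue text.
    static final double FEATURE_ARRAY_BYTES = 32.5;            // INode$Feature[]
    static final double FILE_WITH_SNAPSHOT_FEATURE_BYTES = 32; // FileWithSnapshotFeature
    static final double FILE_DIFF_LIST_BYTES = 24;             // FileDiffList

    // Instance counts taken from the heap histogram.
    static final long FILES_WITH_SNAPSHOT_FEATURE = 211_997_769L;
    static final long FILES_WITH_SNAPSHOT_COPY = 44_572_380L;

    // Inodes that still carry snapshot structures but no snapshotted copy.
    static long staleInodes() {
        return FILES_WITH_SNAPSHOT_FEATURE - FILES_WITH_SNAPSHOT_COPY;
    }

    // Bytes held by the three leftover structures across all stale inodes.
    static double wastedBytes() {
        double perInode = FEATURE_ARRAY_BYTES
            + FILE_WITH_SNAPSHOT_FEATURE_BYTES
            + FILE_DIFF_LIST_BYTES;
        return perInode * staleInodes();
    }

    // Assumption: "gb" in the issue is read as GiB (2^30 bytes).
    static double toGiB(double bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    static String summary() {
        return String.format(Locale.ROOT, "%d stale inodes, %.1f GiB",
            staleInodes(), toGiB(wastedBytes()));
    }

    public static void main(String[] args) {
        System.out.println(summary()); // prints "167425389 stale inodes, 13.8 GiB"
    }
}
```

Against the reported 132gb heap this is roughly 10.5%, which matches the "more than 10%" claim. With decimal gigabytes (10^9 bytes) the same product would be about 14.8 GB.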
[jira] [Updated] (HDFS-14910) Rename Snapshot with Pre Descendants Fail With IllegalArgumentException.
[ https://issues.apache.org/jira/browse/HDFS-14910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-14910: --- Fix Version/s: 3.2.2 3.1.4 > Rename Snapshot with Pre Descendants Fail With IllegalArgumentException. > > > Key: HDFS-14910 > URL: https://issues.apache.org/jira/browse/HDFS-14910 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Íñigo Goiri >Assignee: Wei-Chiu Chuang >Priority: Blocker > Fix For: 3.3.0, 3.1.4, 3.2.2 > > > TestRenameWithSnapshots#testRename2PreDescendant has been failing > consistently.
[jira] [Updated] (HDFS-14492) Snapshot memory leak
[ https://issues.apache.org/jira/browse/HDFS-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-14492: --- Fix Version/s: (was: 3.1.4) 3.3.0 > Snapshot memory leak > > > Key: HDFS-14492 > URL: https://issues.apache.org/jira/browse/HDFS-14492 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Affects Versions: 2.6.0 > Environment: CDH5.14.4 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > Fix For: 3.3.0 > > > We recently examined the NameNode heap dump of a big, heavy snapshot user, > trying to trim some fat, and sure enough we found a memory leak in it: when > snapshots are removed, the corresponding data structures are not removed. > This cluster has 586 million file system objects (286 million files, 287 > million blocks, 13 million directories), using around 132gb of heap. > While only 44.5 million files have snapshotted copies > (INodeFileAttributes$SnapshotCopy), most inodes (nearly 212 million) have > FileWithSnapshotFeature and FileDiffList. Those inodes had snapshotted copies > at some point in the past, but after the snapshots are removed, those data > structures are still kept in the heap. > INode$Feature = 32.5 bytes on average, FileWithSnapshotFeature = 32 bytes, > FileDiffList = 24 bytes. It may not sound like a lot, but they add up quickly in > large clusters like this. In this cluster, a whopping 13.8gb of memory could > have been saved: ((32.5 + 32 + 24) bytes * (211997769 - 44572380) =~ > 13.8gb) if not for this bug. That is a saving of more than 10% in heap size. 
> Heap histogram for reference: > {noformat} > num #instances #bytes class name > -- > 1: 286418254 27496152384 org.apache.hadoop.hdfs.server.namenode.INodeFile > 2: 28737 18388622528 > org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo > 3: 227899550 17144816120 [B > 4: 287324031 13769408616 > [Lorg.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo; > 5: 71352116 12353841568 [Ljava.lang.Object; > 6: 286322650 9170335840 > [Lorg.apache.hadoop.hdfs.server.blockmanagement.BlockInfo; > 7: 235632329 7658462416 > [Lorg.apache.hadoop.hdfs.server.namenode.INode$Feature; > 8: 4 7046430816 [Lorg.apache.hadoop.util.LightWeightGSet$LinkedElement; > 9: 211997769 6783928608 > org.apache.hadoop.hdfs.server.namenode.snapshot.FileWithSnapshotFeature > 10: 211997769 5087946456 > org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiffList > 11: 76586261 3780468856 [I > 12: 44572380 3209211360 > org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy > 13: 58634517 2345380680 java.util.ArrayList > 14: 44572380 2139474240 > org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiff > 15: 76582416 1837977984 org.apache.hadoop.hdfs.server.namenode.AclFeature > 16: 12907668 1135874784 > org.apache.hadoop.hdfs.server.namenode.INodeDirectory{noformat} > [~szetszwo] [~arpaga] [~smeng] [~shashikant] any thoughts? > I am thinking that inside > AbstractINodeDiffList#deleteSnapshotDiff() , in addition to cleaning up file > diffs, it should also remove FileWithSnapshotFeature. I am not familiar with > the snapshot implementation, so any guidance is greatly appreciated. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14910) Rename Snapshot with Pre Descendants Fail With IllegalArgumentException.
[ https://issues.apache.org/jira/browse/HDFS-14910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-14910: --- Fix Version/s: 3.3.0 Resolution: Fixed Status: Resolved (was: Patch Available) Thanks [~ayushtkn] for the review and [~inigoiri] for raising this issue. I'll also cherry pick the commit to lower branches. > Rename Snapshot with Pre Descendants Fail With IllegalArgumentException. > > > Key: HDFS-14910 > URL: https://issues.apache.org/jira/browse/HDFS-14910 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Íñigo Goiri >Assignee: Wei-Chiu Chuang >Priority: Blocker > Fix For: 3.3.0 > > > TestRenameWithSnapshots#testRename2PreDescendant has been failing > consistently.
[jira] [Updated] (HDFS-14450) Erasure Coding: decommissioning datanodes cause replicate a large number of duplicate EC internal blocks
[ https://issues.apache.org/jira/browse/HDFS-14450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-14450: --- Resolution: Duplicate Status: Resolved (was: Patch Available) Closing this one as a duplicate. Thanks [~ferhui] for the confirmation, and thanks [~wuweiwei] for raising the issue. > Erasure Coding: decommissioning datanodes cause replicate a large number of > duplicate EC internal blocks > > > Key: HDFS-14450 > URL: https://issues.apache.org/jira/browse/HDFS-14450 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ec >Reporter: Wu Weiwei >Assignee: Wu Weiwei >Priority: Major > Attachments: HDFS-14450-000.patch > > > {code:java} > // [WARN] [RedundancyMonitor] : Failed to place enough replicas, still in > need of 2 to reach 167 (unavailableStorages=[DISK, ARCHIVE], > storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], > creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) All > required storage types are unavailable: unavailableStorages=[DISK, ARCHIVE], > storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], > creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > In a large cluster, decommissioning a large number of datanodes causes EC > block groups to replicate a large number of duplicate internal blocks.
[jira] [Updated] (HDFS-14931) hdfs crypto commands limit column width
[ https://issues.apache.org/jira/browse/HDFS-14931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-14931: --- Status: Patch Available (was: Open) > hdfs crypto commands limit column width > --- > > Key: HDFS-14931 > URL: https://issues.apache.org/jira/browse/HDFS-14931 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: HDFS-14931.001.patch > > > {noformat} > foo@bar$ hdfs crypto -listZones > /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool1 encr > > yptio > nzon > e1 > /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool2 encr > > yptio > nzon > e2 > /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool3 encr > > yptio > nzon > e3 > {noformat} > The command ends up looking something really ugly like this when the path is > long. This also makes it very difficult to pipe the output into other > utilities, such as awk. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14931) hdfs crypto commands limit column width
[ https://issues.apache.org/jira/browse/HDFS-14931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-14931: --- Attachment: HDFS-14931.001.patch > hdfs crypto commands limit column width > --- > > Key: HDFS-14931 > URL: https://issues.apache.org/jira/browse/HDFS-14931 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: HDFS-14931.001.patch > > > {noformat} > foo@bar$ hdfs crypto -listZones > /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool1 encr > > yptio > nzon > e1 > /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool2 encr > > yptio > nzon > e2 > /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool3 encr > > yptio > nzon > e3 > {noformat} > The command ends up looking something really ugly like this when the path is > long. This also makes it very difficult to pipe the output into other > utilities, such as awk. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14931) hdfs crypto commands limit column width
Eric Badger created HDFS-14931: -- Summary: hdfs crypto commands limit column width Key: HDFS-14931 URL: https://issues.apache.org/jira/browse/HDFS-14931 Project: Hadoop HDFS Issue Type: Bug Reporter: Eric Badger Assignee: Eric Badger {noformat} foo@bar$ hdfs crypto -listZones /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool1 encr yptio nzon e1 /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool2 encr yptio nzon e2 /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool3 encr yptio nzon e3 {noformat} The command ends up looking something really ugly like this when the path is long. This also makes it very difficult to pipe the output into other utilities, such as awk. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
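One way to keep each zone on a single line is to size the first column from the longest path instead of using a fixed width. The following is only a sketch of that idea and is not taken from HDFS-14931.001.patch; the class name, method name and sample rows are hypothetical:

```java
// Hypothetical formatter: one row per line, first column padded to the
// longest path so nothing is wrapped and awk can split on whitespace.
final class ZoneListingSketch {
    // rows[i][0] is the zone path, rows[i][1] is the key name.
    static String format(String[][] rows) {
        int width = 1; // a format width of 0 is not valid, so start at 1
        for (String[] row : rows) {
            width = Math.max(width, row[0].length());
        }
        StringBuilder out = new StringBuilder();
        for (String[] row : rows) {
            // "%-Ns" left-justifies the path in a column of N characters.
            out.append(String.format("%-" + width + "s  %s", row[0], row[1]));
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"/projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool1", "encryptionzone1"},
            {"/projects/foo/short", "encryptionzone2"},
        };
        System.out.print(format(rows));
    }
}
```

With output in this shape, something like awk '{print $2}' would see exactly two fields per zone.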
[jira] [Updated] (HDDS-2283) Container creation on datanodes take time because of Rocksdb option creation.
[ https://issues.apache.org/jira/browse/HDDS-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDDS-2283: --- Fix Version/s: 0.5.0 > Container creation on datanodes take time because of Rocksdb option creation. > - > > Key: HDDS-2283 > URL: https://issues.apache.org/jira/browse/HDDS-2283 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Mukul Kumar Singh >Assignee: Siddharth Wagle >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Attachments: HDDS-2283.00.patch > > Time Spent: 20m > Remaining Estimate: 0h > > Container creation on datanodes takes around 300ms due to RocksDB creation. > RocksDB creation takes considerable time and needs to be optimized. > Creating one RocksDB instance per disk should be enough, and each container can be a table > inside that RocksDB instance.
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959180#comment-16959180 ] Jitendra Nath Pandey commented on HDDS-2356: This seems very similar to HDDS-2355. > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. 
> > 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with > exit status 2: OMDoubleBuffer flush > threadOMDoubleBufferFlushThreadencountered Throwable error > java.util.ConcurrentModificationException > at java.util.TreeMap.forEach(TreeMap.java:1004) > at > org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31) > at > org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68) > at > org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125) > at > org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135) > at java.lang.Thread.run(Thread.java:745) > 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG: -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
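The trace above ends in TreeMap.forEach, which throws ConcurrentModificationException when the map is structurally modified while it is being walked. The issue does not state the root cause, so the following is only a generic, single-threaded illustration of that failure mode; the class and variable names are hypothetical and this is not Ozone code:

```java
import java.util.ConcurrentModificationException;
import java.util.Map;
import java.util.TreeMap;

// Generic illustration of the failure mode seen in the stack trace.
final class TreeMapForEachDemo {
    // Returns true if adding an entry during forEach raises the exception.
    static boolean modifyDuringForEachThrows() {
        TreeMap<Integer, String> parts = new TreeMap<>();
        parts.put(1, "part-1");
        parts.put(2, "part-2");
        try {
            // Structural change in the middle of the traversal.
            parts.forEach((number, name) -> parts.put(number + 100, name));
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Walking a private copy leaves the traversal unaffected by later writes.
    static int iterateOverCopy() {
        TreeMap<Integer, String> parts = new TreeMap<>();
        parts.put(1, "part-1");
        parts.put(2, "part-2");
        Map<Integer, String> snapshot = new TreeMap<>(parts);
        int[] visited = {0};
        snapshot.forEach((number, name) -> {
            parts.put(number + 100, name); // the original map may change freely
            visited[0]++;
        });
        return visited[0];
    }

    public static void main(String[] args) {
        System.out.println(modifyDuringForEachThrows()); // prints "true"
        System.out.println(iterateOverCopy());           // prints "2"
    }
}
```

When the writer is another thread, as the flush-thread name in the log suggests may be the case here, the copy itself would also have to be taken under the same lock the writers use. TreeMap is not thread-safe, so copying it without synchronization can fail in the same way.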
[jira] [Commented] (HDFS-14284) RBF: Log Router identifier when reporting exceptions
[ https://issues.apache.org/jira/browse/HDFS-14284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959178#comment-16959178 ] Hadoop QA commented on HDFS-14284: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 59s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 42s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 44s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 6s{color} | {color:red} hadoop-hdfs-rbf in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 62m 43s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.federation.router.TestRouterFaultTolerant | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | HDFS-14284 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12983954/HDFS-14284.007.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux cff374255ed5 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 2eba2624 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/28171/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/28171/testReport/ | | Max. process+thread count | 2765 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/28171/console | | Powered by | Apache Yetus 0.8.0
[jira] [Created] (HDDS-2360) Update Ratis snapshot to d6d58d0
Attila Doroszlai created HDDS-2360: -- Summary: Update Ratis snapshot to d6d58d0 Key: HDDS-2360 URL: https://issues.apache.org/jira/browse/HDDS-2360 Project: Hadoop Distributed Data Store Issue Type: Task Components: Ozone Client, Ozone Datanode Reporter: Attila Doroszlai Assignee: Attila Doroszlai Update Ratis dependency version to snapshot [d6d58d0|https://github.com/apache/incubator-ratis/commit/d6d58d0], to fix memory issues (RATIS-726, RATIS-728).
[jira] [Updated] (HDFS-14284) RBF: Log Router identifier when reporting exceptions
[ https://issues.apache.org/jira/browse/HDFS-14284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hemanthboyina updated HDFS-14284: - Attachment: HDFS-14284.007.patch > RBF: Log Router identifier when reporting exceptions > > > Key: HDFS-14284 > URL: https://issues.apache.org/jira/browse/HDFS-14284 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: hemanthboyina >Priority: Major > Attachments: HDFS-14284.001.patch, HDFS-14284.002.patch, > HDFS-14284.003.patch, HDFS-14284.004.patch, HDFS-14284.005.patch, > HDFS-14284.006.patch, HDFS-14284.007.patch > > > The typical setup is to use multiple Routers through > ConfiguredFailoverProxyProvider. > In a regular HA Namenode setup, it is easy to know which NN was used. > However, in RBF, any Router can be the one reporting the exception and it is > hard to know which one it was. > We should have a way to identify which Router/Namenode triggered > the exception. > This would also apply to Observer Namenodes.
[jira] [Commented] (HDFS-14775) Add Timestamp for longest FSN write/read lock held log
[ https://issues.apache.org/jira/browse/HDFS-14775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959106#comment-16959106 ] Íñigo Goiri commented on HDFS-14775: [~xkrogen], [~shv], thoughts? > Add Timestamp for longest FSN write/read lock held log > -- > > Key: HDFS-14775 > URL: https://issues.apache.org/jira/browse/HDFS-14775 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Chen Zhang >Assignee: Chen Zhang >Priority: Major > Attachments: HDFS-14775.001.patch, HDFS-14775.002.patch, > HDFS-14775.003.patch, HDFS-14775.004.patch > > > HDFS-13946 improved the log for the longest read/write lock held time, which is a very > useful improvement. > In some cases, we need to locate the detailed call information (user, IP, > path, etc.) for the longest lock holder, but the default throttle interval (10s) > is too long to find the corresponding audit log. I think we should add a > timestamp for the {{longestWriteLockHeldStackTrace}}.
[jira] [Commented] (HDFS-14928) UI: unifying the WebUI across different components.
[ https://issues.apache.org/jira/browse/HDFS-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959103#comment-16959103 ] Íñigo Goiri commented on HDFS-14928: Thanks [~risyomei] for bringing up the proposals. Given that we are putting the icon next to the text, I don't think there is a need for the legend. The legend would make sense if we removed the text. I prefer without the legend (proposed 1) and having the text next to it (new for the DN). Another option would be to put the icon after the text instead of before. > UI: unifying the WebUI across different components. > --- > > Key: HDFS-14928 > URL: https://issues.apache.org/jira/browse/HDFS-14928 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ui >Reporter: Xieming Li >Priority: Trivial > Attachments: DN_orig.png, DN_with_legend.png.png, DN_wo_legend.png, > NN_orig.png, NN_with_legend.png, NN_wo_legend.png, RBF_orig.png, > RBF_with_legend.png, RBF_wo_legend.png > > > The WebUI of different components could be unified. > *Router:* > |Current| !RBF_orig.png|width=500! | > |Proposed 1 (With Icon) | !RBF_wo_legend.png|width=500! | > |Proposed 2 (With Icon and Legend)|!RBF_with_legend.png|width=500! | > *NameNode:* > |Current| !NN_orig.png|width=500! | > |Proposed 1 (With Icon) | !NN_wo_legend.png|width=500! | > |Proposed 2 (With Icon and Legend)| !NN_with_legend.png|width=500! | > *DataNode:* > |Current| !DN_orig.png|width=500! | > |Proposed 1 (With Icon) | !DN_wo_legend.png|width=500! | > |Proposed 2 (With Icon and Legend)| !DN_with_legend.png.png|width=500! | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2301) Write path: Reduce read contention in rocksDB
[ https://issues.apache.org/jira/browse/HDDS-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959100#comment-16959100 ] Aravindan Vijayan commented on HDDS-2301: - [~sdeka] Here are some useful RocksDB configs in OM * Enable RocksDB metrics - *ozone.metastore.rocksdb.statistics=ALL* * Enable RocksDB logging - *rocksdb.logging.enabled=true* * Enable RocksDB DEBUG logging - *rocksdb.logging.level=DEBUG* > Write path: Reduce read contention in rocksDB > - > > Key: HDDS-2301 > URL: https://issues.apache.org/jira/browse/HDDS-2301 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Rajesh Balamohan >Assignee: Supratim Deka >Priority: Major > Labels: performance > Attachments: om_write_profile.png > > > Benchmark: > > A simple benchmark that creates 100s and 1000s of keys (empty directories) in > OM. This is done in a tight loop from multiple client-side threads to put > enough load on the CPU. Note that the intention is to understand the bottlenecks in > OM (intentionally avoiding interactions with SCM & DN). > Observation: > - > On the write path, Ozone calls {{OMFileRequest.verifyFilesInPath}}. This > internally calls {{omMetadataManager.getKeyTable().get(dbKeyName)}} for every > write operation. This turns out to be expensive and chokes the write path. > [https://github.com/apache/hadoop/blob/trunk/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/file/OMDirectoryCreateRequest.java#L155] > [https://github.com/apache/hadoop/blob/trunk/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/file/OMFileRequest.java#L63] > In most cases, the directory being created would be a fresh entry. In such cases, > it would be good to try {{RocksDB::keyMayExist}}.
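The suggestion in the description amounts to asking a cheap "may exist" question before paying for a full read. A minimal sketch of that control flow, using a made-up in-memory table as a stand-in; FakeKeyTable and its methods are hypothetical and are neither the Ozone Table API nor the RocksDB Java API:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of "check the cheap filter first, read only if needed".
final class MayExistSketch {
    /** Hypothetical stand-in for a key table backed by a store and a filter. */
    static final class FakeKeyTable {
        private final Map<String, String> store = new HashMap<>();
        // Stand-in for a Bloom filter. A real filter may return false
        // positives, but never false negatives.
        private final Set<String> filter = new HashSet<>();
        int fullLookups = 0; // counts the expensive reads

        void put(String key, String value) {
            store.put(key, value);
            filter.add(key);
        }

        // Cheap check: false means the key is definitely absent.
        boolean keyMayExist(String key) {
            return filter.contains(key);
        }

        // Expensive read, counted so the saving is visible.
        String get(String key) {
            fullLookups++;
            return store.get(key);
        }
    }

    // Only falls back to the full read when the filter cannot rule the key out.
    static boolean exists(FakeKeyTable table, String key) {
        if (!table.keyMayExist(key)) {
            return false;
        }
        return table.get(key) != null;
    }

    public static void main(String[] args) {
        FakeKeyTable table = new FakeKeyTable();
        table.put("/vol/bucket/dir1", "meta");
        System.out.println(exists(table, "/vol/bucket/newdir")); // prints "false"
        System.out.println(table.fullLookups);                   // prints "0"
    }
}
```

For the fresh-directory case described in the issue, the filter answers "definitely absent" and the full read is skipped. An existing key still costs one full read, because a positive answer from the filter is only a hint.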
[jira] [Comment Edited] (HDDS-2301) Write path: Reduce read contention in rocksDB
[ https://issues.apache.org/jira/browse/HDDS-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959100#comment-16959100 ] Aravindan Vijayan edited comment on HDDS-2301 at 10/24/19 5:51 PM: --- [~sdeka] Here are some useful RocksDB configs in OM * Enable RocksDB metrics - *ozone.metastore.rocksdb.statistics=ALL* * Enable RocksDB logging - *hadoop.hdds.db.rocksdb.logging.enabled=true* * Enable RocksDB DEBUG logging - *hadoop.hdds.db.rocksdb.logging.level=DEBUG* was (Author: avijayan): [~sdeka] Here are some useful RocksDB configs in OM * Enable RocksDB metrics - *ozone.metastore.rocksdb.statistics=ALL* * Enable RocksDB logging - *rocksdb.logging.enabled=true* * Enable RocksDB DEBUG logging - *rocksdb.logging.level=DEBUG* > Write path: Reduce read contention in rocksDB > - > > Key: HDDS-2301 > URL: https://issues.apache.org/jira/browse/HDDS-2301 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Rajesh Balamohan >Assignee: Supratim Deka >Priority: Major > Labels: performance > Attachments: om_write_profile.png > > > Benchmark: > > Simple benchmark which creates 100 and 1000s of keys (empty directory) in > OM. This is done in a tight loop and multiple threads from client side to add > enough load on CPU. Note that intention is to understand the bottlenecks in > OM (intentionally avoiding interactions with SCM & DN). > Observation: > - > During write path, Ozone checks {{OMFileRequest.verifyFilesInPath}}. This > internally calls {{omMetadataManager.getKeyTable().get(dbKeyName)}} for every > write operation. This turns out to be expensive and chokes the write path. 
> [https://github.com/apache/hadoop/blob/trunk/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/file/OMDirectoryCreateRequest.java#L155] > [https://github.com/apache/hadoop/blob/trunk/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/file/OMFileRequest.java#L63] > In most cases, the directory being created would be a fresh entry. In such cases, > it would be good to try {{RocksDB::keyMayExist}}.
[jira] [Commented] (HDFS-14917) Change the ICON of "Decommissioned & dead" datanode on "dfshealth.html"
[ https://issues.apache.org/jira/browse/HDFS-14917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959097#comment-16959097 ] Íñigo Goiri commented on HDFS-14917: OK, let's follow up in HDFS-14928. +1 on [^HDFS-14917.patch]. > Change the ICON of "Decommissioned & dead" datanode on "dfshealth.html" > --- > > Key: HDFS-14917 > URL: https://issues.apache.org/jira/browse/HDFS-14917 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ui >Reporter: Xieming Li >Assignee: Xieming Li >Priority: Trivial > Attachments: HDFS-14917.patch, image-2019-10-21-17-49-10-635.png, > image-2019-10-21-17-49-58-759.png, image-2019-10-21-18-03-53-914.png, > image-2019-10-21-18-04-52-405.png, image-2019-10-21-18-05-19-160.png, > image-2019-10-21-18-13-01-884.png, image-2019-10-21-18-13-54-427.png > > > This is a really simple UI change proposal: > The icon of the "Decommissioned & dead" datanode could be improved. It can be > changed from !image-2019-10-21-18-05-19-160.png|width=31,height=28! to > !image-2019-10-21-18-04-52-405.png|width=32,height=29! so that, > # icon " !image-2019-10-21-18-13-01-884.png|width=26,height=25! " can be > used for all statuses that start with "decommission" on dfshealth.html, > # icon " !image-2019-10-21-18-13-01-884.png|width=26,height=25! " can be > differentiated from icon " !image-2019-10-21-18-13-54-427.png! " on > federationhealth.html > |*DataNode Information Legend (now)* > dfshealth.html#tab-datanode > |!image-2019-10-21-17-49-10-635.png|width=516,height=55!| > |*DataNode* *Information* *Legend (proposed)* > dfshealth.html#tab-datanode > |!image-2019-10-21-18-03-53-914.png|width=589,height=60!| > |*NameService Legend* > > federationhealth.html#tab-namenode|!image-2019-10-21-17-49-58-759.png|width=445,height=43!|
[jira] [Assigned] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mukul Kumar Singh reassigned HDDS-2356:
---
Assignee: Bharat Viswanadham

> Multipart upload report errors while writing to ozone Ratis pipeline
> ---
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Manager
> Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM
> Reporter: Li Cheng
> Assignee: Bharat Viswanadham
> Priority: Blocker
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say it's VM0.
> I use goofys as a FUSE client and enable the Ozone S3 gateway to mount Ozone at a path on VM0, reading data from VM0's local disk and writing it to the mount path. The dataset contains ~50,000 files of various sizes, from 0 bytes to GB-level.
> Writing is slow (1 GB in ~10 minutes) and stops after around 4 GB. Looking at the hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors related to multipart upload. This error eventually causes the write to terminate and OM to shut down.
>
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with exit status 2: OMDoubleBuffer flush threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
>     at java.util.TreeMap.forEach(TreeMap.java:1004)
>     at org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>     at org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>     at org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>     at org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>     at org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>     at org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>     at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>     at java.util.Iterator.forEachRemaining(Iterator.java:116)
>     at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>     at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:

--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
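The ConcurrentModificationException in the HDDS-2356 trace above is raised by TreeMap.forEach, which throws when the map is structurally modified while it is being traversed. The following is a minimal single-threaded sketch of that failure mode. It assumes (the report does not confirm this) that the map behind OmMultipartKeyInfo.getProto is changed while it is being iterated. The class, method, and map contents are hypothetical and are not Ozone code.

```java
import java.util.ConcurrentModificationException;
import java.util.Map;
import java.util.TreeMap;

public class TreeMapCmeSketch {

  // Hypothetical stand-in for a map of upload parts keyed by part number.
  public static TreeMap<Integer, String> newPartMap() {
    TreeMap<Integer, String> parts = new TreeMap<>();
    parts.put(1, "part-1");
    parts.put(2, "part-2");
    return parts;
  }

  // Adds a new key while forEach is traversing the same map.
  // Returns true if TreeMap.forEach detected the structural modification.
  public static boolean modifyDuringForEach() {
    TreeMap<Integer, String> parts = newPartMap();
    try {
      parts.forEach((number, name) -> parts.put(number + 100, name));
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  // One possible mitigation: traverse a snapshot copy, so that changes to
  // the live map cannot invalidate the traversal. Returns the final size.
  public static int iterateOverCopy() {
    TreeMap<Integer, String> parts = newPartMap();
    Map<Integer, String> snapshot = new TreeMap<>(parts);
    snapshot.forEach((number, name) -> parts.put(number + 100, name));
    return parts.size();
  }

  public static void main(String[] args) {
    System.out.println("CME raised: " + modifyDuringForEach());
    System.out.println("entries after copy-based traversal: " + iterateOverCopy());
  }
}
```

In a multi-threaded setting the copy itself would have to be taken under the same lock that writers use, or the map could be replaced with a concurrent implementation such as java.util.concurrent.ConcurrentSkipListMap. Which of these, if any, suits Ozone is not something this sketch can decide.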
[jira] [Commented] (HDFS-14925) rename operation should check nest snapshot
[ https://issues.apache.org/jira/browse/HDFS-14925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958990#comment-16958990 ]

Junwang Zhao commented on HDFS-14925:
-
[~sodonnell] thanks for your reply.
{code:java}
hadoop fs -mv /project/folder/test /other_project/folder/
{code}
The command above will not be denied. What should be denied is the following:
{code:java}
hadoop fs -mv /project /other_project/folder
{code}
because /project is a snapshot root. In your case, /project has a snapshot, so validateRenameSource already makes sure the rename is denied. What I'm fixing is the case where /project doesn't have a snapshot but is snapshottable: there, the mv operation is currently not denied. To make it clearer, you can try the following:
{code:java}
hdfs dfs -mkdir /dir1
hdfs dfs -mkdir /dir2
hdfs dfsadmin -allowSnapshot /dir1
hdfs dfsadmin -allowSnapshot /dir2
hdfs dfs -createSnapshot /dir1 snap1
hdfs dfs -mv /dir2 /dir1/
hdfs dfs -createSnapshot /dir1/dir2 snap2
{code}
This results in a nested snapshot.

> rename operation should check nest snapshot
> ---
>
> Key: HDFS-14925
> URL: https://issues.apache.org/jira/browse/HDFS-14925
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: Junwang Zhao
> Priority: Major
>
> When we do a rename operation, if the src directory or any of its descendants is snapshottable and the dst directory or any of its ancestors is snapshottable, we consider this a nested snapshot, which should be denied.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
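The rule stated in the HDFS-14925 description above (src or one of its descendants is snapshottable, and dst or one of its ancestors is snapshottable) can be sketched on plain path strings. This is only an illustration under simplifying assumptions: the NameNode works on inodes, not strings, and the class and method names below are hypothetical rather than taken from the patch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class NestedSnapshotCheckSketch {

  // True if 'path' is 'ancestor' itself or lies underneath it.
  public static boolean isSelfOrDescendant(String path, String ancestor) {
    if (ancestor.equals("/")) {
      return true;
    }
    return path.equals(ancestor) || path.startsWith(ancestor + "/");
  }

  // Applies the rule from the issue description: deny the rename when a
  // snapshottable directory exists at or below src AND a snapshottable
  // directory exists at or above the destination directory.
  public static boolean wouldNestSnapshots(
      Set<String> snapshottableDirs, String src, String dstDir) {
    boolean srcSide = false;
    boolean dstSide = false;
    for (String dir : snapshottableDirs) {
      if (isSelfOrDescendant(dir, src)) {
        srcSide = true;   // src itself or one of its descendants
      }
      if (isSelfOrDescendant(dstDir, dir)) {
        dstSide = true;   // dst itself or one of its ancestors
      }
    }
    return srcSide && dstSide;
  }

  public static void main(String[] args) {
    Set<String> dirs = new HashSet<>(Arrays.asList("/dir1", "/dir2"));
    // Mirrors "hdfs dfs -mv /dir2 /dir1/" from the comment above.
    System.out.println(wouldNestSnapshots(dirs, "/dir2", "/dir1"));
  }
}
```

With /dir1 and /dir2 both snapshottable, moving /dir2 under /dir1 is flagged, which matches the reproduction steps in the comment. Moving a plain subdirectory such as /project/folder/test is not flagged, because nothing at or below it is snapshottable.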
[jira] [Commented] (HDFS-14882) Consider DataNode load when #getBlockLocation
[ https://issues.apache.org/jira/browse/HDFS-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958975#comment-16958975 ] Ayush Saxena commented on HDFS-14882: - Thanks [~hexiaoqiao] for the patch. v008 LGTM, +1 > Consider DataNode load when #getBlockLocation > - > > Key: HDFS-14882 > URL: https://issues.apache.org/jira/browse/HDFS-14882 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Major > Attachments: HDFS-14882.001.patch, HDFS-14882.002.patch, > HDFS-14882.003.patch, HDFS-14882.004.patch, HDFS-14882.005.patch, > HDFS-14882.006.patch, HDFS-14882.007.patch, HDFS-14882.008.patch > > > Currently, we consider the load of a datanode in #chooseTarget for writers, > but not for readers. Thus, the processing slots of a datanode could > be occupied by #BlockSender for readers, its disk/network becomes busy, > and reads then hit slow-node exceptions. IIRC the same case has been reported > multiple times. Based on this, I propose to consider load for readers the same way > #chooseTarget does for writers. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
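The idea proposed in HDFS-14882 above, taking datanode load into account when returning block locations to readers, can be illustrated with a small sketch. This is not the attached patch. The Replica class, the activeTransfers counter, and the "factor times average load" threshold are hypothetical choices made for illustration, and the average is computed over the replicas only rather than over the whole cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class LoadAwareReadOrderSketch {

  // Hypothetical view of one replica location and its current load.
  public static final class Replica {
    final String host;
    final int activeTransfers;

    public Replica(String host, int activeTransfers) {
      this.host = host;
      this.activeTransfers = activeTransfers;
    }
  }

  // Returns host names with overloaded replicas moved to the end.
  // A replica counts as overloaded when its load exceeds factor * average.
  public static List<String> order(List<Replica> replicas, double factor) {
    double average = replicas.stream()
        .mapToInt(r -> r.activeTransfers)
        .average()
        .orElse(0.0);
    double threshold = average * factor;

    List<Replica> sorted = new ArrayList<>(replicas);
    // List.sort is stable, so the incoming order (for example, one based on
    // network distance) is preserved within each of the two groups.
    sorted.sort(Comparator.comparing((Replica r) -> r.activeTransfers > threshold));

    List<String> hosts = new ArrayList<>();
    for (Replica r : sorted) {
      hosts.add(r.host);
    }
    return hosts;
  }

  public static void main(String[] args) {
    List<Replica> replicas = Arrays.asList(
        new Replica("dn1", 50), new Replica("dn2", 2), new Replica("dn3", 4));
    System.out.println(order(replicas, 2.0));
  }
}
```

Keeping the sort stable means a load check of this kind only demotes busy nodes and does not otherwise disturb whatever ordering was applied before it.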
[jira] [Commented] (HDFS-14910) Rename Snapshot with Pre Descendants Fail With IllegalArgumentException.
[ https://issues.apache.org/jira/browse/HDFS-14910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958970#comment-16958970 ] Ayush Saxena commented on HDFS-14910: - Thanx [~weichiu] for the PR. fix LGTM +1 > Rename Snapshot with Pre Descendants Fail With IllegalArgumentException. > > > Key: HDFS-14910 > URL: https://issues.apache.org/jira/browse/HDFS-14910 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Íñigo Goiri >Assignee: Wei-Chiu Chuang >Priority: Blocker > > TestRenameWithSnapshots#testRename2PreDescendant has been failing > consistently. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2188) Implement LocatedFileStatus & getFileBlockLocations to provide node/localization information to Yarn/Mapreduce
[ https://issues.apache.org/jira/browse/HDDS-2188?focusedWorklogId=333509=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333509 ] ASF GitHub Bot logged work on HDDS-2188: Author: ASF GitHub Bot Created on: 24/Oct/19 15:17 Start Date: 24/Oct/19 15:17 Worklog Time Spent: 10m Work Description: steveloughran commented on pull request #1631: HDDS-2188 : Implement LocatedFileStatus & getFileBlockLocations to pr… URL: https://github.com/apache/hadoop/pull/1631#discussion_r338635266 ## File path: hadoop-ozone/ozonefs/src/main/java/org/apache/hadoop/fs/ozone/BasicOzoneFileSystem.java ## @@ -626,6 +640,16 @@ public FileStatus getFileStatus(Path f) throws IOException { return fileStatus; } + public BlockLocation[] getFileBlockLocations(FileStatus fileStatus, Review comment: add @override This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 333509) Time Spent: 1h 10m (was: 1h) > Implement LocatedFileStatus & getFileBlockLocations to provide > node/localization information to Yarn/Mapreduce > -- > > Key: HDDS-2188 > URL: https://issues.apache.org/jira/browse/HDDS-2188 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Filesystem >Affects Versions: 0.5.0 >Reporter: Mukul Kumar Singh >Assignee: Aravindan Vijayan >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > For applications like Hive/MapReduce to take advantage of the data locality > in Ozone, Ozone should return the location of the Ozone blocks. This is > needed for better read performance for Hadoop Applications. 
> {code} > if (file instanceof LocatedFileStatus) { > blkLocations = ((LocatedFileStatus) file).getBlockLocations(); > } else { > blkLocations = fs.getFileBlockLocations(file, 0, length); > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14880) Balancer sequence of statistics & exit message is not correct
[ https://issues.apache.org/jira/browse/HDFS-14880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958962#comment-16958962 ] Ayush Saxena commented on HDFS-14880: - V0003 LGTM +1 [~vinayakumarb] any comments? > Balancer sequence of statistics & exit message is not correct > - > > Key: HDFS-14880 > URL: https://issues.apache.org/jira/browse/HDFS-14880 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover >Affects Versions: 3.1.1, 3.2.1 > Environment: Run the balancer tool in cluster. >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Attachments: HDFS-14880.0001.patch, HDFS-14880.0002.patch, > HDFS-14880.0003.patch > > > Actual: > Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved > The cluster is balanced. Exiting... > Sep 27, 2019 5:13:15 PM 0 0 B 0 B > 0 B > Sep 27, 2019 5:13:15 PM Balancing took 1.726 seconds > Done! > Expected: The exit message should come after logging all the balancer movement > statistics data. > Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved > Sep 27, 2019 5:13:15 PM 0 0 B 0 B > 0 B > The cluster is balanced. Exiting... > Sep 27, 2019 5:13:15 PM Balancing took 1.726 seconds > Done! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-14880) Balancer sequence of statistics & exit message is not correct
[ https://issues.apache.org/jira/browse/HDFS-14880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958947#comment-16958947 ] Renukaprasad C edited comment on HDFS-14880 at 10/24/19 2:56 PM: - Thanks [~ayushtkn] for the quick review. I have made the changes & submitted the patch. Please review. There is a test failure in TestRenameWithSnapshots; it is not related to the patch I have submitted. was (Author: prasad-acit): Thanks [~ayushtkn] for quick review. > Balancer sequence of statistics & exit message is not correct > - > > Key: HDFS-14880 > URL: https://issues.apache.org/jira/browse/HDFS-14880 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover >Affects Versions: 3.1.1, 3.2.1 > Environment: Run the balancer tool in cluster. >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Attachments: HDFS-14880.0001.patch, HDFS-14880.0002.patch, > HDFS-14880.0003.patch > > > Actual: > Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved > The cluster is balanced. Exiting... > Sep 27, 2019 5:13:15 PM 0 0 B 0 B > 0 B > Sep 27, 2019 5:13:15 PM Balancing took 1.726 seconds > Done! > Expected: The exit message should come after logging all the balancer movement > statistics data. > Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved > Sep 27, 2019 5:13:15 PM 0 0 B 0 B > 0 B > The cluster is balanced. Exiting... > Sep 27, 2019 5:13:15 PM Balancing took 1.726 seconds > Done! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14880) Balancer sequence of statistics & exit message is not correct
[ https://issues.apache.org/jira/browse/HDFS-14880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958947#comment-16958947 ] Renukaprasad C commented on HDFS-14880: --- Thanks [~ayushtkn] for the quick review. > Balancer sequence of statistics & exit message is not correct > - > > Key: HDFS-14880 > URL: https://issues.apache.org/jira/browse/HDFS-14880 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover >Affects Versions: 3.1.1, 3.2.1 > Environment: Run the balancer tool in cluster. >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Attachments: HDFS-14880.0001.patch, HDFS-14880.0002.patch, > HDFS-14880.0003.patch > > > Actual: > Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved > The cluster is balanced. Exiting... > Sep 27, 2019 5:13:15 PM 0 0 B 0 B > 0 B > Sep 27, 2019 5:13:15 PM Balancing took 1.726 seconds > Done! > Expected: The exit message should come after logging all the balancer movement > statistics data. > Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved > Sep 27, 2019 5:13:15 PM 0 0 B 0 B > 0 B > The cluster is balanced. Exiting... > Sep 27, 2019 5:13:15 PM Balancing took 1.726 seconds > Done! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2359) Seeking randomly in a key with more than 2 blocks of data leads to inconsistent reads
[ https://issues.apache.org/jira/browse/HDDS-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2359: - Labels: pull-request-available (was: ) > Seeking randomly in a key with more than 2 blocks of data leads to > inconsistent reads > - > > Key: HDDS-2359 > URL: https://issues.apache.org/jira/browse/HDDS-2359 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Istvan Fajth >Assignee: Shashikant Banerjee >Priority: Critical > Labels: pull-request-available > > During Hive testing we found the following exception: > {code} > TaskAttempt 3 failed, info=[Error: Error while running task ( failure ) : > attempt_1569246922012_0214_1_03_00_3:java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: > java.io.IOException: error iterating > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108) > at > 
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41) > at > com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.io.IOException: java.io.IOException: error iterating > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:80) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:426) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267) > ... 16 more > Caused by: java.io.IOException: java.io.IOException: error iterating > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:366) > at > org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79) > at > org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68) > ... 
18 more > Caused by: java.io.IOException: error iterating > at > org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.next(VectorizedOrcAcidRowBatchReader.java:835) > at > org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.next(VectorizedOrcAcidRowBatchReader.java:74) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:361) > ... 24 more > Caused by: java.io.IOException: Error reading file: > o3fs://hive.warehouse.vc0136.halxg.cloudera.com:9862/data/inventory/delta_001_001_/bucket_0 > at > org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1283) > at >
[jira] [Work logged] (HDDS-2359) Seeking randomly in a key with more than 2 blocks of data leads to inconsistent reads
[ https://issues.apache.org/jira/browse/HDDS-2359?focusedWorklogId=333473=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333473 ]

ASF GitHub Bot logged work on HDDS-2359:

Author: ASF GitHub Bot
Created on: 24/Oct/19 14:42
Start Date: 24/Oct/19 14:42
Worklog Time Spent: 10m
Work Description: bshashikant commented on pull request #82: HDDS-2359. Seeking randomly in a key with more than 2 blocks of data leads to inconsistent reads
URL: https://github.com/apache/hadoop-ozone/pull/82

## What changes were proposed in this pull request?
The issue occurs when a seek to an offset is followed by a read, and then by a seek to a different offset and another read, where both reads cover an overlapping set of chunks. Once a seek to a position is done, the chunkPosition inside each blockInputStream is not correctly reset to 0. As a result, the first chunk to which the seek offset belongs is read correctly, but for the subsequent chunks the length of data to be read is computed as zero, so every read of those chunks returns 0 bytes. The solution is to reset the position of all subsequent chunks in all subsequent blocks to 0 after a seek, so that reads start from the beginning of each chunk.

## What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-2359

## How was this patch tested?
The patch was tested by adding unit tests that reliably reproduce the issue. It was also deployed on the real cluster where the issue was first discovered, and verified there. Thanks @fapifta for discovering the issue and helping to verify the fix. Thanks @bharatviswa504 and @hanishakoneru for their contributions to the fix.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 333473) Remaining Estimate: 0h Time Spent: 10m > Seeking randomly in a key with more than 2 blocks of data leads to > inconsistent reads > - > > Key: HDDS-2359 > URL: https://issues.apache.org/jira/browse/HDDS-2359 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Istvan Fajth >Assignee: Shashikant Banerjee >Priority: Critical > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > During Hive testing we found the following exception: > {code} > TaskAttempt 3 failed, info=[Error: Error while running task ( failure ) : > attempt_1569246922012_0214_1_03_00_3:java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: > java.io.IOException: error iterating > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108) > at > 
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41) > at > com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.io.IOException: java.io.IOException: error iterating > at >
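The seek problem described in the HDDS-2359 pull request above can be modelled with a toy stream made of fixed-length segments, each with its own read position. The sketch below is an assumption-laden simplification and not the Ozone implementation: it uses a single level of segments instead of blocks containing chunks, and the class name, the resetFollowing flag, and the method names are all hypothetical.

```java
public class SeekResetSketch {
  private final boolean resetFollowing; // true models the described fix
  private final int[] lengths;          // length of each segment
  private final int[] positions;        // read position inside each segment
  private int current;                  // index of the segment being read

  public SeekResetSketch(boolean resetFollowing, int... lengths) {
    this.resetFollowing = resetFollowing;
    this.lengths = lengths.clone();
    this.positions = new int[lengths.length];
  }

  // Positions the stream at an absolute offset.
  public void seek(long pos) {
    if (pos < 0) {
      throw new IllegalArgumentException("negative offset: " + pos);
    }
    long start = 0;
    for (int i = 0; i < lengths.length; i++) {
      long end = start + lengths[i];
      if (pos < end) {
        current = i;
        positions[i] = (int) (pos - start);
        if (resetFollowing) {
          // The fix: later segments must start from their beginning again.
          for (int j = i + 1; j < lengths.length; j++) {
            positions[j] = 0;
          }
        }
        return;
      }
      start = end;
    }
    throw new IllegalArgumentException("offset beyond end of data: " + pos);
  }

  // Consumes up to n bytes and returns how many were actually available.
  public int read(int n) {
    int total = 0;
    while (total < n && current < lengths.length) {
      int remaining = lengths[current] - positions[current];
      if (remaining == 0) {
        current++;
        continue;
      }
      int step = Math.min(remaining, n - total);
      positions[current] += step;
      total += step;
    }
    return total;
  }

  public static void main(String[] args) {
    for (boolean reset : new boolean[] {false, true}) {
      SeekResetSketch in = new SeekResetSketch(reset, 4, 4, 4);
      in.seek(0);
      in.read(12);  // consume everything once
      in.seek(2);   // seek back into the first segment
      System.out.println("resetFollowing=" + reset
          + " -> read " + in.read(10) + " bytes");
    }
  }
}
```

Without the reset, the second read only returns the rest of the segment that was seeked into, because the later segments still look fully consumed. With the reset it returns all remaining data.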
[jira] [Work logged] (HDDS-2357) Add replication factor option to new Freon tests
[ https://issues.apache.org/jira/browse/HDDS-2357?focusedWorklogId=333439=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333439 ] ASF GitHub Bot logged work on HDDS-2357: Author: ASF GitHub Bot Created on: 24/Oct/19 14:20 Start Date: 24/Oct/19 14:20 Worklog Time Spent: 10m Work Description: arp7 commented on pull request #79: HDDS-2357. Add replication factor option to new Freon tests URL: https://github.com/apache/hadoop-ozone/pull/79 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 333439) Time Spent: 20m (was: 10m) > Add replication factor option to new Freon tests > > > Key: HDDS-2357 > URL: https://issues.apache.org/jira/browse/HDDS-2357 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: freon >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Minor > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > New Freon generators (OCKG and OKG) use a fixed replication factor of 3. > Sometimes it's useful to be able to test single-node replication. The goal > of this task is to add a command-line option to specify the replication factor. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2357) Add replication factor option to new Freon tests
[ https://issues.apache.org/jira/browse/HDDS-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDDS-2357: Labels: (was: pull-request-available) > Add replication factor option to new Freon tests > > > Key: HDDS-2357 > URL: https://issues.apache.org/jira/browse/HDDS-2357 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: freon >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Minor > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > New Freon generators (OCKG and OKG) use a fixed replication factor of 3. > Sometimes it's useful to be able to test single-node replication. The goal > of this task is to add a command-line option to specify the replication factor. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2357) Add replication factor option to new Freon tests
[ https://issues.apache.org/jira/browse/HDDS-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDDS-2357: Fix Version/s: 0.5.0 Resolution: Fixed Status: Resolved (was: Patch Available) +1 I've committed this via GitHub. Thanks for the contribution [~adoroszlai]. > Add replication factor option to new Freon tests > > > Key: HDDS-2357 > URL: https://issues.apache.org/jira/browse/HDDS-2357 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: freon >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Minor > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > New Freon generators (OCKG and OKG) use a fixed replication factor of 3. > Sometimes it's useful to be able to test single-node replication. The goal > of this task is to add a command-line option to specify the replication factor. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2358) Change to replication factor THREE in acceptance tests
[ https://issues.apache.org/jira/browse/HDDS-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDDS-2358: Fix Version/s: 0.5.0 Resolution: Fixed Status: Resolved (was: Patch Available) +1 I've committed this via GitHub. Thanks for catching this [~adoroszlai]. > Change to replication factor THREE in acceptance tests > -- > > Key: HDDS-2358 > URL: https://issues.apache.org/jira/browse/HDDS-2358 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: test >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Acceptance test clusters are currently configured to replication factor of > ONE. This way the [test > succeeds|https://elek.github.io/ozone-ci-q4/pr/pr-hdds-2305-c92ks/acceptance/summary.html] > in spite of Ratis leader election problems (note "term 1464"): > {noformat:title=https://raw.githubusercontent.com/elek/ozone-ci-q4/master/pr/pr-hdds-2305-c92ks/acceptance/docker-ozones3-ozones3-s3-scm.log} > datanode_2 | 2019-10-15 03:18:06,953 INFO impl.RoleInfo: > 749b19c7-0772-44d2-8efd-0664e6aa0748: start FollowerState > datanode_2 | 2019-10-15 03:18:06,953 INFO impl.FollowerState: > 749b19c7-0772-44d2-8efd-0664e6aa0748@group-D5F35BC43A13-FollowerState was > interrupted: java.lang.InterruptedException: sleep interrupted > datanode_2 | 2019-10-15 03:18:07,090 INFO impl.RaftServerImpl: > 749b19c7-0772-44d2-8efd-0664e6aa0748@group-D5F35BC43A13: changes role from > FOLLOWER to FOLLOWER at term 1464 for > recognizeCandidate:5ce55bf6-dbcc-40fb-8fb4-6e78032f4b8c > datanode_2 | 2019-10-15 03:18:07,090 INFO impl.RoleInfo: > 749b19c7-0772-44d2-8efd-0664e6aa0748: shutdown FollowerState > {noformat} > The goal of this change is to configure factor of THREE, to allow acceptance > test to catch such issues. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2358) Change to replication factor THREE in acceptance tests
[ https://issues.apache.org/jira/browse/HDDS-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDDS-2358: Labels: (was: pull-request-available) > Change to replication factor THREE in acceptance tests > -- > > Key: HDDS-2358 > URL: https://issues.apache.org/jira/browse/HDDS-2358 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: test >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Acceptance test clusters are currently configured to replication factor of > ONE. This way the [test > succeeds|https://elek.github.io/ozone-ci-q4/pr/pr-hdds-2305-c92ks/acceptance/summary.html] > in spite of Ratis leader election problems (note "term 1464"): > {noformat:title=https://raw.githubusercontent.com/elek/ozone-ci-q4/master/pr/pr-hdds-2305-c92ks/acceptance/docker-ozones3-ozones3-s3-scm.log} > datanode_2 | 2019-10-15 03:18:06,953 INFO impl.RoleInfo: > 749b19c7-0772-44d2-8efd-0664e6aa0748: start FollowerState > datanode_2 | 2019-10-15 03:18:06,953 INFO impl.FollowerState: > 749b19c7-0772-44d2-8efd-0664e6aa0748@group-D5F35BC43A13-FollowerState was > interrupted: java.lang.InterruptedException: sleep interrupted > datanode_2 | 2019-10-15 03:18:07,090 INFO impl.RaftServerImpl: > 749b19c7-0772-44d2-8efd-0664e6aa0748@group-D5F35BC43A13: changes role from > FOLLOWER to FOLLOWER at term 1464 for > recognizeCandidate:5ce55bf6-dbcc-40fb-8fb4-6e78032f4b8c > datanode_2 | 2019-10-15 03:18:07,090 INFO impl.RoleInfo: > 749b19c7-0772-44d2-8efd-0664e6aa0748: shutdown FollowerState > {noformat} > The goal of this change is to configure factor of THREE, to allow acceptance > test to catch such issues. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2358) Change to replication factor THREE in acceptance tests
[ https://issues.apache.org/jira/browse/HDDS-2358?focusedWorklogId=333438=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333438 ] ASF GitHub Bot logged work on HDDS-2358: Author: ASF GitHub Bot Created on: 24/Oct/19 14:18 Start Date: 24/Oct/19 14:18 Worklog Time Spent: 10m Work Description: arp7 commented on pull request #78: HDDS-2358. Change to replication factor THREE in acceptance tests URL: https://github.com/apache/hadoop-ozone/pull/78 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 333438) Time Spent: 20m (was: 10m)
[jira] [Updated] (HDDS-1228) Chunk Scanner Checkpoints
[ https://issues.apache.org/jira/browse/HDDS-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDDS-1228: Labels: (was: pull-request-available) > Chunk Scanner Checkpoints > - > > Key: HDDS-1228 > URL: https://issues.apache.org/jira/browse/HDDS-1228 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Datanode >Reporter: Supratim Deka >Assignee: Attila Doroszlai >Priority: Critical > Fix For: 0.5.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Checkpoint the progress of the chunk verification scanner. > Save the checkpoint persistently to support scanner resume from checkpoint - > after a datanode restart.
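The idea of a persistent scanner checkpoint described in HDDS-1228 could look roughly like the sketch below. This is not the implementation from the patch: the class `ScannerCheckpoint`, the single-file layout, and the choice of "ID of the last fully scanned container" as the checkpoint value are all assumptions made for illustration. The sketch writes the value to a temporary file and renames it over the previous checkpoint, so that a datanode crash in the middle of a write does not leave a half-written checkpoint behind.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch; names and on-disk format are assumptions, not Ozone code. */
final class ScannerCheckpoint {
  private final Path file;

  ScannerCheckpoint(Path file) {
    this.file = file;
  }

  /** Persists the ID of the last container the scanner finished verifying. */
  void save(long lastScannedContainerId) throws IOException {
    Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
    Files.write(tmp,
        Long.toString(lastScannedContainerId).getBytes(StandardCharsets.UTF_8));
    // Rename the temporary file over the old checkpoint. With ATOMIC_MOVE,
    // whether an existing target is replaced is platform-specific; on common
    // platforms the rename replaces it.
    Files.move(tmp, file,
        StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
  }

  /** Returns the saved position, or -1 if the scan should start from the beginning. */
  long load() throws IOException {
    if (!Files.exists(file)) {
      return -1L;
    }
    String text =
        new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
    try {
      return Long.parseLong(text);
    } catch (NumberFormatException e) {
      // Treat an unreadable checkpoint as "no checkpoint" and rescan everything.
      return -1L;
    }
  }
}
```

After a restart, a scanner built along these lines would call `load()` once and skip containers up to the returned ID, then call `save()` after each container it finishes. Falling back to a full rescan on a missing or corrupt checkpoint errs on the side of verifying too much rather than too little.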