[jira] [Updated] (HDFS-14915) Move Superuser Check Before Taking Lock For Encryption API
[ https://issues.apache.org/jira/browse/HDFS-14915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HDFS-14915: Attachment: HDFS-14915-01.patch > Move Superuser Check Before Taking Lock For Encryption API > -- > > Key: HDFS-14915 > URL: https://issues.apache.org/jira/browse/HDFS-14915 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Attachments: HDFS-14915-01.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14915) Move Superuser Check Before Taking Lock For Encryption API
[ https://issues.apache.org/jira/browse/HDFS-14915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HDFS-14915: Status: Patch Available (was: Open)
[jira] [Created] (HDFS-14915) Move Superuser Check Before Taking Lock For Encryption API
Ayush Saxena created HDFS-14915: --- Summary: Move Superuser Check Before Taking Lock For Encryption API Key: HDFS-14915 URL: https://issues.apache.org/jira/browse/HDFS-14915 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ayush Saxena Assignee: Ayush Saxena
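The improvement proposed in this issue is an ordering change: perform the cheap superuser permission check before acquiring the namesystem lock, so a rejected caller never holds the lock at all. A minimal sketch of that pattern follows; the class, method, and user names are illustrative stand-ins, not the actual FSNamesystem code.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of "check first, lock second"; not the real FSNamesystem code.
class SuperuserCheckFirst {
  private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
  private final String caller;

  SuperuserCheckFirst(String caller) {
    this.caller = caller;
  }

  private boolean isSuperuser() {
    return "hdfs".equals(caller);  // stand-in for the real superuser check
  }

  // Proposed ordering: reject unauthorized callers before contending on the
  // lock, so a failed permission check never delays other operations.
  String listEncryptionZones() {
    if (!isSuperuser()) {          // cheap check performed with no lock held
      throw new SecurityException("Superuser privilege is required for " + caller);
    }
    fsLock.readLock().lock();      // lock is taken only for authorized callers
    try {
      return "zones";              // placeholder for the real listing work
    } finally {
      fsLock.readLock().unlock();
    }
  }
}
```

The benefit is purely about lock hold time under contention: an unauthorized request fails fast without ever touching the shared lock.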
[jira] [Commented] (HDDS-2332) BlockOutputStream#waitOnFlushFutures blocks on putBlock combined future
[ https://issues.apache.org/jira/browse/HDDS-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955757#comment-16955757 ] Lokesh Jain commented on HDDS-2332: --- [~cxorm] It is difficult to reproduce the issue. I saw it in one of the runs. It is happening because of RATIS-718. Once it is fixed it should not appear in the runs. But we might need to support request timeouts in ozone as well. > BlockOutputStream#waitOnFlushFutures blocks on putBlock combined future > --- > > Key: HDDS-2332 > URL: https://issues.apache.org/jira/browse/HDDS-2332 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Lokesh Jain >Priority: Major > > BlockOutputStream blocks on waitOnFlushFutures call. Two jstacks show that > the thread is blocked on the same condition. > {code:java} > 2019-10-18 06:30:38 > Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode): > "main" #1 prio=5 os_prio=0 tid=0x7fbea001a800 nid=0x2a56 waiting on > condition [0x7fbea96d6000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0xe4739888> (a > java.util.concurrent.CompletableFuture$Signaller) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693) > at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) > at > java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729) > at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.waitOnFlushFutures(BlockOutputStream.java:518) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:481) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:496) > at > 
org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:143) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:439) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:232) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:190) > at > org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:46) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57) > at java.io.DataOutputStream.write(DataOutputStream.java:107) > - locked <0xa6a75930> (a > org.apache.hadoop.fs.FSDataOutputStream) > at > org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:77) > - locked <0xa6a75918> (a > org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter) > at > org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:64) > at > org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:670) > at > org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89) > at > org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112) > at > org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230) > at > org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:203) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at 
org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168) > 2019-10-18 07:02:50 > Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode): > "main" #1 prio=5 os_prio=0 tid=0x7fbea001a800 nid=0x2a56 waiting on > condition [0x7fbea96d6000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0xe4739888> (a > java.util.concurrent.CompletableFuture$Signaller) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693) >
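The comment above suggests supporting request timeouts in Ozone as a mitigation, since both jstacks show the thread parked in an unbounded CompletableFuture.get(). A rough sketch of what a bounded wait could look like; the helper name and error handling are assumptions, not Ozone's actual API.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper illustrating a bounded wait on a combined putBlock
// future, instead of the unbounded get() where the jstacks show the thread
// parked in BlockOutputStream#waitOnFlushFutures.
class BoundedFlushWait {
  static <T> T waitBounded(CompletableFuture<T> future, long timeoutMs)
      throws IOException, InterruptedException {
    try {
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true);  // stop waiting on the stuck request
      throw new IOException("flush did not complete within " + timeoutMs + " ms", e);
    } catch (ExecutionException e) {
      throw new IOException("flush failed", e.getCause());
    }
  }
}
```

This only converts an indefinite hang into a reportable failure; the underlying stall is still the RATIS-718 issue referenced above.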
[jira] [Commented] (HDDS-2328) Support large-scale listing
[ https://issues.apache.org/jira/browse/HDDS-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955756#comment-16955756 ] Lokesh Jain commented on HDDS-2328: --- Currently we do not implement the FileSystem#listLocatedStatus api in Ozone. Therefore it ends up calling listStatus for the entire directory at once, which can lead to OOM. I think we just need to have an implementation for listLocatedStatus and other such related apis in BasicOzoneFileSystem. > Support large-scale listing > > > Key: HDDS-2328 > URL: https://issues.apache.org/jira/browse/HDDS-2328 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Reporter: Rajesh Balamohan >Assignee: Hanisha Koneru >Priority: Major > Labels: performance > > Large-scale listing of directory contents takes a lot longer and also > has the potential to run into OOM. I have > 1 million entries in the same > level and it took a lot longer with {{RemoteIterator}} (didn't complete as > it was stuck in RDB::seek). > S3A batches it with 5K listings per fetch IIRC. It would be good to have this > feature in ozone as well.
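The fix direction described above (an incremental listLocatedStatus instead of one listStatus call over the whole directory) amounts to a paged iterator: fetch a bounded batch per round trip, remember the last key seen, and fetch the next batch on demand. A simplified sketch follows; the Backend interface is an invented stand-in for the listing RPC, and the batch size is an assumption in the spirit of S3A's 5K pages.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Simplified paged listing: memory use is bounded by one page rather than
// the whole directory. Backend stands in for the listing RPC and returns
// sorted entries strictly after 'startAfter' (null means from the start).
class PagedListing {
  interface Backend {
    List<String> list(String startAfter, int limit);
  }

  static Iterator<String> iterate(Backend backend, int batchSize) {
    return new Iterator<String>() {
      private final Deque<String> page = new ArrayDeque<>();
      private String lastSeen = null;
      private boolean exhausted = false;

      private void fill() {
        if (exhausted || !page.isEmpty()) {
          return;
        }
        List<String> batch = backend.list(lastSeen, batchSize);
        if (batch.isEmpty()) {
          exhausted = true;  // nothing after lastSeen: listing is done
          return;
        }
        page.addAll(batch);
        lastSeen = batch.get(batch.size() - 1);  // resume point for next fetch
      }

      @Override
      public boolean hasNext() {
        fill();
        return !page.isEmpty();
      }

      @Override
      public String next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        return page.poll();
      }
    };
  }
}
```

The same shape fits Hadoop's RemoteIterator contract, which is what listLocatedStatus returns.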
[jira] [Updated] (HDDS-2206) Separate handling for OMException and IOException in the Ozone Manager
[ https://issues.apache.org/jira/browse/HDDS-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-2206: - Fix Version/s: 0.5.0 Resolution: Fixed Status: Resolved (was: Patch Available) > Separate handling for OMException and IOException in the Ozone Manager > -- > > Key: HDDS-2206 > URL: https://issues.apache.org/jira/browse/HDDS-2206 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Manager >Reporter: Supratim Deka >Assignee: Supratim Deka >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > As part of improving error propagation from the OM for ease of > troubleshooting and diagnosis, the proposal is to handle IOExceptions > separately from the business exceptions which are thrown as OMExceptions. > Handling for OMExceptions will not be changed in this jira. > Handling for IOExceptions will include logging the stacktrace on the server, > and propagation to the client under the control of a config parameter. > Similar handling is also proposed for SCMException.
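The split the description proposes can be sketched as a single dispatch point: business exceptions (OMException) keep their current handling, while other IOExceptions get server-side stack logging and are exposed verbatim only behind a config flag. The OMException class and the return strings below are simplified stand-ins, not the real Ozone types or error codes.

```java
import java.io.IOException;

// Sketch of the proposed OMException / IOException split; OMException here
// is a local stand-in, not org.apache.hadoop.ozone.om.exceptions.OMException.
class OmErrorHandling {
  static class OMException extends IOException {
    OMException(String message) {
      super(message);
    }
  }

  static String toClientError(IOException e, boolean propagateInternalDetails) {
    if (e instanceof OMException) {
      // Business exception: handling unchanged by this jira.
      return "OM_ERROR: " + e.getMessage();
    }
    // Unexpected IOException: the real change logs the stack trace on the
    // server here, and exposes details only if the config flag allows it.
    return propagateInternalDetails
        ? "INTERNAL_ERROR: " + e.getMessage()
        : "INTERNAL_ERROR";
  }
}
```

The config flag is the key design choice: operators can get full internal errors on the client while debugging, without leaking server internals by default.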
[jira] [Work logged] (HDDS-2206) Separate handling for OMException and IOException in the Ozone Manager
[ https://issues.apache.org/jira/browse/HDDS-2206?focusedWorklogId=331192=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-331192 ] ASF GitHub Bot logged work on HDDS-2206: Author: ASF GitHub Bot Created on: 21/Oct/19 05:00 Start Date: 21/Oct/19 05:00 Worklog Time Spent: 10m Work Description: bharatviswa504 commented on pull request #12: HDDS-2206. Separate handling for OMException and IOException in the Ozone Manager. Contributed by Supratim Deka URL: https://github.com/apache/hadoop-ozone/pull/12 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 331192) Time Spent: 2h 10m (was: 2h)
[jira] [Resolved] (HDDS-2326) Http server of Freon is not started for new Freon tests
[ https://issues.apache.org/jira/browse/HDDS-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham resolved HDDS-2326. -- Fix Version/s: 0.5.0 Resolution: Fixed > Http server of Freon is not started for new Freon tests > --- > > Key: HDDS-2326 > URL: https://issues.apache.org/jira/browse/HDDS-2326 > Project: Hadoop Distributed Data Store > Issue Type: New Feature > Components: freon >Reporter: Marton Elek >Assignee: Marton Elek >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > HDDS-2022 introduced new Freon tests but the Freon http server is not started > for the new tests. > Freon includes a http server which can be turned on with the '--server' flag. > It helps to monitor and profile Freon, as the http server contains by > default the prometheus and profiler servlets. > The server should be started if it's requested.
[jira] [Work logged] (HDDS-2326) Http server of Freon is not started for new Freon tests
[ https://issues.apache.org/jira/browse/HDDS-2326?focusedWorklogId=331190=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-331190 ] ASF GitHub Bot logged work on HDDS-2326: Author: ASF GitHub Bot Created on: 21/Oct/19 04:46 Start Date: 21/Oct/19 04:46 Worklog Time Spent: 10m Work Description: bharatviswa504 commented on pull request #52: HDDS-2326. Http server of Freon is not started for new Freon tests URL: https://github.com/apache/hadoop-ozone/pull/52 Issue Time Tracking --- Worklog Id: (was: 331190) Time Spent: 20m (was: 10m)
[jira] [Updated] (HDDS-2335) Params not included in AuditMessage
[ https://issues.apache.org/jira/browse/HDDS-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh updated HDDS-2335: Fix Version/s: 0.5.0 Resolution: Fixed Status: Resolved (was: Patch Available) Thanks [~adoroszlai] for the contribution, and [~bharat] and [~dineshchitlangia] for the reviews. I have committed this. > Params not included in AuditMessage > --- > > Key: HDDS-2335 > URL: https://issues.apache.org/jira/browse/HDDS-2335 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > HDDS-2323 introduced the following Findbugs violation: > {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/trunk/trunk-nightly-20191020-r5wzl/findbugs/summary.txt} > M P UrF: Unread field: > org.apache.hadoop.ozone.audit.AuditMessage$Builder.params At > AuditMessage.java:[line 106] > {noformat} > Which reveals that {{params}} is now not logged in audit messages: > {noformat} > 2019-10-20 08:41:35,248 | INFO | OMAudit | user=hadoop | ip=192.168.128.2 | > op=CREATE_VOLUME | ret=SUCCESS | > 2019-10-20 08:41:35,312 | INFO | OMAudit | user=hadoop | ip=192.168.128.2 | > op=CREATE_BUCKET | ret=SUCCESS | > 2019-10-20 08:41:35,407 | INFO | OMAudit | user=hadoop | ip=192.168.128.2 | > op=ALLOCATE_KEY | ret=SUCCESS | > 2019-10-20 08:41:37,355 | INFO | OMAudit | user=hadoop | ip=192.168.128.2 | > op=COMMIT_KEY | ret=SUCCESS | > {noformat}
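The Findbugs "unread field" warning above points at a classic builder bug: a setter stores the value, but the built message never reads it, so it silently drops out of the audit line. A minimal reproduction of that shape follows; the classes mirror the names in the report but are simplified stand-ins, not Ozone's actual AuditMessage API.

```java
// Minimal reproduction of the HDDS-2335 bug shape: a builder field that is
// written but never read, so the value vanishes from the formatted line.
// Simplified stand-in classes, not org.apache.hadoop.ozone.audit.AuditMessage.
class AuditLine {
  static class Builder {
    private String op;
    private String params;  // the "unread field" from the Findbugs report

    Builder setOp(String op) {
      this.op = op;
      return this;
    }

    Builder setParams(String params) {
      this.params = params;
      return this;
    }

    // Buggy build: never reads 'params', yielding audit lines that end in
    // "ret=SUCCESS |" with nothing after the final separator.
    String buildBroken() {
      return "op=" + op;
    }

    // Fixed build: params are folded into the formatted message.
    String buildFixed() {
      return params == null ? "op=" + op : "op=" + op + " | " + params;
    }
  }
}
```

This is why the Findbugs UrF check is worth keeping fatal: an unread builder field is almost always data silently lost on the way to build().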
[jira] [Work logged] (HDDS-2335) Params not included in AuditMessage
[ https://issues.apache.org/jira/browse/HDDS-2335?focusedWorklogId=331187=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-331187 ] ASF GitHub Bot logged work on HDDS-2335: Author: ASF GitHub Bot Created on: 21/Oct/19 04:19 Start Date: 21/Oct/19 04:19 Worklog Time Spent: 10m Work Description: mukul1987 commented on pull request #62: HDDS-2335. Params not included in AuditMessage URL: https://github.com/apache/hadoop-ozone/pull/62 Issue Time Tracking --- Worklog Id: (was: 331187) Time Spent: 20m (was: 10m)
[jira] [Work logged] (HDDS-2337) Fix checkstyle errors
[ https://issues.apache.org/jira/browse/HDDS-2337?focusedWorklogId=331179=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-331179 ] ASF GitHub Bot logged work on HDDS-2337: Author: ASF GitHub Bot Created on: 21/Oct/19 03:57 Start Date: 21/Oct/19 03:57 Worklog Time Spent: 10m Work Description: mukul1987 commented on pull request #64: HDDS-2337. Fix checkstyle errors URL: https://github.com/apache/hadoop-ozone/pull/64 Issue Time Tracking --- Worklog Id: (was: 331179) Time Spent: 20m (was: 10m) > Fix checkstyle errors > - > > Key: HDDS-2337 > URL: https://issues.apache.org/jira/browse/HDDS-2337 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Checkstyle errors introduced in HDDS-2281: > {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/pr/pr-hdds-2281-wfpgn/checkstyle/summary.txt} > hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java > 465: Line is longer than 80 characters (found 81). > hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ContainerTestHelper.java > 244: Line is longer than 80 characters (found 84). > hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerStateMachineFailures.java > 30: Unused import - > org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException. > 506: ; is preceded with whitespace. > 517: ; is preceded with whitespace.
> {noformat}
[jira] [Updated] (HDDS-2337) Fix checkstyle errors
[ https://issues.apache.org/jira/browse/HDDS-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh updated HDDS-2337: Fix Version/s: 0.5.0 Resolution: Fixed Status: Resolved (was: Patch Available) Thanks for the contribution [~adoroszlai]. I have committed this.
[jira] [Commented] (HDFS-14308) DFSStripedInputStream curStripeBuf is not freed by unbuffer()
[ https://issues.apache.org/jira/browse/HDFS-14308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955706#comment-16955706 ] Zhao Yi Ming commented on HDFS-14308: - I found an easy way to reproduce this issue through hbase bulkload, and also found the root cause. We are doing the testing now; once everything goes well, I will update this Jira. > DFSStripedInputStream curStripeBuf is not freed by unbuffer() > - > > Key: HDFS-14308 > URL: https://issues.apache.org/jira/browse/HDFS-14308 > Project: Hadoop HDFS > Issue Type: Bug > Components: ec >Affects Versions: 3.0.0 >Reporter: Joe McDonnell >Assignee: Zhao Yi Ming >Priority: Major > Attachments: ec_heap_dump.png > > > Some users of HDFS cache opened HDFS file handles to avoid repeated > roundtrips to the NameNode. For example, Impala caches up to 20,000 HDFS file > handles by default. Recent tests on erasure coded files show that the open > file handles can consume a large amount of memory when not in use. > For example, here is output from Impala's JMX endpoint when 608 file handles > are cached > {noformat} > { > "name": "java.nio:type=BufferPool,name=direct", > "modelerType": "sun.management.ManagementFactoryHelper$1", > "Name": "direct", > "TotalCapacity": 1921048960, > "MemoryUsed": 1921048961, > "Count": 633, > "ObjectName": "java.nio:type=BufferPool,name=direct" > },{noformat} > This shows direct buffer memory usage of 3MB per DFSStripedInputStream. > Attached is output from Eclipse MAT showing that the direct buffers come from > DFSStripedInputStream objects. Both Impala and HBase call unbuffer() when a > file handle is being cached and potentially unused for significant chunks of > time, yet this shows that the memory remains in use. > To support caching file handles on erasure coded files, DFSStripedInputStream > should avoid holding buffers after the unbuffer() call. See HDFS-7694.
> "unbuffer()" is intended to move an input stream to a lower memory state to > support these caching use cases. In particular, the curStripeBuf seems to be > allocated from the BUFFER_POOL on a resetCurStripeBuffer(true) call. It is > not freed until close().
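The expected fix direction is for unbuffer() to hand curStripeBuf back to the buffer pool instead of holding it until close(). A sketch with simplified stand-in classes (the pool and stream below are not the HDFS classes; sizes and names are illustrative):

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified stand-ins for the HDFS classes: the point is only that
// unbuffer() returns the stripe buffer to the shared pool instead of
// keeping it allocated until close(), matching the HDFS-7694 contract.
class UnbufferSketch {
  static class BufferPool {
    private final Deque<ByteBuffer> free = new ArrayDeque<>();

    ByteBuffer getBuffer(int capacity) {
      return free.isEmpty() ? ByteBuffer.allocateDirect(capacity) : free.poll();
    }

    void putBuffer(ByteBuffer buf) {
      buf.clear();
      free.add(buf);  // recycled for the next stream instead of leaked
    }

    int freeCount() {
      return free.size();
    }
  }

  static class StripedStream {
    private final BufferPool pool;
    private ByteBuffer curStripeBuf;  // lazily allocated, as in resetCurStripeBuffer(true)

    StripedStream(BufferPool pool) {
      this.pool = pool;
    }

    void read() {
      if (curStripeBuf == null) {
        curStripeBuf = pool.getBuffer(64);
      }
      // ... fill and decode the stripe ...
    }

    // The behavior this jira asks for: drop cached buffers on unbuffer(),
    // not only on close(), so idle cached handles hold no direct memory.
    void unbuffer() {
      if (curStripeBuf != null) {
        pool.putBuffer(curStripeBuf);
        curStripeBuf = null;
      }
    }

    boolean holdsStripeBuffer() {
      return curStripeBuf != null;
    }
  }
}
```

A subsequent read() can simply re-acquire a buffer from the pool, so releasing on unbuffer() costs only a pool round trip, not a reallocation.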
[jira] [Commented] (HDDS-2336) Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions
[ https://issues.apache.org/jira/browse/HDDS-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955674#comment-16955674 ] Mukul Kumar Singh commented on HDDS-2336: - Thanks for the contribution [~adoroszlai]. I have committed this. > Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions > > > Key: HDDS-2336 > URL: https://issues.apache.org/jira/browse/HDDS-2336 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Affects Versions: 0.5.0 >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > TestKeyValueContainer#testRocksDBCreateUsesCachedOptions, introduced in > HDDS-2283, is failing: > {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/pr/pr-hdds-2283-cnrrq/unit/hadoop-hdds/container-service/org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer.txt} > testRocksDBCreateUsesCachedOptions(org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer) > Time elapsed: 0.135 s <<< FAILURE! > java.lang.AssertionError: expected:<1> but was:<11> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer.testRocksDBCreateUsesCachedOptions(TestKeyValueContainer.java:406) > {noformat}
[jira] [Work logged] (HDDS-2336) Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions
[ https://issues.apache.org/jira/browse/HDDS-2336?focusedWorklogId=331133=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-331133 ] ASF GitHub Bot logged work on HDDS-2336: Author: ASF GitHub Bot Created on: 21/Oct/19 00:58 Start Date: 21/Oct/19 00:58 Worklog Time Spent: 10m Work Description: mukul1987 commented on pull request #63: HDDS-2336. Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions URL: https://github.com/apache/hadoop-ozone/pull/63 Issue Time Tracking --- Worklog Id: (was: 331133) Time Spent: 20m (was: 10m)
[jira] [Updated] (HDDS-2336) Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions
[ https://issues.apache.org/jira/browse/HDDS-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh updated HDDS-2336: Fix Version/s: 0.5.0 Resolution: Fixed Status: Resolved (was: Patch Available)
[jira] [Commented] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955660#comment-16955660 ] guojh commented on HDFS-14768: -- [~surendrasingh] Please review it. > EC : Busy DN replica should be consider in live replica check. > -- > > Key: HDFS-14768 > URL: https://issues.apache.org/jira/browse/HDFS-14768 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, erasure-coding, hdfs, namenode >Affects Versions: 3.0.2 >Reporter: guojh >Assignee: guojh >Priority: Major > Labels: patch > Fix For: 3.3.0 > > Attachments: 1568275810244.jpg, 1568276338275.jpg, 1568771471942.jpg, > HDFS-14768.000.patch, HDFS-14768.001.patch, HDFS-14768.002.patch, > HDFS-14768.003.patch, HDFS-14768.004.patch, HDFS-14768.005.patch, > HDFS-14768.006.patch, HDFS-14768.007.patch, HDFS-14768.jpg, > guojh_UT_after_deomission.txt, guojh_UT_before_deomission.txt, > zhaoyiming_UT_after_deomission.txt, zhaoyiming_UT_beofre_deomission.txt > > > Policy is RS-6-3-1024K, version is hadoop 3.0.2; > Suppose a file's block indices are [0,1,2,3,4,5,6,7,8], we decommission > indices [3,4], and we increase the index 6 datanode's > pendingReplicationWithoutTargets so that it is larger than > replicationStreamsHardLimit (we set 14). Then, after the method > chooseSourceDatanodes of BlockManager, the liveBlockIndices is > [0,1,2,3,4,5,7,8] and the block counters are Live: 7, Decommission: 2. > In method scheduleReconstruction of BlockManager, the additionalReplRequired > is 9 - 7 = 2. After the Namenode chooses two target Datanodes, it will assign an > erasure coding task to the target datanode. > When the datanode gets the task it will build targetIndices from liveBlockIndices > and the target length. The code is below.
> {code:java} > // code placeholder > targetIndices = new short[targets.length]; > private void initTargetIndices() { > BitSet bitset = reconstructor.getLiveBitSet(); > int m = 0; hasValidTargets = false; > for (int i = 0; i < dataBlkNum + parityBlkNum; i++) { > if (!bitset.get) { > if (reconstructor.getBlockLen > 0) { > if (m < targets.length) { > targetIndices[m++] = (short)i; > hasValidTargets = true; > } > } > } > } > {code} > targetIndices[0]=6, and targetIndices[1] is aways 0 from initial value. > The StripedReader is aways create reader from first 6 index block, and is > [0,1,2,3,4,5] > Use the index [0,1,2,3,4,5] to build target index[6,0] will trigger the isal > bug. the block index6's data is corruption(all data is zero). > I write a unit test can stabilize repreduce. > {code:java} > // code placeholder > private int replicationStreamsHardLimit = > DFSConfigKeys.DFS_NAMENODE_REPLICATION_STREAMS_HARD_LIMIT_DEFAULT; > numDNs = dataBlocks + parityBlocks + 10; > @Test(timeout = 24) > public void testFileDecommission() throws Exception { > LOG.info("Starting test testFileDecommission"); > final Path ecFile = new Path(ecDir, "testFileDecommission"); > int writeBytes = cellSize * dataBlocks; > writeStripedFile(dfs, ecFile, writeBytes); > Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks()); > FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes); > final INodeFile fileNode = cluster.getNamesystem().getFSDirectory() > .getINode4Write(ecFile.toString()).asFile(); > LocatedBlocks locatedBlocks = > StripedFileTestUtil.getLocatedBlocks(ecFile, dfs); > LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0) > .get(0); > DatanodeInfo[] dnLocs = lb.getLocations(); > LocatedStripedBlock lastBlock = > (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock(); > DatanodeInfo[] storageInfos = lastBlock.getLocations(); > // > DatanodeDescriptor datanodeDescriptor = > cluster.getNameNode().getNamesystem() > > 
.getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid()); > BlockInfo firstBlock = fileNode.getBlocks()[0]; > DatanodeStorageInfo[] dStorageInfos = bm.getStorages(firstBlock); > // the first heartbeat will consume 3 replica tasks > for (int i = 0; i <= replicationStreamsHardLimit + 3; i++) { > BlockManagerTestUtil.addBlockToBeReplicated(datanodeDescriptor, new > Block(i), > new DatanodeStorageInfo[]{dStorageInfos[0]}); > } > assertEquals(dataBlocks + parityBlocks, dnLocs.length); > int[] decommNodeIndex = {3, 4}; > final List decommisionNodes = new ArrayList(); > // add the node which will be decommissioning > decommisionNodes.add(dnLocs[decommNodeIndex[0]]); > decommisionNodes.add(dnLocs[decommNodeIndex[1]]); > decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED); > assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
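The stale-zero failure described above can be made concrete with a minimal, self-contained sketch. The class and method names here are hypothetical stand-ins for the striped-block reconstructor's index setup; the point is only to show that when fewer block indices are missing than there are reconstruction targets, the trailing `targetIndices` slots silently keep their default value 0:

```java
import java.util.BitSet;

public class TargetIndicesSketch {

    // liveBitSet marks block indices that already have live replicas.
    // totalBlkNum = dataBlkNum + parityBlkNum; targetCount = targets.length.
    static short[] initTargetIndices(BitSet liveBitSet, int totalBlkNum,
                                     int targetCount) {
        short[] targetIndices = new short[targetCount];
        int m = 0;
        for (int i = 0; i < totalBlkNum; i++) {
            if (!liveBitSet.get(i) && m < targetCount) {
                targetIndices[m++] = (short) i;
            }
        }
        // If fewer missing indices exist than targetCount, the trailing
        // slots keep the array default 0, so index 0 is wrongly treated
        // as a reconstruction target even though it is live.
        return targetIndices;
    }
}
```

With live indices [0,1,2,3,4,5,7,8] out of 9 and two targets, only index 6 is missing, so the result is [6, 0] — exactly the bad target pair the report says triggers the ISA-L corruption.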
[jira] [Commented] (HDDS-2328) Support large-scale listing
[ https://issues.apache.org/jira/browse/HDDS-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955650#comment-16955650 ] Rajesh Balamohan commented on HDDS-2328: Here is a small snippet of the code that was used for the large listing (the directory I used had millions of entries, populated earlier). Ozone src details: https://github.com/apache/hadoop-ozone (commit b4a1afd60e3a3c7319a1ffa97d5ace3a95ed26f6).
{noformat}
// Get path details
...
long sTime = System.currentTimeMillis();
RemoteIterator<LocatedFileStatus> rit = fs.listLocatedStatus(path);
long count = 0;
while (rit.hasNext()) {
  rit.next();
  count++;
}
long eTime = System.currentTimeMillis();
...
{noformat}
> Support large-scale listing
>
> Key: HDDS-2328
> URL: https://issues.apache.org/jira/browse/HDDS-2328
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Manager
> Reporter: Rajesh Balamohan
> Assignee: Hanisha Koneru
> Priority: Major
> Labels: performance
>
> Large-scale listing of directory contents takes much longer and also has the potential to run into OOM. I have > 1 million entries at the same level, and listing with {{RemoteIterator}} took much longer (it didn't complete, as it was stuck in RDB::seek).
> S3A batches it at 5K listings per fetch, IIRC. It would be good to have this feature in Ozone as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
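The batching approach mentioned in the issue (S3A's ~5K entries per fetch) can be sketched as a paged iterator that issues one bounded fetch per page and resumes from the last key seen. This is an illustrative sketch, not the Ozone API: `fetchPage` stands in for a hypothetical server-side `listKeys(startKey, maxKeys)` RPC.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

// Paged key iterator: each call to the backend returns at most batchSize
// entries strictly after the given start key, bounding memory per fetch.
public class PagedKeyIterator implements Iterator<String> {
    private final BiFunction<String, Integer, List<String>> fetchPage;
    private final int batchSize;
    private List<String> page = new ArrayList<>();
    private int pos = 0;
    private String lastKey = "";   // resume point for the next page
    private boolean exhausted = false;

    public PagedKeyIterator(BiFunction<String, Integer, List<String>> fetchPage,
                            int batchSize) {
        this.fetchPage = fetchPage;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        if (pos < page.size()) {
            return true;
        }
        if (exhausted) {
            return false;
        }
        page = fetchPage.apply(lastKey, batchSize); // one bounded fetch per page
        pos = 0;
        if (page.size() < batchSize) {
            exhausted = true; // short page means no more entries
        }
        return !page.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        lastKey = page.get(pos);
        return page.get(pos++);
    }
}
```

Iterating millions of entries this way holds only one page in memory at a time, instead of materializing the whole listing.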
[jira] [Commented] (HDDS-2324) Enhance locking mechanism in OzoneMangaer
[ https://issues.apache.org/jira/browse/HDDS-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955649#comment-16955649 ] Rajesh Balamohan commented on HDDS-2324: [~arp], fairness is disabled by default. https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/lock/OzoneManagerLock.java#L95 https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/OzoneConfigKeys.java#L457
> Enhance locking mechanism in OzoneMangaer
> -
>
> Key: HDDS-2324
> URL: https://issues.apache.org/jira/browse/HDDS-2324
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Manager
> Reporter: Rajesh Balamohan
> Assignee: Nanda kumar
> Priority: Major
> Labels: performance
> Attachments: om_lock_100_percent_read_benchmark.svg, om_lock_reader_and_writer_workload.svg
>
> OM has a reentrant RW lock. With 100% read or 100% write benchmarks, it works out reasonably fine. There is already a ticket to optimize the write codepath (as it incurs reading from the DB for key checks).
> However, when a small write workload (e.g. 3-5 threads) is added to the running read benchmark, throughput suffers significantly, because the reader threads get blocked often. I have observed around 10x slower throughput (i.e. the 100% read benchmark was running at 12,000 TPS, and with a couple of writer threads added it goes down to 1200-1800 TPS).
> 1. Instead of a single write lock, one option could be to scale out the write lock depending on the number of cores available in the system and acquire the relevant lock by hashing the key.
> 2. Another option is to explore whether we can make use of StampedLock from JDK 8.x, which scales well when multiple readers and writers are present. But it is not a reentrant lock, so we need to explore whether it can be an option or not.
> > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
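Option 2 in the ticket above (JDK 8's {{StampedLock}}) can be sketched as follows. Readers take a lock-free optimistic stamp and only fall back to a real read lock when a writer invalidated it, which is why it tends to scale better than a reader/writer lock under mixed load; as the ticket notes, it is not reentrant, which is the open question. This is a generic sketch, not OM code:

```java
import java.util.concurrent.locks.StampedLock;

// Minimal optimistic-read pattern with StampedLock.
public class OptimisticCounter {
    private final StampedLock lock = new StampedLock();
    private long value;

    public void increment() {
        long stamp = lock.writeLock();
        try {
            value++;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    public long get() {
        long stamp = lock.tryOptimisticRead(); // no blocking on the hot path
        long v = value;
        if (!lock.validate(stamp)) {           // a write intervened; retry under read lock
            stamp = lock.readLock();
            try {
                v = value;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return v;
    }
}
```

The optimistic path does no CAS on a shared lock word for uncontended reads, so many readers proceed without blocking each other or being starved by a handful of writers.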
[jira] [Commented] (HDDS-2331) Client OOME due to buffer retention
[ https://issues.apache.org/jira/browse/HDDS-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955548#comment-16955548 ] Attila Doroszlai commented on HDDS-2331: Thanks for checking, [~szetszwo]. I probably should have said the bug is triggered by HDDS-2169, not caused by it. I agree, a 16MB buffer is overkill for smaller keys. I tried to change it to match the actual data length, but it's not trivial (it causes other errors).
> Client OOME due to buffer retention
> ---
>
> Key: HDDS-2331
> URL: https://issues.apache.org/jira/browse/HDDS-2331
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Client
> Affects Versions: 0.5.0
> Reporter: Attila Doroszlai
> Priority: Critical
> Attachments: profiler.png
>
> Freon random key generator exhausts the default heap after just a few hundred 1MB keys. A heap dump on OOME reveals 150+ instances of {{ContainerCommandRequestMessage}}, each with a 16MB {{byte[]}}.
> Steps to reproduce:
> # Start Ozone cluster with 1 datanode
> # Start Freon (5K keys of size 1MB)
> Result: OOME after a few hundred keys
> {noformat}
> $ cd hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/ozone
> $ docker-compose up -d
> $ docker-compose exec scm bash
> $ export HADOOP_OPTS='-XX:+HeapDumpOnOutOfMemoryError'
> $ ozone freon rk --numOfThreads 1 --numOfVolumes 1 --numOfBuckets 1 --replicationType RATIS --factor ONE --keySize 1048576 --numOfKeys 5120 --bufferSize 65536
> ...
> java.lang.OutOfMemoryError: Java heap space
> Dumping heap to java_pid289.hprof ...
> Heap dump file created [1456141975 bytes in 7.760 secs]
> {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2334) Dummy chunk manager fails with length mismatch error
[ https://issues.apache.org/jira/browse/HDDS-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Doroszlai updated HDDS-2334: --- Status: Patch Available (was: In Progress) > Dummy chunk manager fails with length mismatch error > > > Key: HDDS-2334 > URL: https://issues.apache.org/jira/browse/HDDS-2334 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > HDDS-1094 added a config option ({{hdds.container.chunk.persistdata=false}}) > to drop chunks instead of writing them to disk. Currently this option > triggers the following error with any key size: > {noformat} > org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: > data array does not match the length specified. DataLen: 16777216 Byte > Array: 16777478 > at > org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerDummyImpl.writeChunk(ChunkManagerDummyImpl.java:87) > at > org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleWriteChunk(KeyValueHandler.java:695) > at > org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:176) > at > org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:277) > at > org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:150) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:413) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:423) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$handleWriteChunk$1(ContainerStateMachine.java:458) > at > 
java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2334) Dummy chunk manager fails with length mismatch error
[ https://issues.apache.org/jira/browse/HDDS-2334?focusedWorklogId=331081=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-331081 ] ASF GitHub Bot logged work on HDDS-2334: Author: ASF GitHub Bot Created on: 20/Oct/19 16:40 Start Date: 20/Oct/19 16:40 Worklog Time Spent: 10m Work Description: adoroszlai commented on pull request #65: HDDS-2334. Dummy chunk manager fails with length mismatch error URL: https://github.com/apache/hadoop-ozone/pull/65 ## What changes were proposed in this pull request? Data size validation logic was recently [changed](https://github.com/apache/hadoop-ozone/commit/e70ea7b66ca3326c3b00ddc3e4af7144d48ea5f5#diff-92341865368a6b82a1430bcb40bd4264R83) for real `ChunkManager`, but not for the dummy implementation. This change extracts the validation logic and reuses it for the dummy one, too. This restores the ability to skip writing data to disk (for performance testing). https://issues.apache.org/jira/browse/HDDS-2334 ## How was this patch tested? Changed existing unit test to use a buffer with additional "header" at the beginning. Added test cases for dummy implementation. Tested on compose cluster with the following additional configs: ``` OZONE-SITE.XML_hdds.container.chunk.persistdata=false OZONE-SITE.XML_ozone.client.verify.checksum=false ``` ``` $ ozone sh volume create vol1 $ ozone sh bucket create vol1/buck1; $ ozone sh key put vol1/buck1/key1 /etc/passwd $ ozone sh key get vol1/buck1/key1 asdf $ ls -l /etc/passwd -rw-r--r-- 1 root root 671 Jun 17 15:33 /etc/passwd $ wc asdf 0 0 671 asdf ``` Also tested regular, "persistent" chunk manager: ``` $ docker-compose exec scm ozone freon rk --numOfThreads 1 --numOfVolumes 1 --numOfBuckets 1 --replicationType RATIS --factor ONE --validateWrites --keySize 1024 --numOfKeys 10 --bufferSize 1024 ... 
Status: Success Git Base Revision: e97acb3bd8f3befd27418996fa5d4b50bf2e17bf Number of Volumes created: 1 Number of Buckets created: 1 Number of Keys added: 10 Ratis replication factor: ONE Ratis replication type: RATIS Average Time spent in volume creation: 00:00:00,182 Average Time spent in bucket creation: 00:00:00,030 Average Time spent in key creation: 00:00:00,290 Average Time spent in key write: 00:00:02,379 Total bytes written: 10240 Total number of writes validated: 10 Writes validated: 100.0 % Successful validation: 10 Unsuccessful validation: 0 Total Execution time: 00:00:09,389 ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 331081) Remaining Estimate: 0h Time Spent: 10m > Dummy chunk manager fails with length mismatch error > > > Key: HDDS-2334 > URL: https://issues.apache.org/jira/browse/HDDS-2334 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > HDDS-1094 added a config option ({{hdds.container.chunk.persistdata=false}}) > to drop chunks instead of writing them to disk. Currently this option > triggers the following error with any key size: > {noformat} > org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: > data array does not match the length specified. 
DataLen: 16777216 Byte > Array: 16777478 > at > org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerDummyImpl.writeChunk(ChunkManagerDummyImpl.java:87) > at > org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleWriteChunk(KeyValueHandler.java:695) > at > org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:176) > at > org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:277) > at > org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:150) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:413) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:423) > at >
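The failure in the stack trace above comes down to comparing the raw buffer length against the declared chunk length even though the buffer may legitimately carry extra header bytes (16777478 vs. 16777216 here). A minimal sketch of the two checks — the names are illustrative only, not the actual Ozone validation helper the PR extracts:

```java
public final class ChunkSizeValidator {
    private ChunkSizeValidator() { }

    // Strict check (the failing behavior): rejects any buffer whose raw
    // length differs from the declared chunk length, even when the buffer
    // carries a small header before the chunk payload.
    public static boolean strictMatch(byte[] data, long declaredLen) {
        return data.length == declaredLen;
    }

    // Relaxed check (the fix direction): the declared chunk only has to
    // fit inside the buffer; surrounding header bytes are tolerated.
    public static void validate(byte[] data, long declaredLen) {
        if (declaredLen > data.length) {
            throw new IllegalStateException(
                "data array does not match the length specified. DataLen: "
                    + declaredLen + " Byte Array: " + data.length);
        }
    }
}
```

Under the strict check a 16 MB chunk wrapped in a slightly larger message always fails; the relaxed form is the kind of shared validation both the real and dummy chunk managers can reuse.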
[jira] [Updated] (HDDS-2334) Dummy chunk manager fails with length mismatch error
[ https://issues.apache.org/jira/browse/HDDS-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2334: - Labels: pull-request-available (was: ) > Dummy chunk manager fails with length mismatch error > > > Key: HDDS-2334 > URL: https://issues.apache.org/jira/browse/HDDS-2334 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > > HDDS-1094 added a config option ({{hdds.container.chunk.persistdata=false}}) > to drop chunks instead of writing them to disk. Currently this option > triggers the following error with any key size: > {noformat} > org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: > data array does not match the length specified. DataLen: 16777216 Byte > Array: 16777478 > at > org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerDummyImpl.writeChunk(ChunkManagerDummyImpl.java:87) > at > org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleWriteChunk(KeyValueHandler.java:695) > at > org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:176) > at > org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:277) > at > org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:150) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:413) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:423) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$handleWriteChunk$1(ContainerStateMachine.java:458) > at > java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) > at > 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14882) Consider DataNode load when #getBlockLocation
[ https://issues.apache.org/jira/browse/HDFS-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955517#comment-16955517 ] Hadoop QA commented on HDFS-14882: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 33s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 26s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 19s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 68 unchanged - 1 fixed = 68 total (was 69) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 16s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 87m 44s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 38s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}145m 5s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots | | | hadoop.hdfs.server.namenode.ha.TestEditLogTailer | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | HDFS-14882 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12983544/HDFS-14882.003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux ceaec273d07f 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 447f46d | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/28134/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/28134/testReport/ | | Max. process+thread count | 4076 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | |
[jira] [Commented] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955509#comment-16955509 ] Hadoop QA commented on HDFS-14768: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 58s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 46s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 27s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 48s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 170 unchanged - 1 fixed = 170 total (was 171) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 29s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 27s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}143m 42s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 49s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}218m 23s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.TestQuotaByStorageType | | | hadoop.hdfs.server.namenode.TestAddStripedBlocks | | | hadoop.hdfs.TestLeaseRecovery2 | | | hadoop.hdfs.server.namenode.TestUpgradeDomainBlockPlacementPolicy | | | hadoop.hdfs.server.namenode.TestListCorruptFileBlocks | | | hadoop.hdfs.server.datanode.TestDirectoryScanner | | | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots | | | hadoop.hdfs.server.namenode.TestCacheDirectives | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | HDFS-14768 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12983541/HDFS-14768.007.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 9dfa0cd5375e 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 447f46d | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | unit |
[jira] [Comment Edited] (HDFS-14882) Consider DataNode load when #getBlockLocation
[ https://issues.apache.org/jira/browse/HDFS-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955493#comment-16955493 ] Ayush Saxena edited comment on HDFS-14882 at 10/20/19 1:27 PM: --- Thanx [~hexiaoqiao]. IMO we shouldn't use the same configuration: someone turning on the old configuration will now turn this feature on too, which didn't happen earlier. In general, for anything new, we usually keep the feature turned off by default, and I see the default for this config is true. I don't think we should force people into using this by default, since the sorting too has some performance impact, so I would prefer that they turn it on explicitly. Though the two are quite similar, they are not for the same thing, so I think we should have a separate config. Moreover, for the test, you may add a case with decommissioned or stale datanodes and verify that they stay at the end irrespective of the distance. was (Author: ayushtkn): Thanx [~hexiaoqiao]. IMO we shouldn't use the same configuration: someone turning on the old configuration will now turn this feature on too, which didn't happen earlier. In general, for anything new, we usually keep the feature turned off by default, and I see the default for this config is true. I don't think we should force people into using this by default, since the sorting too has some performance impact, so I would prefer that they turn it on explicitly. Though the two are quite similar, they are not for the same thing, so I think we should have a separate config.
> Consider DataNode load when #getBlockLocation
> -
>
> Key: HDFS-14882
> URL: https://issues.apache.org/jira/browse/HDFS-14882
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Xiaoqiao He
> Assignee: Xiaoqiao He
> Priority: Major
> Attachments: HDFS-14882.001.patch, HDFS-14882.002.patch, HDFS-14882.003.patch
>
>
> Currently, we consider the load of a datanode in #chooseTarget for writers, but do not consider it for readers. Thus, a datanode's process slots can be occupied by #BlockSender for readers, its disk/network become busy, and we then hit slow-node exceptions. IIRC the same case has been reported several times. Based on this, I propose to consider load for readers the same way #chooseTarget does for writers. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14882) Consider DataNode load when #getBlockLocation
[ https://issues.apache.org/jira/browse/HDFS-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955493#comment-16955493 ] Ayush Saxena commented on HDFS-14882: - Thanx [~hexiaoqiao]. IMO we shouldn't use the same configuration: someone turning on the old configuration will now turn this feature on too, which didn't happen earlier. In general, for anything new, we usually keep the feature turned off by default, and I see the default for this config is true. I don't think we should force people into using this by default, since the sorting too has some performance impact, so I would prefer that they turn it on explicitly. Though the two are quite similar, they are not for the same thing, so I think we should have a separate config.
> Consider DataNode load when #getBlockLocation
> -
>
> Key: HDFS-14882
> URL: https://issues.apache.org/jira/browse/HDFS-14882
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Xiaoqiao He
> Assignee: Xiaoqiao He
> Priority: Major
> Attachments: HDFS-14882.001.patch, HDFS-14882.002.patch, HDFS-14882.003.patch
>
>
> Currently, we consider the load of a datanode in #chooseTarget for writers, but do not consider it for readers. Thus, a datanode's process slots can be occupied by #BlockSender for readers, its disk/network become busy, and we then hit slow-node exceptions. IIRC the same case has been reported several times. Based on this, I propose to consider load for readers the same way #chooseTarget does for writers. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14882) Consider DataNode load when #getBlockLocation
[ https://issues.apache.org/jira/browse/HDFS-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955487#comment-16955487 ] Xiaoqiao He commented on HDFS-14882: Thanks [~ayushtkn] for your reviews. I have addressed all the latest comments, I think. {quote}You even need to add the new config in Hdfs-defaults.xml {quote} I tried to reuse the config already used by BlockPlacementPolicyDefault#isGoodDatanode. IMO they both follow the same semantics, so I think we do not need to add another one. A brief introduction to this change: generally, the patch re-orders nodes that have the same topology distance to the client, ordering each such group by load. Firstly, calculate all distances from the nodes to the client. Secondly, use #start and #end to delimit the nodes with the same distance. Thirdly, re-sort each delimited section by load. (Note: skip sections shorter than 2, since it is unnecessary to sort a single node.) > Consider DataNode load when #getBlockLocation > - > > Key: HDFS-14882 > URL: https://issues.apache.org/jira/browse/HDFS-14882 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Major > Attachments: HDFS-14882.001.patch, HDFS-14882.002.patch, > HDFS-14882.003.patch > > > Currently, we consider load of datanode when #chooseTarget for writer, > however not consider it for reader. Thus, the process slot of datanode could > be occupied by #BlockSender for reader, and disk/network will be busy > workload, then meet some slow node exception. IIRC same case is reported > times. Based on the fact, I propose to consider load for reader same as it > did #chooseTarget for writer. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
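The grouping-and-re-sort described in the comment above can be sketched roughly as follows. Everything here is a simplified assumption, not the actual patch: `Node` and its fields stand in for DatanodeDescriptor/NetworkTopology, and although the comment says "descend order", the apparent intent is to prefer lightly loaded nodes, so this sketch sorts each equal-distance section by ascending load.

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustrative stand-in for a datanode; not the real DatanodeDescriptor API.
class Node {
    final String name;
    final int distance;   // topology distance to the client (precomputed)
    final int load;       // e.g. active transceiver count
    Node(String name, int distance, int load) {
        this.name = name; this.distance = distance; this.load = load;
    }
}

public class SortByLoadSketch {
    static void sortByDistanceThenLoad(Node[] nodes) {
        // Primary order: topology distance (stable sort keeps prior order
        // within equal-distance groups).
        Arrays.sort(nodes, Comparator.comparingInt(n -> n.distance));
        int start = 0;
        while (start < nodes.length) {
            // Advance `end` to embrace the section of equal-distance nodes.
            int end = start;
            while (end < nodes.length
                    && nodes[end].distance == nodes[start].distance) {
                end++;
            }
            // Re-sort the section by load; skip sections shorter than 2,
            // since sorting a single node is unnecessary.
            if (end - start >= 2) {
                Arrays.sort(nodes, start, end,
                        Comparator.comparingInt(n -> n.load));
            }
            start = end;
        }
    }
}
```

With nodes at distances {0, 2} and mixed loads, the result is: all distance-0 nodes first, each distance group internally ordered from least to most loaded.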
[jira] [Assigned] (HDDS-2319) CLI command to perform on-demand data scan of a specific container
[ https://issues.apache.org/jira/browse/HDDS-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YiSheng Lien reassigned HDDS-2319: -- Assignee: YiSheng Lien > CLI command to perform on-demand data scan of a specific container > -- > > Key: HDDS-2319 > URL: https://issues.apache.org/jira/browse/HDDS-2319 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone CLI >Reporter: Attila Doroszlai >Assignee: YiSheng Lien >Priority: Major > > On-demand data scan for a specific container might be a useful debug tool. > Thanks [~aengineer] for the idea. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2332) BlockOutputStream#waitOnFlushFutures blocks on putBlock combined future
[ https://issues.apache.org/jira/browse/HDDS-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955482#comment-16955482 ] YiSheng Lien commented on HDDS-2332: Hi [~ljain], Would you show the condition to reproduce the issue ? > BlockOutputStream#waitOnFlushFutures blocks on putBlock combined future > --- > > Key: HDDS-2332 > URL: https://issues.apache.org/jira/browse/HDDS-2332 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Lokesh Jain >Priority: Major > > BlockOutputStream blocks on waitOnFlushFutures call. Two jstacks show that > the thread is blocked on the same condition. > {code:java} > 2019-10-18 06:30:38 > Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode): > "main" #1 prio=5 os_prio=0 tid=0x7fbea001a800 nid=0x2a56 waiting on > condition [0x7fbea96d6000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0xe4739888> (a > java.util.concurrent.CompletableFuture$Signaller) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693) > at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) > at > java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729) > at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.waitOnFlushFutures(BlockOutputStream.java:518) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:481) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:496) > at > org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:143) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:439) > at > 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:232) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:190) > at > org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:46) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57) > at java.io.DataOutputStream.write(DataOutputStream.java:107) > - locked <0xa6a75930> (a > org.apache.hadoop.fs.FSDataOutputStream) > at > org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:77) > - locked <0xa6a75918> (a > org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter) > at > org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:64) > at > org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:670) > at > org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89) > at > org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112) > at > org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230) > at > org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:203) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168) > 2019-10-18 07:02:50 > Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode): > "main" #1 prio=5 os_prio=0 tid=0x7fbea001a800 nid=0x2a56 waiting on > condition 
[0x7fbea96d6000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0xe4739888> (a > java.util.concurrent.CompletableFuture$Signaller) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693) > at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) > at >
[jira] [Updated] (HDFS-14882) Consider DataNode load when #getBlockLocation
[ https://issues.apache.org/jira/browse/HDFS-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoqiao He updated HDFS-14882: --- Attachment: HDFS-14882.003.patch > Consider DataNode load when #getBlockLocation > - > > Key: HDFS-14882 > URL: https://issues.apache.org/jira/browse/HDFS-14882 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Major > Attachments: HDFS-14882.001.patch, HDFS-14882.002.patch, > HDFS-14882.003.patch > > > Currently, we consider load of datanode when #chooseTarget for writer, > however not consider it for reader. Thus, the process slot of datanode could > be occupied by #BlockSender for reader, and disk/network will be busy > workload, then meet some slow node exception. IIRC same case is reported > times. Based on the fact, I propose to consider load for reader same as it > did #chooseTarget for writer. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2337) Fix checkstyle errors
[ https://issues.apache.org/jira/browse/HDDS-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Doroszlai updated HDDS-2337: --- Status: Patch Available (was: Open) > Fix checkstyle errors > - > > Key: HDDS-2337 > URL: https://issues.apache.org/jira/browse/HDDS-2337 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Checkstyle errors intoduced in HDDS-2281: > {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/pr/pr-hdds-2281-wfpgn/checkstyle/summary.txt} > hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java > 465: Line is longer than 80 characters (found 81). > hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ContainerTestHelper.java > 244: Line is longer than 80 characters (found 84). > hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerStateMachineFailures.java > 30: Unused import - > org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException. > 506: ; is preceded with whitespace. > 517: ; is preceded with whitespace. > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2336) Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions
[ https://issues.apache.org/jira/browse/HDDS-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Doroszlai updated HDDS-2336: --- Status: Patch Available (was: In Progress) > Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions > > > Key: HDDS-2336 > URL: https://issues.apache.org/jira/browse/HDDS-2336 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Affects Versions: 0.5.0 >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > TestKeyValueContainer#testRocksDBCreateUsesCachedOptions, introduced in > HDDS-2283, is failing: > {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/pr/pr-hdds-2283-cnrrq/unit/hadoop-hdds/container-service/org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer.txt} > testRocksDBCreateUsesCachedOptions(org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer) > Time elapsed: 0.135 s <<< FAILURE! > java.lang.AssertionError: expected:<1> but was:<11> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer.testRocksDBCreateUsesCachedOptions(TestKeyValueContainer.java:406) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2337) Fix checkstyle errors
[ https://issues.apache.org/jira/browse/HDDS-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2337: - Labels: pull-request-available (was: ) > Fix checkstyle errors > - > > Key: HDDS-2337 > URL: https://issues.apache.org/jira/browse/HDDS-2337 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > > Checkstyle errors intoduced in HDDS-2281: > {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/pr/pr-hdds-2281-wfpgn/checkstyle/summary.txt} > hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java > 465: Line is longer than 80 characters (found 81). > hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ContainerTestHelper.java > 244: Line is longer than 80 characters (found 84). > hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerStateMachineFailures.java > 30: Unused import - > org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException. > 506: ; is preceded with whitespace. > 517: ; is preceded with whitespace. > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2337) Fix checkstyle errors
[ https://issues.apache.org/jira/browse/HDDS-2337?focusedWorklogId=331052=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-331052 ] ASF GitHub Bot logged work on HDDS-2337: Author: ASF GitHub Bot Created on: 20/Oct/19 11:48 Start Date: 20/Oct/19 11:48 Worklog Time Spent: 10m Work Description: adoroszlai commented on pull request #64: HDDS-2337. Fix checkstyle errors URL: https://github.com/apache/hadoop-ozone/pull/64 ## What changes were proposed in this pull request? Fix current [checkstyle errors](https://github.com/elek/ozone-ci-q4/blob/master/pr/pr-hdds-2281-wfpgn/checkstyle/summary.txt). Also: * fix some log message placeholder vs. parameter count mismatch. * remove `NoSuchAlgorithmException` from javadoc where not declared to be thrown https://issues.apache.org/jira/browse/HDDS-2337 ## How was this patch tested? Ran checkstyle. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 331052) Remaining Estimate: 0h Time Spent: 10m > Fix checkstyle errors > - > > Key: HDDS-2337 > URL: https://issues.apache.org/jira/browse/HDDS-2337 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Checkstyle errors intoduced in HDDS-2281: > {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/pr/pr-hdds-2281-wfpgn/checkstyle/summary.txt} > hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java > 465: Line is longer than 80 characters (found 81). 
> hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ContainerTestHelper.java > 244: Line is longer than 80 characters (found 84). > hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerStateMachineFailures.java > 30: Unused import - > org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException. > 506: ; is preceded with whitespace. > 517: ; is preceded with whitespace. > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2337) Fix checkstyle errors
Attila Doroszlai created HDDS-2337: -- Summary: Fix checkstyle errors Key: HDDS-2337 URL: https://issues.apache.org/jira/browse/HDDS-2337 Project: Hadoop Distributed Data Store Issue Type: Bug Affects Versions: 0.5.0 Reporter: Attila Doroszlai Assignee: Attila Doroszlai Checkstyle errors introduced in HDDS-2281: {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/pr/pr-hdds-2281-wfpgn/checkstyle/summary.txt} hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java 465: Line is longer than 80 characters (found 81). hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ContainerTestHelper.java 244: Line is longer than 80 characters (found 84). hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerStateMachineFailures.java 30: Unused import - org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException. 506: ; is preceded with whitespace. 517: ; is preceded with whitespace. {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2336) Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions
[ https://issues.apache.org/jira/browse/HDDS-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2336: - Labels: pull-request-available (was: ) > Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions > > > Key: HDDS-2336 > URL: https://issues.apache.org/jira/browse/HDDS-2336 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Affects Versions: 0.5.0 >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > > TestKeyValueContainer#testRocksDBCreateUsesCachedOptions, introduced in > HDDS-2283, is failing: > {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/pr/pr-hdds-2283-cnrrq/unit/hadoop-hdds/container-service/org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer.txt} > testRocksDBCreateUsesCachedOptions(org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer) > Time elapsed: 0.135 s <<< FAILURE! > java.lang.AssertionError: expected:<1> but was:<11> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer.testRocksDBCreateUsesCachedOptions(TestKeyValueContainer.java:406) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2336) Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions
[ https://issues.apache.org/jira/browse/HDDS-2336?focusedWorklogId=331051=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-331051 ] ASF GitHub Bot logged work on HDDS-2336: Author: ASF GitHub Bot Created on: 20/Oct/19 11:37 Start Date: 20/Oct/19 11:37 Worklog Time Spent: 10m Work Description: adoroszlai commented on pull request #63: HDDS-2336. Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions URL: https://github.com/apache/hadoop-ozone/pull/63 ## What changes were proposed in this pull request? `testRocksDBCreateUsesCachedOptions` is [failing](https://github.com/elek/ozone-ci-q4/blob/master/pr/pr-hdds-2283-cnrrq/unit/hadoop-hdds/container-service/org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer.txt) because each test method in `TestKeyValueContainer` introduces a new entry in `MetadataStoreBuilder.CACHED_OPTS`, since `Configuration` does not implement `equals`. Thus `testRocksDBCreateUsesCachedOptions` passes by itself, but fails when the whole test class is run. https://issues.apache.org/jira/browse/HDDS-2336 ## How was this patch tested? Unit test. No other code changed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 331051) Remaining Estimate: 0h Time Spent: 10m > Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions > > > Key: HDDS-2336 > URL: https://issues.apache.org/jira/browse/HDDS-2336 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Affects Versions: 0.5.0 >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > TestKeyValueContainer#testRocksDBCreateUsesCachedOptions, introduced in > HDDS-2283, is failing: > {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/pr/pr-hdds-2283-cnrrq/unit/hadoop-hdds/container-service/org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer.txt} > testRocksDBCreateUsesCachedOptions(org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer) > Time elapsed: 0.135 s <<< FAILURE! > java.lang.AssertionError: expected:<1> but was:<11> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer.testRocksDBCreateUsesCachedOptions(TestKeyValueContainer.java:406) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
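The root cause described in the pull request above, a cache keyed by `Configuration`, which does not implement `equals` and therefore falls back to identity comparison, can be reproduced in miniature. `Conf` and `CACHED_OPTS` here are hypothetical stand-ins for Hadoop's `Configuration` and `MetadataStoreBuilder.CACHED_OPTS`, used only to show the mechanism:

```java
import java.util.HashMap;
import java.util.Map;

// A key type that does not override equals()/hashCode() uses identity
// semantics, so two equal-valued instances yield two distinct cache entries.
class Conf {
    final String value;   // same logical content in both instances below
    Conf(String value) { this.value = value; }
    // Deliberately no equals()/hashCode(): inherits identity comparison.
}

public class CacheKeySketch {
    static final Map<Conf, String> CACHED_OPTS = new HashMap<>();

    static int entriesAfterTwoLookups() {
        // Each "test method" builds its own Conf with identical contents,
        // mirroring how each test creates a fresh Configuration.
        CACHED_OPTS.computeIfAbsent(new Conf("default"), c -> "opts");
        CACHED_OPTS.computeIfAbsent(new Conf("default"), c -> "opts");
        // Two entries, not one: the second lookup misses the cache.
        return CACHED_OPTS.size();
    }
}
```

This is why the test passes in isolation (one cache entry) but fails when the whole class runs (one entry per test method).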
[jira] [Work started] (HDDS-2336) Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions
[ https://issues.apache.org/jira/browse/HDDS-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDDS-2336 started by Attila Doroszlai. -- > Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions > > > Key: HDDS-2336 > URL: https://issues.apache.org/jira/browse/HDDS-2336 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Affects Versions: 0.5.0 >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > > TestKeyValueContainer#testRocksDBCreateUsesCachedOptions, introduced in > HDDS-2283, is failing: > {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/pr/pr-hdds-2283-cnrrq/unit/hadoop-hdds/container-service/org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer.txt} > testRocksDBCreateUsesCachedOptions(org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer) > Time elapsed: 0.135 s <<< FAILURE! > java.lang.AssertionError: expected:<1> but was:<11> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer.testRocksDBCreateUsesCachedOptions(TestKeyValueContainer.java:406) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2336) Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions
Attila Doroszlai created HDDS-2336: -- Summary: Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions Key: HDDS-2336 URL: https://issues.apache.org/jira/browse/HDDS-2336 Project: Hadoop Distributed Data Store Issue Type: Bug Components: test Affects Versions: 0.5.0 Reporter: Attila Doroszlai Assignee: Attila Doroszlai TestKeyValueContainer#testRocksDBCreateUsesCachedOptions, introduced in HDDS-2283, is failing: {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/pr/pr-hdds-2283-cnrrq/unit/hadoop-hdds/container-service/org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer.txt} testRocksDBCreateUsesCachedOptions(org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer) Time elapsed: 0.135 s <<< FAILURE! java.lang.AssertionError: expected:<1> but was:<11> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer.testRocksDBCreateUsesCachedOptions(TestKeyValueContainer.java:406) {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-2280) HddsUtils#CheckForException should not return null in case the ratis exception cause is not set
[ https://issues.apache.org/jira/browse/HDDS-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh resolved HDDS-2280. - Fix Version/s: 0.5.0 Resolution: Fixed Thanks for the contribution [~shashikant] and [~bharat] for the review. I have committed this. > HddsUtils#CheckForException should not return null in case the ratis > exception cause is not set > --- > > Key: HDDS-2280 > URL: https://issues.apache.org/jira/browse/HDDS-2280 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 40m > Remaining Estimate: 0h > > HddsUtils#CheckForException checks for the cause to be set properly to one of > the defined/expected exceptions. In case, ratis throws up any runtime > exception, HddsUtils#CheckForException can return null and lead to > NullPointerException while write. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
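The defensive pattern the fix implies, never returning null when walking an exception's cause chain and instead falling back to the original throwable, can be sketched like this. The method shape and the set of "expected" exception types are illustrative assumptions, not the actual HddsUtils signature:

```java
import java.io.IOException;
import java.util.concurrent.TimeoutException;

// Sketch of cause-chain inspection that always returns a non-null result.
public class CheckForExceptionSketch {
    static Throwable checkForException(Throwable t) {
        // Walk the cause chain looking for a recognized exception type.
        for (Throwable cause = t; cause != null; cause = cause.getCause()) {
            if (cause instanceof IOException
                    || cause instanceof TimeoutException) {
                return cause;   // recognized cause found
            }
        }
        // Previously this path could yield null (e.g. a Ratis runtime
        // exception with no cause set) and cause an NPE at the call site;
        // returning the original throwable avoids that.
        return t;
    }
}
```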
[jira] [Work logged] (HDDS-2280) HddsUtils#CheckForException should not return null in case the ratis exception cause is not set
[ https://issues.apache.org/jira/browse/HDDS-2280?focusedWorklogId=331050=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-331050 ] ASF GitHub Bot logged work on HDDS-2280: Author: ASF GitHub Bot Created on: 20/Oct/19 11:20 Start Date: 20/Oct/19 11:20 Worklog Time Spent: 10m Work Description: mukul1987 commented on pull request #57: HDDS-2280. HddsUtils#CheckForException should not return null in case the ratis exception cause is not set URL: https://github.com/apache/hadoop-ozone/pull/57 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 331050) Time Spent: 40m (was: 0.5h) > HddsUtils#CheckForException should not return null in case the ratis > exception cause is not set > --- > > Key: HDDS-2280 > URL: https://issues.apache.org/jira/browse/HDDS-2280 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > HddsUtils#CheckForException checks for the cause to be set properly to one of > the defined/expected exceptions. In case, ratis throws up any runtime > exception, HddsUtils#CheckForException can return null and lead to > NullPointerException while write. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-2281) ContainerStateMachine#handleWriteChunk should ignore close container exception
[ https://issues.apache.org/jira/browse/HDDS-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh resolved HDDS-2281. - Resolution: Fixed Thanks for the contribution [~shashikant]. I have committed this. > ContainerStateMachine#handleWriteChunk should ignore close container > exception > --- > > Key: HDDS-2281 > URL: https://issues.apache.org/jira/browse/HDDS-2281 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently, ContainerStateMachine#applyTrannsaction ignores close container > exception.Similarly,ContainerStateMachine#handleWriteChunk call also should > ignore close container exception. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
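The "ignore close-container exceptions" behavior this issue extends to the write-chunk path can be sketched as a simple result filter. The enum values and method are illustrative only, not the actual ContainerStateMachine code:

```java
// Sketch: treat CONTAINER_CLOSED results as non-fatal in the write path,
// mirroring what applyTransaction already does per the issue description.
public class CloseContainerFilterSketch {
    enum Result { SUCCESS, CONTAINER_CLOSED, IO_ERROR }

    static boolean isFatal(Result r) {
        // A closed container is an expected race while a pipeline closes,
        // not a state-machine failure; only real errors fail the operation.
        return r != Result.SUCCESS && r != Result.CONTAINER_CLOSED;
    }
}
```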
[jira] [Work logged] (HDDS-2281) ContainerStateMachine#handleWriteChunk should ignore close container exception
[ https://issues.apache.org/jira/browse/HDDS-2281?focusedWorklogId=331049=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-331049 ] ASF GitHub Bot logged work on HDDS-2281: Author: ASF GitHub Bot Created on: 20/Oct/19 11:16 Start Date: 20/Oct/19 11:16 Worklog Time Spent: 10m Work Description: mukul1987 commented on pull request #54: HDDS-2281. ContainerStateMachine#handleWriteChunk should ignore close container exception URL: https://github.com/apache/hadoop-ozone/pull/54 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 331049) Time Spent: 1h 20m (was: 1h 10m) > ContainerStateMachine#handleWriteChunk should ignore close container > exception > --- > > Key: HDDS-2281 > URL: https://issues.apache.org/jira/browse/HDDS-2281 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently, ContainerStateMachine#applyTrannsaction ignores close container > exception.Similarly,ContainerStateMachine#handleWriteChunk call also should > ignore close container exception. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13736) BlockPlacementPolicyDefault can not choose favored nodes when 'dfs.namenode.block-placement-policy.default.prefer-local-node' set to false
[ https://issues.apache.org/jira/browse/HDFS-13736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955449#comment-16955449 ] Ayush Saxena commented on HDFS-13736: - Thanx [~xiaodong.hu] for the patch. I think the introduced test failed; you need to check it once. bq. If just add a parameter to chooseLocalStorage to denote it, I think lots of places should be modified. I tried that approach; I don't think there are that many places to tweak, and in the end it came to quite a few fewer lines than the present patch. I think adding the parameter would be the cleaner approach. If you are using an IDE, you can use the refactor option to add a new parameter to the method; it will automatically update all the call sites with the default value passed. Let me know if you face any trouble, happy to help. > BlockPlacementPolicyDefault can not choose favored nodes when > 'dfs.namenode.block-placement-policy.default.prefer-local-node' set to false > -- > > Key: HDFS-13736 > URL: https://issues.apache.org/jira/browse/HDFS-13736 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: hu xiaodong >Assignee: hu xiaodong >Priority: Major > Attachments: HDFS-13736.001.patch, HDFS-13736.002.patch > > > BlockPlacementPolicyDefault can not choose favored nodes when > 'dfs.namenode.block-placement-policy.default.prefer-local-node' set to false. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955446#comment-16955446 ] guojh commented on HDFS-14768: -- rebase > EC : Busy DN replica should be consider in live replica check. > -- > > Key: HDFS-14768 > URL: https://issues.apache.org/jira/browse/HDFS-14768 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, erasure-coding, hdfs, namenode >Affects Versions: 3.0.2 >Reporter: guojh >Assignee: guojh >Priority: Major > Labels: patch > Fix For: 3.3.0 > > Attachments: 1568275810244.jpg, 1568276338275.jpg, 1568771471942.jpg, > HDFS-14768.000.patch, HDFS-14768.001.patch, HDFS-14768.002.patch, > HDFS-14768.003.patch, HDFS-14768.004.patch, HDFS-14768.005.patch, > HDFS-14768.006.patch, HDFS-14768.007.patch, HDFS-14768.jpg, > guojh_UT_after_deomission.txt, guojh_UT_before_deomission.txt, > zhaoyiming_UT_after_deomission.txt, zhaoyiming_UT_beofre_deomission.txt > > > Policy is RS-6-3-1024K, version is hadoop 3.0.2; > We suppose a file's block Index is [0,1,2,3,4,5,6,7,8], And decommission > index[3,4], increase the index 6 datanode's > pendingReplicationWithoutTargets that make it large than > replicationStreamsHardLimit(we set 14). Then, After the method > chooseSourceDatanodes of BlockMananger, the liveBlockIndices is > [0,1,2,3,4,5,7,8], Block Counter is, Live:7, Decommission:2. > In method scheduleReconstruction of BlockManager, the additionalReplRequired > is 9 - 7 = 2. After Namenode choose two target Datanode, will assign a > erasureCode task to target datanode. > When datanode get the task will build targetIndices from liveBlockIndices > and target length. the code is blow. 
> {code:java}
> // code placeholder
> targetIndices = new short[targets.length];
>
> private void initTargetIndices() {
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0;
>   hasValidTargets = false;
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {
>     if (!bitset.get(i)) {
>       if (reconstructor.getBlockLen(i) > 0) {
>         if (m < targets.length) {
>           targetIndices[m++] = (short) i;
>           hasValidTargets = true;
>         }
>       }
>     }
>   }
> }
> {code}
> targetIndices[0] = 6, and targetIndices[1] is always 0 from its initial value.
> The StripedReader always creates readers from the first 6 index blocks, i.e.
> [0,1,2,3,4,5].
> Using the indices [0,1,2,3,4,5] to build target indices [6,0] will trigger the
> ISA-L bug: block index 6's data is corrupted (all data is zero).
> I wrote a unit test that can stably reproduce it:
> {code:java}
> // code placeholder
> private int replicationStreamsHardLimit =
>     DFSConfigKeys.DFS_NAMENODE_REPLICATION_STREAMS_HARD_LIMIT_DEFAULT;
> numDNs = dataBlocks + parityBlocks + 10;
>
> @Test(timeout = 24)
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   final INodeFile fileNode = cluster.getNamesystem().getFSDirectory()
>       .getINode4Write(ecFile.toString()).asFile();
>   LocatedBlocks locatedBlocks =
>       StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>       .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>       (LocatedStripedBlock) locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   DatanodeDescriptor datanodeDescriptor = cluster.getNameNode().getNamesystem()
>       .getBlockManager().getDatanodeManager()
>       .getDatanode(storageInfos[6].getDatanodeUuid());
>   BlockInfo firstBlock = fileNode.getBlocks()[0];
>   DatanodeStorageInfo[] dStorageInfos = bm.getStorages(firstBlock);
>   // the first heartbeat will consume 3 replica tasks
>   for (int i = 0; i <= replicationStreamsHardLimit + 3; i++) {
>     BlockManagerTestUtil.addBlockToBeReplicated(datanodeDescriptor,
>         new Block(i), new DatanodeStorageInfo[]{dStorageInfos[0]});
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List<DatanodeInfo> decommisionNodes = new ArrayList<>();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
>   assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
>
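The uninitialized-tail problem described above can be shown in a tiny, self-contained sketch (the names here are hypothetical, not the actual Hadoop StripedWriter classes): when more target slots are allocated than there are missing block indices, the unfilled slots of the `short[]` keep their default value 0 and become indistinguishable from a legitimate target at block index 0.

```java
import java.util.BitSet;

// Simplified, hypothetical model of the targetIndices bug discussed above:
// slots of a short[] default to 0, so an unfilled slot looks exactly like a
// valid target index 0 unless the caller tracks the fill count.
public class Main {
    static short[] initTargetIndices(BitSet live, int totalBlocks, int numTargets) {
        short[] targetIndices = new short[numTargets];
        int m = 0;
        for (int i = 0; i < totalBlocks; i++) {
            if (!live.get(i) && m < targetIndices.length) {
                targetIndices[m++] = (short) i;
            }
        }
        // If fewer than numTargets indices are missing, the tail stays 0:
        // callers must use m (the fill count), not targetIndices.length.
        return targetIndices;
    }

    public static void main(String[] args) {
        BitSet live = new BitSet();
        for (int i : new int[]{0, 1, 2, 3, 4, 5, 7, 8}) {
            live.set(i);
        }
        // 9 blocks total, only index 6 is missing, but 2 targets allocated.
        short[] t = initTargetIndices(live, 9, 2);
        System.out.println(t[0] + " " + t[1]); // prints "6 0"
    }
}
```

With the live set [0,1,2,3,4,5,7,8] from the report, the second slot wrongly reads as block index 0, which is exactly the `[6,0]` target pair that the comment says triggers the corruption.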
[jira] [Updated] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guojh updated HDFS-14768: - Attachment: HDFS-14768.007.patch > EC : Busy DN replica should be consider in live replica check. > -- > > Key: HDFS-14768 > URL: https://issues.apache.org/jira/browse/HDFS-14768 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, erasure-coding, hdfs, namenode >Affects Versions: 3.0.2 >Reporter: guojh >Assignee: guojh >Priority: Major > Labels: patch > Fix For: 3.3.0 > > Attachments: 1568275810244.jpg, 1568276338275.jpg, 1568771471942.jpg, > HDFS-14768.000.patch, HDFS-14768.001.patch, HDFS-14768.002.patch, > HDFS-14768.003.patch, HDFS-14768.004.patch, HDFS-14768.005.patch, > HDFS-14768.006.patch, HDFS-14768.007.patch, HDFS-14768.jpg, > guojh_UT_after_deomission.txt, guojh_UT_before_deomission.txt, > zhaoyiming_UT_after_deomission.txt, zhaoyiming_UT_beofre_deomission.txt > > > Policy is RS-6-3-1024K, version is hadoop 3.0.2; > We suppose a file's block Index is [0,1,2,3,4,5,6,7,8], And decommission > index[3,4], increase the index 6 datanode's > pendingReplicationWithoutTargets that make it large than > replicationStreamsHardLimit(we set 14). Then, After the method > chooseSourceDatanodes of BlockMananger, the liveBlockIndices is > [0,1,2,3,4,5,7,8], Block Counter is, Live:7, Decommission:2. > In method scheduleReconstruction of BlockManager, the additionalReplRequired > is 9 - 7 = 2. After Namenode choose two target Datanode, will assign a > erasureCode task to target datanode. > When datanode get the task will build targetIndices from liveBlockIndices > and target length. the code is blow. 
> {code:java}
> // code placeholder
> targetIndices = new short[targets.length];
>
> private void initTargetIndices() {
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0;
>   hasValidTargets = false;
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {
>     if (!bitset.get(i)) {
>       if (reconstructor.getBlockLen(i) > 0) {
>         if (m < targets.length) {
>           targetIndices[m++] = (short) i;
>           hasValidTargets = true;
>         }
>       }
>     }
>   }
> }
> {code}
> targetIndices[0] = 6, and targetIndices[1] is always 0 from its initial value.
> The StripedReader always creates readers from the first 6 index blocks, i.e.
> [0,1,2,3,4,5].
> Using the indices [0,1,2,3,4,5] to build target indices [6,0] will trigger the
> ISA-L bug: block index 6's data is corrupted (all data is zero).
> I wrote a unit test that can stably reproduce it:
> {code:java}
> // code placeholder
> private int replicationStreamsHardLimit =
>     DFSConfigKeys.DFS_NAMENODE_REPLICATION_STREAMS_HARD_LIMIT_DEFAULT;
> numDNs = dataBlocks + parityBlocks + 10;
>
> @Test(timeout = 24)
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   final INodeFile fileNode = cluster.getNamesystem().getFSDirectory()
>       .getINode4Write(ecFile.toString()).asFile();
>   LocatedBlocks locatedBlocks =
>       StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>       .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>       (LocatedStripedBlock) locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   DatanodeDescriptor datanodeDescriptor = cluster.getNameNode().getNamesystem()
>       .getBlockManager().getDatanodeManager()
>       .getDatanode(storageInfos[6].getDatanodeUuid());
>   BlockInfo firstBlock = fileNode.getBlocks()[0];
>   DatanodeStorageInfo[] dStorageInfos = bm.getStorages(firstBlock);
>   // the first heartbeat will consume 3 replica tasks
>   for (int i = 0; i <= replicationStreamsHardLimit + 3; i++) {
>     BlockManagerTestUtil.addBlockToBeReplicated(datanodeDescriptor,
>         new Block(i), new DatanodeStorageInfo[]{dStorageInfos[0]});
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List<DatanodeInfo> decommisionNodes = new ArrayList<>();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
>   assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
>
[jira] [Commented] (HDFS-14882) Consider DataNode load when #getBlockLocation
[ https://issues.apache.org/jira/browse/HDFS-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955444#comment-16955444 ] Hadoop QA commented on HDFS-14882: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 47s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 40s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 14s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 39s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 68 unchanged - 1 fixed = 70 total (was 69) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 23s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 14s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}106m 1s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 34s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}167m 59s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.TestSaveNamespace | | | hadoop.hdfs.tools.TestDFSZKFailoverController | | | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | HDFS-14882 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12983540/HDFS-14882.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 7a17ddad9fba 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 447f46d | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/28132/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/28132/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results |
[jira] [Commented] (HDFS-14860) Clean Up StoragePolicySatisfyManager.java
[ https://issues.apache.org/jira/browse/HDFS-14860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955439#comment-16955439 ]

Ayush Saxena commented on HDFS-14860:
-

Thanks [~belugabehr] for the patch. Overall looks good. One minor doubt with this change:
{code:java}
private void clearPathIds() {
  final Collection<Long> paths = new ArrayList<>();
  pathsToBeTraveresed.drainTo(paths);
  for (Long trackId : paths) {
    try {
      namesystem.removeXattr(trackId,
          HdfsServerConstants.XATTR_SATISFY_STORAGE_POLICY);
    } catch (IOException e) {
      LOG.debug("Failed to remove sps xatttr!", e);
    }
  }
}
{code}
Can't we do this instead of creating a new ArrayList:
{code:java}
private void clearPathIds() {
  while (!pathsToBeTraveresed.isEmpty()) {
    try {
      namesystem.removeXattr(pathsToBeTraveresed.remove(),
          HdfsServerConstants.XATTR_SATISFY_STORAGE_POLICY);
    } catch (IOException e) {
      LOG.debug("Failed to remove sps xatttr!", e);
    }
  }
}
{code}

> Clean Up StoragePolicySatisfyManager.java
> -
>
> Key: HDFS-14860
> URL: https://issues.apache.org/jira/browse/HDFS-14860
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 3.2.0
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Minor
> Attachments: HDFS-14860.1.patch, HDFS-14860.2.patch, HDFS-14860.3.patch
>
>
> * Remove superfluous debug log guards
> * Use {{java.util.concurrent}} package for internal structure instead of
> external synchronization.
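The two draining patterns in the review above can be illustrated standalone (hypothetical sketch: a plain `LinkedBlockingQueue<Long>` stands in for the SPS internals, and summing replaces `removeXattr`). Both variants empty the queue; the poll-based one simply skips the temporary list.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for the two clearPathIds() variants discussed above:
// draining into a temporary list vs. consuming the queue directly.
public class Main {
    public static void main(String[] args) {
        LinkedBlockingQueue<Long> q1 = new LinkedBlockingQueue<>();
        LinkedBlockingQueue<Long> q2 = new LinkedBlockingQueue<>();
        for (long id = 1; id <= 3; id++) {
            q1.add(id);
            q2.add(id);
        }

        // Variant 1: drain into a snapshot collection, then process it.
        Collection<Long> paths = new ArrayList<>();
        q1.drainTo(paths);
        long sum1 = 0;
        for (Long trackId : paths) {
            sum1 += trackId; // stand-in for removeXattr(trackId, ...)
        }

        // Variant 2: process straight off the queue, no temporary list.
        long sum2 = 0;
        Long trackId;
        while ((trackId = q2.poll()) != null) {
            sum2 += trackId;
        }

        // both process the same elements and leave their queue empty
        System.out.println(sum1 + " " + sum2 + " " + q1.size() + " " + q2.size());
    }
}
```

Note the sketch uses `poll()` rather than `isEmpty()` plus `remove()`: it folds the emptiness check and the removal into a single atomic step, which matters if another thread could drain the queue concurrently (`remove()` on an emptied queue would throw).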
[jira] [Commented] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955438#comment-16955438 ] Surendra Singh Lilhore commented on HDFS-14768: --- [~gjhkael], again re-base is required. HDFS-14847 committed to trunk. > EC : Busy DN replica should be consider in live replica check. > -- > > Key: HDFS-14768 > URL: https://issues.apache.org/jira/browse/HDFS-14768 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, erasure-coding, hdfs, namenode >Affects Versions: 3.0.2 >Reporter: guojh >Assignee: guojh >Priority: Major > Labels: patch > Fix For: 3.3.0 > > Attachments: 1568275810244.jpg, 1568276338275.jpg, 1568771471942.jpg, > HDFS-14768.000.patch, HDFS-14768.001.patch, HDFS-14768.002.patch, > HDFS-14768.003.patch, HDFS-14768.004.patch, HDFS-14768.005.patch, > HDFS-14768.006.patch, HDFS-14768.jpg, guojh_UT_after_deomission.txt, > guojh_UT_before_deomission.txt, zhaoyiming_UT_after_deomission.txt, > zhaoyiming_UT_beofre_deomission.txt > > > Policy is RS-6-3-1024K, version is hadoop 3.0.2; > We suppose a file's block Index is [0,1,2,3,4,5,6,7,8], And decommission > index[3,4], increase the index 6 datanode's > pendingReplicationWithoutTargets that make it large than > replicationStreamsHardLimit(we set 14). Then, After the method > chooseSourceDatanodes of BlockMananger, the liveBlockIndices is > [0,1,2,3,4,5,7,8], Block Counter is, Live:7, Decommission:2. > In method scheduleReconstruction of BlockManager, the additionalReplRequired > is 9 - 7 = 2. After Namenode choose two target Datanode, will assign a > erasureCode task to target datanode. > When datanode get the task will build targetIndices from liveBlockIndices > and target length. the code is blow. 
> {code:java}
> // code placeholder
> targetIndices = new short[targets.length];
>
> private void initTargetIndices() {
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0;
>   hasValidTargets = false;
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {
>     if (!bitset.get(i)) {
>       if (reconstructor.getBlockLen(i) > 0) {
>         if (m < targets.length) {
>           targetIndices[m++] = (short) i;
>           hasValidTargets = true;
>         }
>       }
>     }
>   }
> }
> {code}
> targetIndices[0] = 6, and targetIndices[1] is always 0 from its initial value.
> The StripedReader always creates readers from the first 6 index blocks, i.e.
> [0,1,2,3,4,5].
> Using the indices [0,1,2,3,4,5] to build target indices [6,0] will trigger the
> ISA-L bug: block index 6's data is corrupted (all data is zero).
> I wrote a unit test that can stably reproduce it:
> {code:java}
> // code placeholder
> private int replicationStreamsHardLimit =
>     DFSConfigKeys.DFS_NAMENODE_REPLICATION_STREAMS_HARD_LIMIT_DEFAULT;
> numDNs = dataBlocks + parityBlocks + 10;
>
> @Test(timeout = 24)
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   final INodeFile fileNode = cluster.getNamesystem().getFSDirectory()
>       .getINode4Write(ecFile.toString()).asFile();
>   LocatedBlocks locatedBlocks =
>       StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>       .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>       (LocatedStripedBlock) locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   DatanodeDescriptor datanodeDescriptor = cluster.getNameNode().getNamesystem()
>       .getBlockManager().getDatanodeManager()
>       .getDatanode(storageInfos[6].getDatanodeUuid());
>   BlockInfo firstBlock = fileNode.getBlocks()[0];
>   DatanodeStorageInfo[] dStorageInfos = bm.getStorages(firstBlock);
>   // the first heartbeat will consume 3 replica tasks
>   for (int i = 0; i <= replicationStreamsHardLimit + 3; i++) {
>     BlockManagerTestUtil.addBlockToBeReplicated(datanodeDescriptor,
>         new Block(i), new DatanodeStorageInfo[]{dStorageInfos[0]});
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List<DatanodeInfo> decommisionNodes = new ArrayList<>();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
>
[jira] [Commented] (HDFS-14882) Consider DataNode load when #getBlockLocation
[ https://issues.apache.org/jira/browse/HDFS-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955433#comment-16955433 ]

Ayush Saxena commented on HDFS-14882:
-

Thanks [~hexiaoqiao] for the patch. The overall idea looks good. One doubt in the following logic, if you can help me understand:
{code:java}
for (int start = 0, end = 0; start < activeLen && end < activeLen;) {
  if (distances[start] == distances[end]) {
    end = end + 1;
    if (end < activeLen) {
      continue;
    }
  }
  Arrays.sort(datanodes, start, end,
      Comparator.comparingInt(DatanodeInfo::getXceiverCount));
  start = end;
  end = end + 1;
}
{code}
* In the first iteration, start=0 and end=0, so {{if (distances[start] == distances[end])}} is always true. Why not start with end as 1?
* In the second iteration, start=0 and end=1; if distances[0] != distances[1], the condition is false, so why do we then execute {{Arrays.sort(datanodes, start, end, Comparator.comparingInt(DatanodeInfo::getXceiverCount));}}?
* In the third iteration, start=1 and end=2; if the distances again aren't equal, we again execute {{Arrays.sort(...)}}.

I need to recheck this logic. You also need to add the new config to {{hdfs-default.xml}}. Apart from that, almost LGTM.

> Consider DataNode load when #getBlockLocation
> -
>
> Key: HDFS-14882
> URL: https://issues.apache.org/jira/browse/HDFS-14882
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Xiaoqiao He
> Assignee: Xiaoqiao He
> Priority: Major
> Attachments: HDFS-14882.001.patch, HDFS-14882.002.patch
>
>
> Currently, we consider load of datanode when #chooseTarget for writer,
> however not consider it for reader. Thus, the process slot of datanode could
> be occupied by #BlockSender for reader, and disk/network will be busy
> workload, then meet some slow node exception. IIRC same case is reported times.
> Based on the fact, I propose to consider load for reader same as it
> did #chooseTarget for writer.
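The group-then-sort behavior under review can be sketched independently of HDFS (the `Node` record and its fields here are hypothetical, not the DatanodeInfo API): partition the array into runs of equal network distance, then sort each run by current load, so nearer nodes still come first but equally-near nodes prefer the least loaded.

```java
import java.util.Arrays;
import java.util.Comparator;

// Simplified model of the reviewed patch logic: within each run of nodes
// sharing the same network distance, order by xceiver count (load).
public class Main {
    record Node(String name, int distance, int xceivers) {}

    static void sortWithinDistance(Node[] nodes) {
        int n = nodes.length;
        int start = 0;
        while (start < n) {
            int end = start + 1;
            while (end < n && nodes[end].distance() == nodes[start].distance()) {
                end++;
            }
            // sort the half-open run [start, end) by load only
            Arrays.sort(nodes, start, end, Comparator.comparingInt(Node::xceivers));
            start = end;
        }
    }

    public static void main(String[] args) {
        Node[] nodes = {
            new Node("a", 0, 9), // local: stays first despite high load
            new Node("b", 2, 5),
            new Node("c", 2, 1), // same distance as b, less loaded: moves ahead of b
            new Node("d", 4, 0), // remote: stays last despite zero load
        };
        sortWithinDistance(nodes);
        StringBuilder sb = new StringBuilder();
        for (Node node : nodes) {
            sb.append(node.name());
        }
        System.out.println(sb); // prints "acbd"
    }
}
```

Scanning runs with an explicit inner loop avoids the start==end corner cases raised in the review: the sort is only ever applied to a completed run of equal distances.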
[jira] [Updated] (HDDS-2335) Params not included in AuditMessage
[ https://issues.apache.org/jira/browse/HDDS-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Doroszlai updated HDDS-2335: --- Status: Patch Available (was: In Progress) > Params not included in AuditMessage > --- > > Key: HDDS-2335 > URL: https://issues.apache.org/jira/browse/HDDS-2335 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > HDDS-2323 introduced the following Findbugs violation: > {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/trunk/trunk-nightly-20191020-r5wzl/findbugs/summary.txt} > M P UrF: Unread field: > org.apache.hadoop.ozone.audit.AuditMessage$Builder.params At > AuditMessage.java:[line 106] > {noformat} > Which reveals that {{params}} is now not logged in audit messages: > {noformat} > 2019-10-20 08:41:35,248 | INFO | OMAudit | user=hadoop | ip=192.168.128.2 | > op=CREATE_VOLUME | ret=SUCCESS | > 2019-10-20 08:41:35,312 | INFO | OMAudit | user=hadoop | ip=192.168.128.2 | > op=CREATE_BUCKET | ret=SUCCESS | > 2019-10-20 08:41:35,407 | INFO | OMAudit | user=hadoop | ip=192.168.128.2 | > op=ALLOCATE_KEY | ret=SUCCESS | > 2019-10-20 08:41:37,355 | INFO | OMAudit | user=hadoop | ip=192.168.128.2 | > op=COMMIT_KEY | ret=SUCCESS | > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2335) Params not included in AuditMessage
[ https://issues.apache.org/jira/browse/HDDS-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2335: - Labels: pull-request-available (was: ) > Params not included in AuditMessage > --- > > Key: HDDS-2335 > URL: https://issues.apache.org/jira/browse/HDDS-2335 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > > HDDS-2323 introduced the following Findbugs violation: > {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/trunk/trunk-nightly-20191020-r5wzl/findbugs/summary.txt} > M P UrF: Unread field: > org.apache.hadoop.ozone.audit.AuditMessage$Builder.params At > AuditMessage.java:[line 106] > {noformat} > Which reveals that {{params}} is now not logged in audit messages: > {noformat} > 2019-10-20 08:41:35,248 | INFO | OMAudit | user=hadoop | ip=192.168.128.2 | > op=CREATE_VOLUME | ret=SUCCESS | > 2019-10-20 08:41:35,312 | INFO | OMAudit | user=hadoop | ip=192.168.128.2 | > op=CREATE_BUCKET | ret=SUCCESS | > 2019-10-20 08:41:35,407 | INFO | OMAudit | user=hadoop | ip=192.168.128.2 | > op=ALLOCATE_KEY | ret=SUCCESS | > 2019-10-20 08:41:37,355 | INFO | OMAudit | user=hadoop | ip=192.168.128.2 | > op=COMMIT_KEY | ret=SUCCESS | > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2335) Params not included in AuditMessage
[ https://issues.apache.org/jira/browse/HDDS-2335?focusedWorklogId=331040=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-331040 ] ASF GitHub Bot logged work on HDDS-2335: Author: ASF GitHub Bot Created on: 20/Oct/19 09:05 Start Date: 20/Oct/19 09:05 Worklog Time Spent: 10m Work Description: adoroszlai commented on pull request #62: HDDS-2335. Params not included in AuditMessage URL: https://github.com/apache/hadoop-ozone/pull/62 ## What changes were proposed in this pull request? Include operation parameters in audit messages like before [HDDS-2323](https://github.com/apache/hadoop-ozone/commit/b9618834c9902fc8fd9ae12872092cfb1e5c1be3). https://issues.apache.org/jira/browse/HDDS-2335 ## How was this patch tested? Added unit test. Verified Findbugs violation is fixed. Tested using Freon: ``` 2019-10-20 08:54:19,437 | INFO | OMAudit | user=hadoop | ip=192.168.144.3 | op=CREATE_VOLUME {admin=hadoop, owner=hadoop, volume=vol-0-52867, creationTime=1571561659397, quotaInBytes=1152921504606846976, objectID=1, updateID=1} | ret=SUCCESS | 2019-10-20 08:54:19,497 | INFO | OMAudit | user=hadoop | ip=192.168.144.3 | op=CREATE_BUCKET {volume=vol-0-52867, bucket=bucket-0-43473, gdprEnabled=null, acls=[user:hadoop:a[ACCESS], group:users:a[ACCESS]], isVersionEnabled=false, storageType=DISK, creationTime=1571561659483} | ret=SUCCESS | ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 331040) Remaining Estimate: 0h Time Spent: 10m > Params not included in AuditMessage > --- > > Key: HDDS-2335 > URL: https://issues.apache.org/jira/browse/HDDS-2335 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > HDDS-2323 introduced the following Findbugs violation: > {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/trunk/trunk-nightly-20191020-r5wzl/findbugs/summary.txt} > M P UrF: Unread field: > org.apache.hadoop.ozone.audit.AuditMessage$Builder.params At > AuditMessage.java:[line 106] > {noformat} > Which reveals that {{params}} is now not logged in audit messages: > {noformat} > 2019-10-20 08:41:35,248 | INFO | OMAudit | user=hadoop | ip=192.168.128.2 | > op=CREATE_VOLUME | ret=SUCCESS | > 2019-10-20 08:41:35,312 | INFO | OMAudit | user=hadoop | ip=192.168.128.2 | > op=CREATE_BUCKET | ret=SUCCESS | > 2019-10-20 08:41:35,407 | INFO | OMAudit | user=hadoop | ip=192.168.128.2 | > op=ALLOCATE_KEY | ret=SUCCESS | > 2019-10-20 08:41:37,355 | INFO | OMAudit | user=hadoop | ip=192.168.128.2 | > op=COMMIT_KEY | ret=SUCCESS | > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14283) DFSInputStream to prefer cached replica
[ https://issues.apache.org/jira/browse/HDFS-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955428#comment-16955428 ]

Ayush Saxena commented on HDFS-14283:
-

Thanks [~leosun08] for the patch. I had a quick look at the idea; I couldn't check the whole code. Some concerns:
* I think the feature to prefer a cached replica should be optional, governed by a config on the client side, so the client can choose whether it wants it or not.
* Secondly, the changes have moved to the server side too, for the sorting stuff. I think this would have a performance impact for those who don't even want to prefer cached locations. The intent with which this Jira started was to keep the logic down at the client side, so I think we should refrain from changes on the server side.
* Also, make sure that those not interested in using cached replicas are not affected in any way: all the processing for this feature should be done only if the feature is turned on, and it should be turned off by default.

> DFSInputStream to prefer cached replica
> ---
>
> Key: HDFS-14283
> URL: https://issues.apache.org/jira/browse/HDFS-14283
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 2.6.0
> Environment: HDFS Caching
> Reporter: Wei-Chiu Chuang
> Assignee: Lisheng Sun
> Priority: Major
> Attachments: HDFS-14283.001.patch, HDFS-14283.002.patch,
> HDFS-14283.003.patch
>
>
> HDFS Caching offers performance benefits. However, currently NameNode does
> not treat cached replica with higher priority, so HDFS caching is only useful
> when cache replication = 3, that is to say, all replicas are cached in
> memory, so that a client doesn't randomly pick an uncached replica.
> HDFS-6846 proposed to let NameNode give higher priority to cached replica.
> Changing a logic in NameNode is always tricky so that didn't get much
> traction. Here I propose a different approach: let client (DFSInputStream)
> prefer cached replica.
> A {{LocatedBlock}} object already contains cached replica location so a > client has the needed information. I think we can change > {{DFSInputStream#getBestNodeDNAddrPair()}} for this purpose. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14283) DFSInputStream to prefer cached replica
[ https://issues.apache.org/jira/browse/HDFS-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955426#comment-16955426 ]

Xiaoqiao He commented on HDFS-14283:
-

Thanks [~leosun08] for your work.
{quote}But i have a problem that current block.getLocations() which gets a list of DataNodes in priority order does not consider choosed DN LOAD, bandwidth etc. I think it is necessary to add this logic later.{quote}
HDFS-14882 is working on this now; it would be great if you are interested in reviewing it.

For this ticket, I am concerned about which should be given priority: distance or cached. Or should we leave the option to the user? Consider the following case: 3 replicas (named ra, rb, rc) of one block, with the cache replica number set to 2, covering rb and rc. Another client's topology distance is nearer to the host where ra is located (one corner case is a client on the same host as ra) than to the hosts where rb/rc are located. Which host should that client request first? I believe either ra or rb/rc is reasonable. [^HDFS-14283.003.patch] seems to choose the cache-priority policy, right? I just suggest it may be better to leave the choice to the user.

> DFSInputStream to prefer cached replica
> ---
>
> Key: HDFS-14283
> URL: https://issues.apache.org/jira/browse/HDFS-14283
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 2.6.0
> Environment: HDFS Caching
> Reporter: Wei-Chiu Chuang
> Assignee: Lisheng Sun
> Priority: Major
> Attachments: HDFS-14283.001.patch, HDFS-14283.002.patch,
> HDFS-14283.003.patch
>
>
> HDFS Caching offers performance benefits. However, currently NameNode does
> not treat cached replica with higher priority, so HDFS caching is only useful
> when cache replication = 3, that is to say, all replicas are cached in
> memory, so that a client doesn't randomly pick an uncached replica.
> HDFS-6846 proposed to let NameNode give higher priority to cached replica.
> Changing a logic in NameNode is always tricky so that didn't get much > traction. Here I propose a different approach: let client (DFSInputStream) > prefer cached replica. > A {{LocatedBlock}} object already contains cached replica location so a > client has the needed information. I think we can change > {{DFSInputStream#getBestNodeDNAddrPair()}} for this purpose. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
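The client-side reordering HDFS-14283 proposes can be sketched in isolation. This is a minimal illustration, not the actual patch: `preferCached` stands in for the logic that would go into `DFSInputStream#getBestNodeDNAddrPair()`, taking the NameNode's priority-ordered replica list plus the cached locations already present in a `LocatedBlock`, and moving cached replicas to the front while preserving the NameNode's relative order within each group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch of the "prefer cached replica" idea from HDFS-14283.
// The method name and signature are illustrative, not the real HDFS API.
public class CachedFirst {
  public static List<String> preferCached(List<String> locations,
                                          Set<String> cached) {
    List<String> result = new ArrayList<>();
    // First pass: cached replicas, in the NameNode's original order.
    for (String dn : locations) {
      if (cached.contains(dn)) {
        result.add(dn);
      }
    }
    // Second pass: uncached replicas, order likewise preserved.
    for (String dn : locations) {
      if (!cached.contains(dn)) {
        result.add(dn);
      }
    }
    return result;
  }
}
```

With the example from the comment above (replicas ra, rb, rc; rb and rc cached), this policy would send the client to rb first even when ra is topologically closer, which is exactly the distance-vs-cache trade-off being debated.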
[jira] [Work started] (HDDS-2334) Dummy chunk manager fails with length mismatch error
[ https://issues.apache.org/jira/browse/HDDS-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDDS-2334 started by Attila Doroszlai. -- > Dummy chunk manager fails with length mismatch error > > > Key: HDDS-2334 > URL: https://issues.apache.org/jira/browse/HDDS-2334 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > > HDDS-1094 added a config option ({{hdds.container.chunk.persistdata=false}}) > to drop chunks instead of writing them to disk. Currently this option > triggers the following error with any key size: > {noformat} > org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: > data array does not match the length specified. DataLen: 16777216 Byte > Array: 16777478 > at > org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerDummyImpl.writeChunk(ChunkManagerDummyImpl.java:87) > at > org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleWriteChunk(KeyValueHandler.java:695) > at > org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:176) > at > org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:277) > at > org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:150) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:413) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:423) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$handleWriteChunk$1(ContainerStateMachine.java:458) > at > java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) > at > 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work started] (HDDS-2335) Params not included in AuditMessage
[ https://issues.apache.org/jira/browse/HDDS-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDDS-2335 started by Attila Doroszlai. -- > Params not included in AuditMessage > --- > > Key: HDDS-2335 > URL: https://issues.apache.org/jira/browse/HDDS-2335 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > > HDDS-2323 introduced the following Findbugs violation: > {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/trunk/trunk-nightly-20191020-r5wzl/findbugs/summary.txt} > M P UrF: Unread field: > org.apache.hadoop.ozone.audit.AuditMessage$Builder.params At > AuditMessage.java:[line 106] > {noformat} > Which reveals that {{params}} is now not logged in audit messages: > {noformat} > 2019-10-20 08:41:35,248 | INFO | OMAudit | user=hadoop | ip=192.168.128.2 | > op=CREATE_VOLUME | ret=SUCCESS | > 2019-10-20 08:41:35,312 | INFO | OMAudit | user=hadoop | ip=192.168.128.2 | > op=CREATE_BUCKET | ret=SUCCESS | > 2019-10-20 08:41:35,407 | INFO | OMAudit | user=hadoop | ip=192.168.128.2 | > op=ALLOCATE_KEY | ret=SUCCESS | > 2019-10-20 08:41:37,355 | INFO | OMAudit | user=hadoop | ip=192.168.128.2 | > op=COMMIT_KEY | ret=SUCCESS | > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2335) Params not included in AuditMessage
Attila Doroszlai created HDDS-2335: -- Summary: Params not included in AuditMessage Key: HDDS-2335 URL: https://issues.apache.org/jira/browse/HDDS-2335 Project: Hadoop Distributed Data Store Issue Type: Bug Affects Versions: 0.5.0 Reporter: Attila Doroszlai Assignee: Attila Doroszlai HDDS-2323 introduced the following Findbugs violation: {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/trunk/trunk-nightly-20191020-r5wzl/findbugs/summary.txt} M P UrF: Unread field: org.apache.hadoop.ozone.audit.AuditMessage$Builder.params At AuditMessage.java:[line 106] {noformat} Which reveals that {{params}} is now not logged in audit messages: {noformat} 2019-10-20 08:41:35,248 | INFO | OMAudit | user=hadoop | ip=192.168.128.2 | op=CREATE_VOLUME | ret=SUCCESS | 2019-10-20 08:41:35,312 | INFO | OMAudit | user=hadoop | ip=192.168.128.2 | op=CREATE_BUCKET | ret=SUCCESS | 2019-10-20 08:41:35,407 | INFO | OMAudit | user=hadoop | ip=192.168.128.2 | op=ALLOCATE_KEY | ret=SUCCESS | 2019-10-20 08:41:37,355 | INFO | OMAudit | user=hadoop | ip=192.168.128.2 | op=COMMIT_KEY | ret=SUCCESS | {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
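The Findbugs "unread field" warning above points at a common builder bug: a field is collected but never read when the message is assembled, so the params silently vanish from the audit line. The sketch below is an illustration of that bug class and its fix, not the real `AuditMessage` implementation; the method name and field layout are assumptions.

```java
// Illustrative sketch of the HDDS-2335 regression: an audit formatter that
// folds params back into the message. In the buggy version, the params
// argument was stored by the builder but never appended, producing lines
// like "op=CREATE_VOLUME | ret=SUCCESS" with no key=value parameters.
public class AuditLine {
  public static String format(String user, String ip, String op,
                              String ret, String params) {
    StringBuilder sb = new StringBuilder();
    sb.append("user=").append(user)
      .append(" | ip=").append(ip)
      .append(" | op=").append(op);
    if (params != null && !params.isEmpty()) {
      // The piece the regression dropped: without this append, the
      // field is "unread" and Findbugs flags UrF.
      sb.append(" | ").append(params);
    }
    sb.append(" | ret=").append(ret);
    return sb.toString();
  }
}
```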
[jira] [Created] (HDDS-2334) Dummy chunk manager fails with length mismatch error
Attila Doroszlai created HDDS-2334: -- Summary: Dummy chunk manager fails with length mismatch error Key: HDDS-2334 URL: https://issues.apache.org/jira/browse/HDDS-2334 Project: Hadoop Distributed Data Store Issue Type: Bug Components: test Reporter: Attila Doroszlai Assignee: Attila Doroszlai HDDS-1094 added a config option ({{hdds.container.chunk.persistdata=false}}) to drop chunks instead of writing them to disk. Currently this option triggers the following error with any key size: {noformat} org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: data array does not match the length specified. DataLen: 16777216 Byte Array: 16777478 at org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerDummyImpl.writeChunk(ChunkManagerDummyImpl.java:87) at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleWriteChunk(KeyValueHandler.java:695) at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:176) at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:277) at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:150) at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:413) at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:423) at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$handleWriteChunk$1(ContainerStateMachine.java:458) at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:834) {noformat} -- This message was sent by Atlassian Jira 
(v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14882) Consider DataNode load when #getBlockLocation
[ https://issues.apache.org/jira/browse/HDFS-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955423#comment-16955423 ] Xiaoqiao He commented on HDFS-14882: [^HDFS-14882.002.patch] fixes checkstyle; trying to trigger Jenkins again. Hi watchers, would anyone like to help with another review? > Consider DataNode load when #getBlockLocation > - > > Key: HDFS-14882 > URL: https://issues.apache.org/jira/browse/HDFS-14882 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Major > Attachments: HDFS-14882.001.patch, HDFS-14882.002.patch > > > Currently, we consider the load of a DataNode in #chooseTarget for writers, > but do not consider it for readers. Thus, a DataNode's transfer slots can be > occupied by #BlockSender for readers, its disk/network become heavily loaded, > and clients then hit slow-node exceptions. IIRC the same case has been > reported several times. Based on this, I propose to consider load for readers > the same way #chooseTarget does for writers. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
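The load-aware ordering HDFS-14882 proposes for readers can be sketched as follows. This is a hypothetical illustration, not the actual patch: `sortByLoad` stands in for the NameNode-side sort of replica locations in #getBlockLocations, and the xceiver-count map stands in for whatever load metric the real implementation uses.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Sketch of load-aware replica ordering for readers (HDFS-14882 idea):
// prefer DataNodes with fewer active transfers, mirroring what
// #chooseTarget already does for writers. Names are illustrative.
public class LoadAwareSort {
  public static List<String> sortByLoad(List<String> datanodes,
                                        Map<String, Integer> xceiverCount) {
    List<String> sorted = new ArrayList<>(datanodes);
    // Nodes with fewer active xceivers first; nodes with unknown load last.
    sorted.sort(Comparator.comparingInt(
        (String dn) -> xceiverCount.getOrDefault(dn, Integer.MAX_VALUE)));
    return sorted;
  }
}
```

A real implementation would likely combine this with the existing topology-distance sort (e.g. as a secondary key) rather than replace it, which is the same distance-vs-other-signal trade-off raised on HDFS-14283.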
[jira] [Updated] (HDFS-14882) Consider DataNode load when #getBlockLocation
[ https://issues.apache.org/jira/browse/HDFS-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoqiao He updated HDFS-14882: --- Attachment: HDFS-14882.002.patch > Consider DataNode load when #getBlockLocation > - > > Key: HDFS-14882 > URL: https://issues.apache.org/jira/browse/HDFS-14882 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Major > Attachments: HDFS-14882.001.patch, HDFS-14882.002.patch > > > Currently, we consider load of datanode when #chooseTarget for writer, > however not consider it for reader. Thus, the process slot of datanode could > be occupied by #BlockSender for reader, and disk/network will be busy > workload, then meet some slow node exception. IIRC same case is reported > times. Based on the fact, I propose to consider load for reader same as it > did #chooseTarget for writer. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-2283) Container creation on datanodes take time because of Rocksdb option creation.
[ https://issues.apache.org/jira/browse/HDDS-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh resolved HDDS-2283. - Resolution: Fixed > Container creation on datanodes take time because of Rocksdb option creation. > - > > Key: HDDS-2283 > URL: https://issues.apache.org/jira/browse/HDDS-2283 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Mukul Kumar Singh >Assignee: Siddharth Wagle >Priority: Major > Labels: pull-request-available > Attachments: HDDS-2283.00.patch > > Time Spent: 20m > Remaining Estimate: 0h > > Container creation on datanodes takes around 300ms due to RocksDB creation. > RocksDB creation takes considerable time and this needs to be optimized. > Creating one RocksDB per disk should be enough, and each container can be a > table inside that RocksDB. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-2311) Fix logic of RetryPolicy in OzoneClientSideTranslatorPB
[ https://issues.apache.org/jira/browse/HDDS-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh reassigned HDDS-2311: --- Assignee: Hanisha Koneru > Fix logic of RetryPolicy in OzoneClientSideTranslatorPB > --- > > Key: HDDS-2311 > URL: https://issues.apache.org/jira/browse/HDDS-2311 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Bharat Viswanadham >Assignee: Hanisha Koneru >Priority: Blocker > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > OzoneManagerProtocolClientSideTranslatorPB.java > L251: if (cause instanceof NotLeaderException) { > NotLeaderException notLeaderException = (NotLeaderException) cause; > omFailoverProxyProvider.performFailoverIfRequired( > notLeaderException.getSuggestedLeaderNodeId()); > return getRetryAction(RetryAction.RETRY, retries, failovers); > } > > The suggested leader returned from the server is not used during failover, > because the cause is actually a RemoteException, so the instanceof check > never matches. With the current code, the client does not use the suggested > leader for failover at all and, by default, tries the max retries against > each OM. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
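The failure mode described in HDDS-2311 is that the server-side exception arrives wrapped, so a plain `instanceof` check on the cause never fires. A fix along the lines the ticket suggests is to unwrap first. The sketch below uses simplified stand-in classes for the real Hadoop/Ratis types (`RemoteException`, `NotLeaderException`); the actual fix would operate on those real types.

```java
import java.io.IOException;

// Sketch of unwrapping a wrapped NotLeaderException so the suggested-leader
// hint can drive failover. The nested classes are simplified stand-ins,
// not the real Hadoop IPC / Ratis exception types.
public class Unwrap {
  public static class RemoteException extends IOException {
    private final Exception wrapped;
    public RemoteException(Exception wrapped) { this.wrapped = wrapped; }
    public Exception unwrapRemoteException() { return wrapped; }
  }

  public static class NotLeaderException extends IOException {
    private final String suggestedLeader;
    public NotLeaderException(String leader) { this.suggestedLeader = leader; }
    public String getSuggestedLeaderNodeId() { return suggestedLeader; }
  }

  // Returns the suggested leader node id if the failure is a (possibly
  // wrapped) NotLeaderException, otherwise null.
  public static String suggestedLeader(Exception cause) {
    if (cause instanceof RemoteException) {
      // The step the buggy code skipped: unwrap before type-checking.
      Exception unwrapped = ((RemoteException) cause).unwrapRemoteException();
      if (unwrapped instanceof NotLeaderException) {
        cause = unwrapped;
      }
    }
    if (cause instanceof NotLeaderException) {
      return ((NotLeaderException) cause).getSuggestedLeaderNodeId();
    }
    return null;
  }
}
```

Without the unwrap step, `suggestedLeader` would return null for every server-raised NotLeaderException, which is exactly why the client falls back to exhausting max retries on each OM.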
[jira] [Commented] (HDDS-2283) Container creation on datanodes take time because of Rocksdb option creation.
[ https://issues.apache.org/jira/browse/HDDS-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955410#comment-16955410 ] Mukul Kumar Singh commented on HDDS-2283: - Thanks for the contribution [~swagle] and [~avijayan] and [~aengineer] for the review. I have committed this. > Container creation on datanodes take time because of Rocksdb option creation. > - > > Key: HDDS-2283 > URL: https://issues.apache.org/jira/browse/HDDS-2283 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Mukul Kumar Singh >Assignee: Siddharth Wagle >Priority: Major > Labels: pull-request-available > Attachments: HDDS-2283.00.patch > > Time Spent: 20m > Remaining Estimate: 0h > > Container Creation on datanodes take around 300ms due to rocksdb creation. > Rocksdb creation is taking a considerable time and this needs to be optimized. > Creating a rocksdb per disk should be enough and each container can be table > inside the rocksdb. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2283) Container creation on datanodes take time because of Rocksdb option creation.
[ https://issues.apache.org/jira/browse/HDDS-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh updated HDDS-2283: Summary: Container creation on datanodes take time because of Rocksdb option creation. (was: Container Creation on datanodes take around 300ms due to rocksdb creation) > Container creation on datanodes take time because of Rocksdb option creation. > - > > Key: HDDS-2283 > URL: https://issues.apache.org/jira/browse/HDDS-2283 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Mukul Kumar Singh >Assignee: Siddharth Wagle >Priority: Major > Labels: pull-request-available > Attachments: HDDS-2283.00.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Container Creation on datanodes take around 300ms due to rocksdb creation. > Rocksdb creation is taking a considerable time and this needs to be optimized. > Creating a rocksdb per disk should be enough and each container can be table > inside the rocksdb. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2283) Container creation on datanodes take time because of Rocksdb option creation.
[ https://issues.apache.org/jira/browse/HDDS-2283?focusedWorklogId=331035=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-331035 ] ASF GitHub Bot logged work on HDDS-2283: Author: ASF GitHub Bot Created on: 20/Oct/19 06:42 Start Date: 20/Oct/19 06:42 Worklog Time Spent: 10m Work Description: mukul1987 commented on pull request #41: HDDS-2283. Container Creation on datanodes take around 300ms due to rocksdb creation. URL: https://github.com/apache/hadoop-ozone/pull/41 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 331035) Time Spent: 20m (was: 10m) > Container creation on datanodes take time because of Rocksdb option creation. > - > > Key: HDDS-2283 > URL: https://issues.apache.org/jira/browse/HDDS-2283 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Mukul Kumar Singh >Assignee: Siddharth Wagle >Priority: Major > Labels: pull-request-available > Attachments: HDDS-2283.00.patch > > Time Spent: 20m > Remaining Estimate: 0h > > Container Creation on datanodes take around 300ms due to rocksdb creation. > Rocksdb creation is taking a considerable time and this needs to be optimized. > Creating a rocksdb per disk should be enough and each container can be table > inside the rocksdb. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
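The optimization direction for HDDS-2283 — stop rebuilding the expensive RocksDB options object on every container creation — can be sketched with a lazily initialized shared instance. This is an illustration under assumed names: `buildExpensiveOptions` stands in for the real RocksDB options construction, and the committed patch may structure the caching differently.

```java
// Sketch of sharing one (expensive) RocksDB options object across all
// container DB creations instead of rebuilding it per container, which
// HDDS-2283 measured at roughly 300 ms each time. Names are illustrative.
public class SharedOptions {
  private static volatile Object cachedOptions;
  static int buildCount = 0; // exposed only so the sketch can be verified

  // Stand-in for the slow RocksDB options construction.
  private static Object buildExpensiveOptions() {
    buildCount++;
    return new Object();
  }

  // Double-checked lazy init: every caller gets the same options instance,
  // so the expensive build happens at most once per process.
  public static Object getOptions() {
    Object opts = cachedOptions;
    if (opts == null) {
      synchronized (SharedOptions.class) {
        if (cachedOptions == null) {
          cachedOptions = buildExpensiveOptions();
        }
        opts = cachedOptions;
      }
    }
    return opts;
  }
}
```

The ticket's longer-term idea — one RocksDB per disk with a table (column family) per container — goes further than this, but sharing the options object is the piece that directly removes the per-container construction cost.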
[jira] [Resolved] (HDDS-2286) Add a log info in ozone client and scm to print the exclusion list during allocate block
[ https://issues.apache.org/jira/browse/HDDS-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh resolved HDDS-2286. - Resolution: Fixed Thanks for the contribution [~swagle] and [~adoroszlai] for the review. I have committed this. > Add a log info in ozone client and scm to print the exclusion list during > allocate block > > > Key: HDDS-2286 > URL: https://issues.apache.org/jira/browse/HDDS-2286 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Siddharth Wagle >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2286) Add a log info in ozone client and scm to print the exclusion list during allocate block
[ https://issues.apache.org/jira/browse/HDDS-2286?focusedWorklogId=331032=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-331032 ] ASF GitHub Bot logged work on HDDS-2286: Author: ASF GitHub Bot Created on: 20/Oct/19 06:26 Start Date: 20/Oct/19 06:26 Worklog Time Spent: 10m Work Description: mukul1987 commented on pull request #46: HDDS-2286. Add a log info in ozone client and scm to print the exclus… URL: https://github.com/apache/hadoop-ozone/pull/46 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 331032) Time Spent: 20m (was: 10m) > Add a log info in ozone client and scm to print the exclusion list during > allocate block > > > Key: HDDS-2286 > URL: https://issues.apache.org/jira/browse/HDDS-2286 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Siddharth Wagle >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org