[jira] [Updated] (HDDS-4400) Make raft log directory deletion configurable during pipeline remove
[ https://issues.apache.org/jira/browse/HDDS-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-4400: -- Description: The idea here is to add a config to make raft log directory removal configurable during pipeline remove. (was: The idea here is to add a hidden config to make raft log directory removal configurable during pipeline remove.) > Make raft log directory deletion configurable during pipeline remove > > > Key: HDDS-4400 > URL: https://issues.apache.org/jira/browse/HDDS-4400 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 1.1.0 > > > The idea here is to add a config to make raft log directory removal > configurable during pipeline remove. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
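The proposed config could gate deletion roughly as sketched below. This is an illustrative sketch only: the class name, flag, and recursive-delete helper are assumptions for illustration, not the actual Ozone datanode code.

```java
import java.io.File;

// Hypothetical sketch: gate raft log directory removal behind a config flag.
public class PipelineRemoveHandler {
  private final boolean deleteRaftLogDirOnRemove;

  public PipelineRemoveHandler(boolean deleteRaftLogDirOnRemove) {
    this.deleteRaftLogDirOnRemove = deleteRaftLogDirOnRemove;
  }

  /** Called when a pipeline is removed from this datanode. */
  public void onPipelineRemove(File raftLogDir) {
    if (deleteRaftLogDirOnRemove) {
      deleteRecursively(raftLogDir);
    }
    // Otherwise the raft log directory is retained, e.g. for debugging
    // or manual recovery.
  }

  private static void deleteRecursively(File dir) {
    File[] children = dir.listFiles();
    if (children != null) {
      for (File child : children) {
        deleteRecursively(child);
      }
    }
    dir.delete();
  }
}
```

With the flag off, pipeline removal leaves the raft log directory in place, which is the behavior the issue asks to make selectable.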
[jira] [Created] (HDDS-4400) Make raft log directory deletion configurable during pipeline remove
Shashikant Banerjee created HDDS-4400: - Summary: Make raft log directory deletion configurable during pipeline remove Key: HDDS-4400 URL: https://issues.apache.org/jira/browse/HDDS-4400 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Datanode Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee Fix For: 1.1.0 The idea here is to add a hidden config to make raft log directory removal configurable during pipeline remove. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-4399) Safe mode rule for pipelines should only consider open pipelines
Shashikant Banerjee created HDDS-4399: - Summary: Safe mode rule for pipelines should only consider open pipelines Key: HDDS-4399 URL: https://issues.apache.org/jira/browse/HDDS-4399 Project: Hadoop Distributed Data Store Issue Type: Bug Components: SCM Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee Fix For: 1.1.0 Currently, all pipelines existing in the DB are considered for the safe mode exit criteria. It may happen that SCM has the pipelines created, but none of the participant datanodes ever created these pipelines. In such cases, SCM fails to come out of safe mode as these pipelines are never reported back to SCM. The idea here is to consider only pipelines which are marked open during SCM startup.
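The proposed rule boils down to filtering by pipeline state before counting toward the safe mode threshold. A minimal sketch, with hypothetical enum and method names (not the actual SCM classes):

```java
import java.util.List;

// Illustrative sketch: count only OPEN pipelines toward the safe mode
// exit criteria, ignoring pipelines that exist in the DB but were never
// created on any datanode.
public class SafeModePipelineRule {
  public enum PipelineState { ALLOCATED, OPEN, CLOSED }

  /** Pipelines that should count toward the safe mode exit threshold. */
  public static long countForSafeMode(List<PipelineState> states) {
    return states.stream().filter(s -> s == PipelineState.OPEN).count();
  }
}
```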
[jira] [Resolved] (HDDS-3700) Number of open containers per pipeline should be tuned as per the number of disks on datanode
[ https://issues.apache.org/jira/browse/HDDS-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-3700. --- Fix Version/s: 1.1.0 Resolution: Fixed > Number of open containers per pipeline should be tuned as per the number of > disks on datanode > - > > Key: HDDS-3700 > URL: https://issues.apache.org/jira/browse/HDDS-3700 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 1.1.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: Performance > Fix For: 1.1.0 > > Attachments: Load Distribution Across disks in Ozone.pdf, Screenshot > 2020-06-02 at 12.44.14 PM.png > > > Currently, "ozone.scm.pipeline.owner.container.count" is configured by > default to 3. The default should ideally be a function of the number of disks > on a datanode. A static value may lead to uneven utilisation during active IO.
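A disk-aware default could be derived as sketched below. The formula is an assumption for illustration, not the actual HDDS-3700 change; only the old default of 3 and the config key are from the issue.

```java
// Illustrative heuristic: derive the per-pipeline open container count
// (ozone.scm.pipeline.owner.container.count, previously a static 3) from
// the number of data disks, so all disks see write traffic.
public class OpenContainerCount {
  public static int perPipeline(int dataDisks, int pipelinesOnDatanode) {
    int pipelines = Math.max(1, pipelinesOnDatanode);
    // At least one open container per disk, spread across pipelines
    // (ceiling division), never dropping below the old default of 3.
    int perPipeline = (dataDisks + pipelines - 1) / pipelines;
    return Math.max(3, perPipeline);
  }
}
```

For example, a datanode with 12 data disks and 2 pipelines would get 6 open containers per pipeline instead of 3.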
[jira] [Updated] (HDDS-4388) Make writeStateMachineTimeout retry count proportional to node failure timeout
[ https://issues.apache.org/jira/browse/HDDS-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-4388: -- Description: Currently, in Ratis the writeStateMachine call gets retried indefinitely in the event of a timeout. In cases where disks are slow/overloaded or chunk writer threads are unavailable for a period of 10s, the writeStateMachine call times out in 10s. In such cases, the same write chunk keeps getting retried, causing the same chunk of data to be overwritten. The idea here is to abort the request once the node failure timeout is reached. (was: Currently, in ratis "writeStateMachinecall" gets retried indefinitely in event of a timeout. In case, where disks are slow/overloaded or number of chunk writer threads are not available for a period of 10s, writeStateMachine call times out in 10s. In cases like these, the same write chunk keeps on getting retried causing the same chink of data to be overwritten. The idea here is to abort the request once the node failure timeout reaches.) > Make writeStateMachineTimeout retry count proportional to node failure timeout > -- > > Key: HDDS-4388 > URL: https://issues.apache.org/jira/browse/HDDS-4388 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Fix For: 1.1.0 > > > Currently, in Ratis the writeStateMachine call gets retried indefinitely in > the event of a timeout. In cases where disks are slow/overloaded or chunk > writer threads are unavailable for a period of 10s, the writeStateMachine > call times out in 10s. In such cases, the same write chunk keeps getting > retried, causing the same chunk of data to be overwritten. The idea here is > to abort the request once the node failure timeout is reached.
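Making the retry count proportional to the node failure timeout could look like the following sketch. This is an assumption for illustration, not the actual Ratis retry policy API; only the 10s write timeout figure comes from the issue.

```java
// Sketch of the proposed cap: allow only as many writeStateMachine retries
// as fit inside the node failure timeout, so a stuck chunk write is aborted
// instead of being retried forever.
public class WriteTimeoutRetryPolicy {
  public static int maxRetries(long nodeFailureTimeoutMs, long writeTimeoutMs) {
    if (writeTimeoutMs <= 0 || nodeFailureTimeoutMs <= 0) {
      throw new IllegalArgumentException("timeouts must be positive");
    }
    // e.g. a 300s node failure timeout with 10s write timeouts -> 30 attempts
    return (int) Math.max(1, nodeFailureTimeoutMs / writeTimeoutMs);
  }
}
```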
[jira] [Created] (HDDS-4388) Make writeStateMachineTimeout retry count proportional to node failure timeout
Shashikant Banerjee created HDDS-4388: - Summary: Make writeStateMachineTimeout retry count proportional to node failure timeout Key: HDDS-4388 URL: https://issues.apache.org/jira/browse/HDDS-4388 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Datanode Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee Fix For: 1.1.0 Currently, in ratis "writeStateMachinecall" gets retried indefinitely in event of a timeout. In case, where disks are slow/overloaded or number of chunk writer threads are not available for a period of 10s, writeStateMachine call times out in 10s. In cases like these, the same write chunk keeps on getting retried causing the same chink of data to be overwritten. The idea here is to abort the request once the node failure timeout reaches.
[jira] [Commented] (HDDS-3103) Have multi-raft pipeline calculator to recommend best pipeline number per datanode
[ https://issues.apache.org/jira/browse/HDDS-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213047#comment-17213047 ] Shashikant Banerjee commented on HDDS-3103: --- This should have been addressed with HDDS-3700. > Have multi-raft pipeline calculator to recommend best pipeline number per > datanode > -- > > Key: HDDS-3103 > URL: https://issues.apache.org/jira/browse/HDDS-3103 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: SCM >Affects Versions: 0.5.0 >Reporter: Li Cheng >Priority: Critical > > PipelinePlacementPolicy should have a calculator method to recommend a better > pipeline number per node. The number used to come from > ozone.datanode.pipeline.limit in config. SCM should be able to consider how > many ratis dirs there are and the ratis retry timeout to recommend the best > pipeline number for every node.
[jira] [Created] (HDDS-4335) No user access checks in Ozone FS
Shashikant Banerjee created HDDS-4335: - Summary: No user access checks in Ozone FS Key: HDDS-4335 URL: https://issues.apache.org/jira/browse/HDDS-4335 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Shashikant Banerjee Currently, a dir/file created with the hdfs user can be deleted by any user. {code:java} [sbanerjee@vd1308 MapReduce-Performance_Testing-master]$ sudo -u hdfs ozone fs -mkdir o3fs://bucket1.vol1.ozone1/data/sandbox/poc/teragen [sbanerjee@vd1308 MapReduce-Performance_Testing-master]$ sudo -u hdfs ozone fs -ls o3fs://bucket1.vol1.ozone1/data/sandbox/poc/teragen [sbanerjee@vd1308 MapReduce-Performance_Testing-master]$ sudo -u hdfs ozone fs -ls o3fs://bucket1.vol1.ozone1/data/sandbox/poc/ Found 1 items drwxrwxrwx - hdfs hdfs 0 2020-10-12 02:51 o3fs://bucket1.vol1.ozone1/data/sandbox/poc/teragen [sbanerjee@vd1308 MapReduce-Performance_Testing-master]$ [sbanerjee@vd1308 MapReduce-Performance_Testing-master]$ [sbanerjee@vd1308 MapReduce-Performance_Testing-master]$ [sbanerjee@vd1308 MapReduce-Performance_Testing-master]$ ozone fs -rm -r o3fs://bucket1.vol1.ozone1/data/sandbox/poc/ 20/10/12 02:52:16 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum 20/10/12 02:52:16 INFO ozone.BasicOzoneFileSystem: Move to trash is disabled for o3fs, deleting instead: o3fs://bucket1.vol1.ozone1/data/sandbox/poc. Files or directories will NOT be retained in trash. Ignore the following TrashPolicyDefault message, if any. 20/10/12 02:52:16 INFO fs.TrashPolicyDefault: Moved: 'o3fs://bucket1.vol1.ozone1/data/sandbox/poc' to trash at: /.Trash/sbanerjee/Current/data/sandbox/poc1602496336480 [sbanerjee@vd1308 MapReduce-Performance_Testing-master]$ sudo -u hdfs ozone fs -ls o3fs://bucket1.vol1.ozone1/data/sandbox/poc/ ls: `o3fs://bucket1.vol1.ozone1/data/sandbox/poc/': No such file or directory {code} Whereas, the same sequence fails with a permission denied error in HDFS.
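The missing check amounts to verifying the caller's identity against the entry's owner before honoring a delete. A deliberately simplified sketch (hypothetical helper, not the actual OzoneFileSystem code; it ignores mode bits and group membership):

```java
// Illustrative owner-or-superuser check: in the transcript above, a delete
// by "sbanerjee" of an entry owned by "hdfs" would be rejected.
public class OzonePermissionCheck {
  public static boolean canDelete(String caller, String owner, String superUser) {
    return caller.equals(superUser) || caller.equals(owner);
  }
}
```

A full fix would evaluate the POSIX-style mode bits and ACLs the way HDFS does, rather than just the owner.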
[jira] [Assigned] (HDDS-4318) Disable single node pipeline creation by default in Ozone
[ https://issues.apache.org/jira/browse/HDDS-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-4318: - Assignee: Aryan Gupta > Disable single node pipeline creation by default in Ozone > - > > Key: HDDS-4318 > URL: https://issues.apache.org/jira/browse/HDDS-4318 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Shashikant Banerjee >Assignee: Aryan Gupta >Priority: Major > > Currently, single node pipeline creation is ON by default in Ozone, though > it's not used by default in the Ozone write path. It would be good to disable > this by turning off the config "ozone.scm.pipeline.creation.auto.factor.one" > by default. It may lead to some unit test failures, and for those tests this > config needs to be explicitly set to true.
[jira] [Created] (HDDS-4318) Disable single node pipeline creation by default in Ozone
Shashikant Banerjee created HDDS-4318: - Summary: Disable single node pipeline creation by default in Ozone Key: HDDS-4318 URL: https://issues.apache.org/jira/browse/HDDS-4318 Project: Hadoop Distributed Data Store Issue Type: Bug Affects Versions: 1.1.0 Reporter: Shashikant Banerjee Currently, single node pipeline creation is ON by default in Ozone, though it's not used by default in the Ozone write path. It would be good to disable this by turning off the config "ozone.scm.pipeline.creation.auto.factor.one" by default. It may lead to some unit test failures, and for those tests this config needs to be explicitly set to true.
[jira] [Assigned] (HDDS-3811) Add tests to verify all the disks of a datanode are utilized for write
[ https://issues.apache.org/jira/browse/HDDS-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-3811: - Assignee: (was: Shashikant Banerjee) > Add tests to verify all the disks of a datanode are utilized for write > --- > > Key: HDDS-3811 > URL: https://issues.apache.org/jira/browse/HDDS-3811 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Shashikant Banerjee >Priority: Major > Fix For: 1.1.0 > >
[jira] [Resolved] (HDDS-3811) Add tests to verify all the disks of a datanode are utilized for write
[ https://issues.apache.org/jira/browse/HDDS-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-3811. --- Fix Version/s: 1.1.0 Resolution: Fixed > Add tests to verify all the disks of a datanode are utilized for write > --- > > Key: HDDS-3811 > URL: https://issues.apache.org/jira/browse/HDDS-3811 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 1.1.0 > >
[jira] [Resolved] (HDDS-4072) Pipeline creation on a datanode should account for raft log disks reported
[ https://issues.apache.org/jira/browse/HDDS-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-4072. --- Fix Version/s: 1.1.0 Resolution: Fixed > Pipeline creation on a datanode should account for raft log disks reported > --- > > Key: HDDS-4072 > URL: https://issues.apache.org/jira/browse/HDDS-4072 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Datanode, SCM >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 1.1.0 > > > Currently, how many pipelines a datanode will be a part of is controlled by > the config ozone.datanode.pipeline.limit. Now, with the number of raft log > disks reported by the datanode to SCM, we can potentially set the limit on > pipeline capacity based on the raft log disks reported instead.
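Deriving the limit from reported disks instead of the static ozone.datanode.pipeline.limit could be sketched as below; the multiplier, fallback, and method names are assumptions for illustration, not the actual HDDS-4072 implementation.

```java
// Illustrative sketch: compute a datanode's pipeline capacity from the
// raft log disks it reports to SCM, rather than a fixed config value.
public class PipelineLimitCalculator {
  public static int pipelineLimit(int reportedRaftLogDisks, int pipelinesPerRaftDisk) {
    if (reportedRaftLogDisks <= 0) {
      return 1; // fall back to a single pipeline when nothing is reported
    }
    return reportedRaftLogDisks * Math.max(1, pipelinesPerRaftDisk);
  }
}
```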
[jira] [Resolved] (HDDS-4298) Use an interface in Ozone client instead of XceiverClientManager
[ https://issues.apache.org/jira/browse/HDDS-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-4298. --- Fix Version/s: 1.1.0 Resolution: Fixed > Use an interface in Ozone client instead of XceiverClientManager > > > Key: HDDS-4298 > URL: https://issues.apache.org/jira/browse/HDDS-4298 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Marton Elek >Assignee: Marton Elek >Priority: Major > Labels: pull-request-available > Fix For: 1.1.0 > > > XceiverClientManager is used everywhere in the ozone client (Key/Block > Input/OutputStream) to get a client when required. > To make it easier to create genesis/real unit tests, it would be better to > use a generic interface instead of XceiverClientManager which can make it > easy to replace the manager with a mock implementation.
[jira] [Resolved] (HDDS-3297) TestOzoneClientKeyGenerator is flaky
[ https://issues.apache.org/jira/browse/HDDS-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-3297. --- Fix Version/s: 1.1.0 Assignee: Aryan Gupta Resolution: Fixed > TestOzoneClientKeyGenerator is flaky > > > Key: HDDS-3297 > URL: https://issues.apache.org/jira/browse/HDDS-3297 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: test >Reporter: Marton Elek >Assignee: Aryan Gupta >Priority: Critical > Labels: TriagePending, flaky-test, ozone-flaky-test, > pull-request-available > Fix For: 1.1.0 > > Attachments: > org.apache.hadoop.ozone.freon.TestOzoneClientKeyGenerator-output.txt > > > Sometimes it hangs and is stopped after a timeout.
[jira] [Resolved] (HDDS-4218) Remove test TestRatisManager
[ https://issues.apache.org/jira/browse/HDDS-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-4218. --- Fix Version/s: 1.1.0 Resolution: Fixed > Remove test TestRatisManager > > > Key: HDDS-4218 > URL: https://issues.apache.org/jira/browse/HDDS-4218 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Sadanand Shenoy >Assignee: Sadanand Shenoy >Priority: Major > Labels: pull-request-available > Fix For: 1.1.0 > > > Delete this test as RatisManager is no longer present and this test has been > disabled for a long time.
[jira] [Resolved] (HDDS-4217) Remove test TestOzoneContainerRatis
[ https://issues.apache.org/jira/browse/HDDS-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-4217. --- Fix Version/s: 1.1.0 Resolution: Fixed > Remove test TestOzoneContainerRatis > --- > > Key: HDDS-4217 > URL: https://issues.apache.org/jira/browse/HDDS-4217 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Sadanand Shenoy >Assignee: Sadanand Shenoy >Priority: Major > Labels: pull-request-available > Fix For: 1.1.0 > > > Delete TestOzoneContainerRatis as it has been disabled for a long time and is > no longer relevant.
[jira] [Resolved] (HDDS-3151) Intermittent timeout in TestCloseContainerHandlingByClient#testMultiBlockWrites3#testDiscardPreallocatedBlocks
[ https://issues.apache.org/jira/browse/HDDS-3151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-3151. --- Fix Version/s: 1.1.0 Resolution: Fixed > Intermittent timeout in > TestCloseContainerHandlingByClient#testMultiBlockWrites3#testDiscardPreallocatedBlocks > -- > > Key: HDDS-3151 > URL: https://issues.apache.org/jira/browse/HDDS-3151 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: test >Reporter: Attila Doroszlai >Assignee: Aryan Gupta >Priority: Major > Labels: pull-request-available > Fix For: 1.1.0 > > Attachments: > org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient-output.txt, > org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient.txt > > > {code:title=https://github.com/apache/hadoop-ozone/runs/495906854} > Tests run: 8, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 210.963 s <<< > FAILURE! - in > org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient > testMultiBlockWrites3(org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient) > Time elapsed: 108.777 s <<< ERROR! > java.util.concurrent.TimeoutException: > ... > at > org.apache.hadoop.ozone.container.TestHelper.waitForContainerClose(TestHelper.java:251) > at > org.apache.hadoop.ozone.container.TestHelper.waitForContainerClose(TestHelper.java:151) > at > org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient.waitForContainerClose(TestCloseContainerHandlingByClient.java:342) > at > org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient.testMultiBlockWrites3(TestCloseContainerHandlingByClient.java:310) > {code}
[jira] [Resolved] (HDDS-3762) Intermittent failure in TestDeleteWithSlowFollower
[ https://issues.apache.org/jira/browse/HDDS-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-3762. --- Fix Version/s: 1.1.0 Resolution: Fixed > Intermittent failure in TestDeleteWithSlowFollower > -- > > Key: HDDS-3762 > URL: https://issues.apache.org/jira/browse/HDDS-3762 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: test >Affects Versions: 1.0.0 >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > Fix For: 1.1.0 > > > TestDeleteWithSlowFollower failed soon after it was re-enabled in HDDS-3330. > {code:title=https://github.com/apache/hadoop-ozone/runs/753363338} > [INFO] Running org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower > [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: > 28.647 s <<< FAILURE! - in > org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower > [ERROR] > testDeleteKeyWithSlowFollower(org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower) > Time elapsed: 0.163 s <<< FAILURE! > java.lang.AssertionError > ... > at org.junit.Assert.assertNotNull(Assert.java:631) > at > org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.testDeleteKeyWithSlowFollower(TestDeleteWithSlowFollower.java:225) > {code} > CC [~shashikant] [~elek]
[jira] [Updated] (HDDS-4077) Incomplete OzoneFileSystem statistics
[ https://issues.apache.org/jira/browse/HDDS-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-4077: -- Fix Version/s: 1.1 Resolution: Fixed Status: Resolved (was: Patch Available) > Incomplete OzoneFileSystem statistics > - > > Key: HDDS-4077 > URL: https://issues.apache.org/jira/browse/HDDS-4077 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Filesystem >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Minor > Labels: pull-request-available > Fix For: 1.1 > > > OzoneFileSystem does not record some of the operations that are defined in > [Statistic|https://github.com/apache/hadoop-ozone/blob/d7ea4966656cfdb0b53a368eac52d71adb717104/hadoop-ozone/ozonefs-common/src/main/java/org/apache/hadoop/fs/ozone/Statistic.java#L44-L75].
[jira] [Updated] (HDDS-4149) Implement OzoneFileStatus#toString
[ https://issues.apache.org/jira/browse/HDDS-4149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-4149: -- Fix Version/s: 0.6.0 Resolution: Fixed Status: Resolved (was: Patch Available) > Implement OzoneFileStatus#toString > -- > > Key: HDDS-4149 > URL: https://issues.apache.org/jira/browse/HDDS-4149 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Filesystem >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Minor > Labels: pull-request-available > Fix For: 0.6.0 > > > {{OzoneFileStatus}} should implement {{toString}} for debug purposes.
[jira] [Resolved] (HDDS-4048) Show more information while SCM version info mismatch
[ https://issues.apache.org/jira/browse/HDDS-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-4048. --- Fix Version/s: 0.7.0 Resolution: Fixed > Show more information while SCM version info mismatch > - > > Key: HDDS-4048 > URL: https://issues.apache.org/jira/browse/HDDS-4048 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Manager >Affects Versions: 0.6.0 >Reporter: maobaolong >Assignee: maobaolong >Priority: Major > Labels: pull-request-available > Fix For: 0.7.0 > >
[jira] [Updated] (HDDS-4078) Use HDDS InterfaceAudience/Stability annotations
[ https://issues.apache.org/jira/browse/HDDS-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-4078: -- Fix Version/s: 0.7.0 Resolution: Fixed Status: Resolved (was: Patch Available) > Use HDDS InterfaceAudience/Stability annotations > > > Key: HDDS-4078 > URL: https://issues.apache.org/jira/browse/HDDS-4078 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > Fix For: 0.7.0 > > > HDDS-3028 added Ozone-private versions of {{InterfaceAudience}} and > {{InterfaceStability}} annotations. Some recent changes re-introduced usage > of their Hadoop Common versions. > {code} > hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/response/CleanupTableInfo.java > 19:import org.apache.hadoop.classification.InterfaceStability; > hadoop-ozone/ozonefs-common/src/main/java/org/apache/hadoop/fs/ozone/BasicOzoneClientAdapterImpl.java > 28:import org.apache.hadoop.classification.InterfaceAudience; > hadoop-ozone/ozonefs-common/src/main/java/org/apache/hadoop/fs/ozone/BasicRootedOzoneFileSystem.java > 21:import org.apache.hadoop.classification.InterfaceAudience; > 22:import org.apache.hadoop.classification.InterfaceStability; > hadoop-ozone/ozonefs-common/src/main/java/org/apache/hadoop/fs/ozone/BasicRootedOzoneClientAdapterImpl.java > 33:import org.apache.hadoop.classification.InterfaceAudience; > {code}
[jira] [Resolved] (HDDS-4034) Add Unit Test for HadoopNestedDirGenerator
[ https://issues.apache.org/jira/browse/HDDS-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-4034. --- Fix Version/s: 0.7.0 Resolution: Fixed > Add Unit Test for HadoopNestedDirGenerator > -- > > Key: HDDS-4034 > URL: https://issues.apache.org/jira/browse/HDDS-4034 > Project: Hadoop Distributed Data Store > Issue Type: Test >Reporter: Aryan Gupta >Assignee: Aryan Gupta >Priority: Major > Labels: https://github.com/apache/hadoop-ozone/pull/1266, > pull-request-available > Fix For: 0.7.0 > > > Unit test - It checks the span and depth of nested directories created by the > HadoopNestedDirGenerator Tool.
[jira] [Created] (HDDS-4072) Pipeline creation on a datanode should account for raft log disks reported
Shashikant Banerjee created HDDS-4072: - Summary: Pipeline creation on a datanode should account for raft log disks reported Key: HDDS-4072 URL: https://issues.apache.org/jira/browse/HDDS-4072 Project: Hadoop Distributed Data Store Issue Type: Sub-task Components: Ozone Datanode, SCM Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee Currently, how many pipelines a datanode will be a part of is controlled by the config ozone.datanode.pipeline.limit. Now, with the number of raft log disks reported by the datanode to SCM, we can potentially set the limit on pipeline capacity based on the raft log disks reported instead.
[jira] [Updated] (HDDS-3810) Add the logic to distribute open containers among the pipelines of a datanode
[ https://issues.apache.org/jira/browse/HDDS-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-3810: -- Summary: Add the logic to distribute open containers among the pipelines of a datanode (was: Add the logic to distribute open containers among the piplelines of a datanode) > Add the logic to distribute open containers among the pipelines of a datanode > - > > Key: HDDS-3810 > URL: https://issues.apache.org/jira/browse/HDDS-3810 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > > A datanode can participate in multiple pipelines based on the number of raft > log disks as well as the disk type. SCM should distribute open containers > evenly among this set of pipelines.
[jira] [Created] (HDDS-3963) Remove XceiverServerSpi interface
Shashikant Banerjee created HDDS-3963: - Summary: Remove XceiverServerSpi interface Key: HDDS-3963 URL: https://issues.apache.org/jira/browse/HDDS-3963 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Datanode Reporter: Shashikant Banerjee Fix For: 0.7.0 As per [~elek] here [https://github.com/apache/hadoop-ozone/pull/1107#discussion_r447545366:] {code:java} It seems to be a good time the remove XceiverServerSpi interface. Originally we had two separated implementation to connect to the datanode. Today we have only one. One interface is used between the client and the datanode, and the other one between datanode and ratis (datanode). As this example shows, the two interface shouldn't be the same. {code}
[jira] [Resolved] (HDDS-3861) Fix handlePipelineFailure throw exception if role is follower
[ https://issues.apache.org/jira/browse/HDDS-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-3861. --- Fix Version/s: 0.6.0 Resolution: Fixed > Fix handlePipelineFailure throw exception if role is follower > - > > Key: HDDS-3861 > URL: https://issues.apache.org/jira/browse/HDDS-3861 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > Attachments: screenshot-1.png > > > !screenshot-1.png!
[jira] [Resolved] (HDDS-3808) Ensure volume info on a datanode is propagated to SCM
[ https://issues.apache.org/jira/browse/HDDS-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-3808. --- Fix Version/s: 0.6.0 Resolution: Implemented > Ensure volume info on a datanode is propagated to SCM > > > Key: HDDS-3808 > URL: https://issues.apache.org/jira/browse/HDDS-3808 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.6.0 > > > The aim here is to verify whether volume-level info of a datanode is propagated to > SCM and, if not, add the support here. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-3864) Add a tool to display containers on a datanode
[ https://issues.apache.org/jira/browse/HDDS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-3864: - Assignee: Sadanand Shenoy (was: Shashikant Banerjee) > Add a tool to display containers on a datanode > -- > > Key: HDDS-3864 > URL: https://issues.apache.org/jira/browse/HDDS-3864 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Sadanand Shenoy >Priority: Major > > The idea here is to add a utility to dump containers displaying containerIDs > and BCSID on a datanode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3864) Add a tool to display containers on a datanode
[ https://issues.apache.org/jira/browse/HDDS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-3864: -- Description: The idea here is to add a utility to dump containers displaying containerIDs and BCSID on a datanode. (was: The idea here is to add a utility to dump containers displying containerIDs and BCSID on a datanode. ) > Add a tool to display containers on a datanode > -- > > Key: HDDS-3864 > URL: https://issues.apache.org/jira/browse/HDDS-3864 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > > The idea here is to add a utility to dump containers displaying containerIDs > and BCSID on a datanode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-3864) Add a tool to display containers on a datanode
Shashikant Banerjee created HDDS-3864: - Summary: Add a tool to display containers on a datanode Key: HDDS-3864 URL: https://issues.apache.org/jira/browse/HDDS-3864 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee The idea here is to add a utility to dump containers displaying containerIDs and BCSID on a datanode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-3263) Fix TestCloseContainerByPipeline.java
[ https://issues.apache.org/jira/browse/HDDS-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-3263: - Assignee: Shashikant Banerjee > Fix TestCloseContainerByPipeline.java > - > > Key: HDDS-3263 > URL: https://issues.apache.org/jira/browse/HDDS-3263 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-3422) Enable TestCloseContainerHandlingByClient test cases
[ https://issues.apache.org/jira/browse/HDDS-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-3422: - Assignee: Shashikant Banerjee (was: Prashant Pogde) > Enable TestCloseContainerHandlingByClient test cases > > > Key: HDDS-3422 > URL: https://issues.apache.org/jira/browse/HDDS-3422 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: test >Affects Versions: 0.5.0 >Reporter: Nanda kumar >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.6.0 > > > Fix and enable TestCloseContainerHandlingByClient test cases -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-3424) Enable TestContainerStateMachineFailures test cases
[ https://issues.apache.org/jira/browse/HDDS-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-3424: - Assignee: Shashikant Banerjee (was: Prashant Pogde) > Enable TestContainerStateMachineFailures test cases > --- > > Key: HDDS-3424 > URL: https://issues.apache.org/jira/browse/HDDS-3424 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: test >Affects Versions: 0.5.0 >Reporter: Nanda kumar >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.6.0 > > > Fix and enable TestContainerStateMachineFailures test cases -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-3853) Container marked as missing on datanode while container directory does exist
[ https://issues.apache.org/jira/browse/HDDS-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-3853: - Assignee: Shashikant Banerjee > Container marked as missing on datanode while container directory does exist > -- > > Key: HDDS-3853 > URL: https://issues.apache.org/jira/browse/HDDS-3853 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Sammi Chen >Assignee: Shashikant Banerjee >Priority: Major > > INFO org.apache.hadoop.ozone.container.common.impl.HddsDispatcher: Operation: > PutBlock , Trace ID: 487c959563e884b9:509a3386ba37abc6:487c959563e884b9:0 , > Message: ContainerID 1744 has been lost and and cannot be recreated on this > DataNode , Result: CONTAINER_MISSING , StorageContainerException Occurred. > org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: > ContainerID 1744 has been lost and and cannot be recreated on this DataNode > at > org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:238) > at > org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:166) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:395) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:405) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$applyTransaction$6(ContainerStateMachine.java:749) > at > java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > ERROR > 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine: > gid group-1376E41FD581 : ApplyTransaction failed. cmd PutBlock logIndex > 40079 msg : ContainerID 1744 has been lost and and cannot be recreated on > this DataNode Container Result: CONTAINER_MISSING > ERROR > org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis: > pipeline Action CLOSE on pipeline > PipelineID=de21dfcf-415c-4901-84ca-1376e41fd581.Reason : Ratis Transaction > failure in datanode 33b49c34-caa2-4b4f-894e-dce7db4f97b9 with role FOLLOWER > .Triggering pipeline close action > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-3853) Container marked as missing on datanode while container directory does exist
[ https://issues.apache.org/jira/browse/HDDS-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-3853: - Assignee: (was: Shashikant Banerjee) > Container marked as missing on datanode while container directory does exist > -- > > Key: HDDS-3853 > URL: https://issues.apache.org/jira/browse/HDDS-3853 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Sammi Chen >Priority: Major > > INFO org.apache.hadoop.ozone.container.common.impl.HddsDispatcher: Operation: > PutBlock , Trace ID: 487c959563e884b9:509a3386ba37abc6:487c959563e884b9:0 , > Message: ContainerID 1744 has been lost and and cannot be recreated on this > DataNode , Result: CONTAINER_MISSING , StorageContainerException Occurred. > org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: > ContainerID 1744 has been lost and and cannot be recreated on this DataNode > at > org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:238) > at > org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:166) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:395) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:405) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$applyTransaction$6(ContainerStateMachine.java:749) > at > java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > ERROR > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine: 
> gid group-1376E41FD581 : ApplyTransaction failed. cmd PutBlock logIndex > 40079 msg : ContainerID 1744 has been lost and and cannot be recreated on > this DataNode Container Result: CONTAINER_MISSING > ERROR > org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis: > pipeline Action CLOSE on pipeline > PipelineID=de21dfcf-415c-4901-84ca-1376e41fd581.Reason : Ratis Transaction > failure in datanode 33b49c34-caa2-4b4f-894e-dce7db4f97b9 with role FOLLOWER > .Triggering pipeline close action > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-3853) Container marked as missing on datanode while container directory does exist
[ https://issues.apache.org/jira/browse/HDDS-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-3853: - Assignee: Shashikant Banerjee > Container marked as missing on datanode while container directory does exist > -- > > Key: HDDS-3853 > URL: https://issues.apache.org/jira/browse/HDDS-3853 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Sammi Chen >Assignee: Shashikant Banerjee >Priority: Major > > INFO org.apache.hadoop.ozone.container.common.impl.HddsDispatcher: Operation: > PutBlock , Trace ID: 487c959563e884b9:509a3386ba37abc6:487c959563e884b9:0 , > Message: ContainerID 1744 has been lost and and cannot be recreated on this > DataNode , Result: CONTAINER_MISSING , StorageContainerException Occurred. > org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: > ContainerID 1744 has been lost and and cannot be recreated on this DataNode > at > org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:238) > at > org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:166) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:395) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:405) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$applyTransaction$6(ContainerStateMachine.java:749) > at > java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > ERROR > 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine: > gid group-1376E41FD581 : ApplyTransaction failed. cmd PutBlock logIndex > 40079 msg : ContainerID 1744 has been lost and and cannot be recreated on > this DataNode Container Result: CONTAINER_MISSING > ERROR > org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis: > pipeline Action CLOSE on pipeline > PipelineID=de21dfcf-415c-4901-84ca-1376e41fd581.Reason : Ratis Transaction > failure in datanode 33b49c34-caa2-4b4f-894e-dce7db4f97b9 with role FOLLOWER > .Triggering pipeline close action > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-3430) Enable TestWatchForCommit test cases
[ https://issues.apache.org/jira/browse/HDDS-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-3430: - Assignee: Shashikant Banerjee > Enable TestWatchForCommit test cases > > > Key: HDDS-3430 > URL: https://issues.apache.org/jira/browse/HDDS-3430 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: test >Affects Versions: 0.5.0 >Reporter: Nanda kumar >Assignee: Shashikant Banerjee >Priority: Major > > Fix and enable TestWatchForCommit test cases -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-3424) Enable TestContainerStateMachineFailures test cases
[ https://issues.apache.org/jira/browse/HDDS-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-3424. --- Fix Version/s: 0.6.0 Resolution: Duplicate > Enable TestContainerStateMachineFailures test cases > --- > > Key: HDDS-3424 > URL: https://issues.apache.org/jira/browse/HDDS-3424 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: test >Affects Versions: 0.5.0 >Reporter: Nanda kumar >Assignee: Prashant Pogde >Priority: Major > Fix For: 0.6.0 > > > Fix and enable TestContainerStateMachineFailures test cases -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-3422) Enable TestCloseContainerHandlingByClient test cases
[ https://issues.apache.org/jira/browse/HDDS-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-3422. --- Fix Version/s: 0.6.0 Resolution: Fixed Tests have been enabled with https://issues.apache.org/jira/browse/HDDS-2936. > Enable TestCloseContainerHandlingByClient test cases > > > Key: HDDS-3422 > URL: https://issues.apache.org/jira/browse/HDDS-3422 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: test >Affects Versions: 0.5.0 >Reporter: Nanda kumar >Assignee: Prashant Pogde >Priority: Major > Fix For: 0.6.0 > > > Fix and enable TestCloseContainerHandlingByClient test cases -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-3802) Incorrect data returned by reading a FILE_PER_CHUNK block
[ https://issues.apache.org/jira/browse/HDDS-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-3802. --- Fix Version/s: 0.6.0 Resolution: Fixed > Incorrect data returned by reading a FILE_PER_CHUNK block > - > > Key: HDDS-3802 > URL: https://issues.apache.org/jira/browse/HDDS-3802 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Critical > Labels: pull-request-available > Fix For: 0.6.0 > > > A summary of the s3 big file download results with April 22nd master branch code: > 1. download with aws s3 sdk, md5 sum is different > 2. download with "ozone fs -get o3fs://", md5 sum is different > 3. download with "ozone sh key get", md5 sum is the same as the local file > So it seems the issue is in the read path. The md5sum results of step > 1 and step 2 are also different from each other. (edited) > The differing behaviors are caused by the different read buffer sizes of the > different interfaces. If the read buffer size equals the chunk size, the content is fine. > If the read buffer size is smaller than the chunk size, the content returned is > incorrect, because the datanode-side read ignores the offset in the request and uses 0 as the > offset to read the data. > FilePerChunkStrategy#readChunk > {code:java} > // use offset only if file written by old datanode > long offset; > if (file.exists() && file.length() == info.getOffset() + len) { > offset = info.getOffset(); > } else { > offset = 0; ---> this line causes the trouble. > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
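To make the failure mode concrete, here is a minimal, self-contained sketch (illustrative names only, not Ozone's actual classes) of why forcing the offset to 0 corrupts partial reads of a FILE_PER_CHUNK chunk file:

```java
import java.util.Arrays;

// Toy model of a FILE_PER_CHUNK read, showing why ignoring the requested
// offset returns the wrong bytes when the read buffer is smaller than the
// chunk. All names here are illustrative, not Ozone's actual code.
public class FilePerChunkReadDemo {

    // Read readLen bytes from the chunk file, either honoring the requested
    // offset or (like the buggy branch) forcing it to 0.
    static byte[] read(byte[] chunkFile, long requestedOffset, int readLen,
                       boolean ignoreOffset) {
        int off = ignoreOffset ? 0 : (int) requestedOffset;
        return Arrays.copyOfRange(chunkFile, off, off + readLen);
    }

    public static void main(String[] args) {
        byte[] chunk = {10, 11, 12, 13, 14, 15, 16, 17}; // one 8-byte chunk file
        // The client asks for the second half of the chunk (buffer < chunk size).
        byte[] buggy = read(chunk, 4, 4, true);   // offset forced to 0
        byte[] fixed = read(chunk, 4, 4, false);  // offset honored
        System.out.println(Arrays.toString(buggy)); // first half: wrong data
        System.out.println(Arrays.toString(fixed)); // second half: correct
    }
}
```

With whole-chunk reads (buffer size == chunk size) the two paths coincide, which is why only sub-chunk reads observed the corruption.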
[jira] [Resolved] (HDDS-3262) Fix TestOzoneRpcClientWithRatis.java
[ https://issues.apache.org/jira/browse/HDDS-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-3262. --- Fix Version/s: 0.6.0 Resolution: Fixed > Fix TestOzoneRpcClientWithRatis.java > > > Key: HDDS-3262 > URL: https://issues.apache.org/jira/browse/HDDS-3262 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-3811) Add tests to verify all the disks of a datanode are utilized for write
[ https://issues.apache.org/jira/browse/HDDS-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-3811: - Assignee: Shashikant Banerjee > Add tests to verify all the disks of a datanode are utilized for write > --- > > Key: HDDS-3811 > URL: https://issues.apache.org/jira/browse/HDDS-3811 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-3808) Ensure volume info on a datanode is propagated to SCM
[ https://issues.apache.org/jira/browse/HDDS-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-3808: - Assignee: Shashikant Banerjee > Ensure volume info on a datanode is propagated to SCM > > > Key: HDDS-3808 > URL: https://issues.apache.org/jira/browse/HDDS-3808 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.6.0 > > > The aim here is to verify whether volume-level info of a datanode is propagated to > SCM and, if not, add the support here. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3809) Make number of open containers on a datanode a function of the number of volumes reported by it
[ https://issues.apache.org/jira/browse/HDDS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-3809: -- Fix Version/s: 0.6.0 > Make number of open containers on a datanode a function of the number of volumes > reported by it > --- > > Key: HDDS-3809 > URL: https://issues.apache.org/jira/browse/HDDS-3809 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.6.0 > > > The number of open containers on a datanode should be driven by the number of > data disks available multiplied by the number of open containers per disk. The aim > here is to add this logic and verify it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
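The sizing rule described in HDDS-3809 (open containers = data disks × containers per disk) can be sketched as a tiny calculation. The class name and the per-disk constant below are assumptions for illustration, not Ozone's actual configuration keys or code:

```java
// Illustrative calculation only: derive the open-container target for a
// datanode from the number of data disks it reports, instead of using a
// static cluster-wide default.
public class OpenContainerBudget {

    // Assumed tunable: how many open containers each data disk should serve.
    static final int OPEN_CONTAINERS_PER_DISK = 3;

    static int openContainerTarget(int reportedDataDisks) {
        if (reportedDataDisks <= 0) {
            throw new IllegalArgumentException("datanode must report >= 1 data disk");
        }
        return reportedDataDisks * OPEN_CONTAINERS_PER_DISK;
    }

    public static void main(String[] args) {
        // A 12-disk datanode gets a larger open-container budget than a 4-disk one,
        // so write load scales with the hardware actually present.
        System.out.println(openContainerTarget(12)); // 36
        System.out.println(openContainerTarget(4));  // 12
    }
}
```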
[jira] [Assigned] (HDDS-3810) Add the logic to distribute open containers among the pipelines of a datanode
[ https://issues.apache.org/jira/browse/HDDS-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-3810: - Assignee: Shashikant Banerjee > Add the logic to distribute open containers among the pipelines of a datanode > -- > > Key: HDDS-3810 > URL: https://issues.apache.org/jira/browse/HDDS-3810 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > > A datanode can participate in multiple pipelines based on the number of raft log > disks as well as the disk type. SCM should distribute open > containers evenly among this set of pipelines. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-3807) Propagate raft log disks info to SCM from datanode
[ https://issues.apache.org/jira/browse/HDDS-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-3807: - Assignee: Shashikant Banerjee > Propagate raft log disks info to SCM from datanode > -- > > Key: HDDS-3807 > URL: https://issues.apache.org/jira/browse/HDDS-3807 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Datanode, SCM >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > > The number of pipelines to be created on a datanode should be driven by the number of raft > log disks configured on the datanode. The Jira here is to add support for > propagation of raft log volume info to SCM. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-3809) Make number of open containers on a datanode a function of the number of volumes reported by it
[ https://issues.apache.org/jira/browse/HDDS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-3809: - Assignee: Shashikant Banerjee > Make number of open containers on a datanode a function of the number of volumes > reported by it > --- > > Key: HDDS-3809 > URL: https://issues.apache.org/jira/browse/HDDS-3809 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > > The number of open containers on a datanode should be driven by the number of > data disks available multiplied by the number of open containers per disk. The aim > here is to add this logic and verify it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-3136) retry timeout is large while writing key
[ https://issues.apache.org/jira/browse/HDDS-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-3136. --- Fix Version/s: 0.6.0 Resolution: Fixed > retry timeout is large while writing key > > > Key: HDDS-3136 > URL: https://issues.apache.org/jira/browse/HDDS-3136 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Client >Reporter: Nilotpal Nandi >Priority: Major > Labels: TriagePending, fault_injection > Fix For: 0.6.0 > > > Steps : > # Mounted noise injection FUSE on all datanodes. > # Injected WRITE delay of 5 seconds on one of the datanodes from each open > pipeline > # Write a key of 180 MB > Write operation took more than 10 minutes to complete -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-3350) Ozone Retry Policy Improvements
[ https://issues.apache.org/jira/browse/HDDS-3350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-3350. --- Fix Version/s: 0.6.0 Resolution: Fixed > Ozone Retry Policy Improvements > --- > > Key: HDDS-3350 > URL: https://issues.apache.org/jira/browse/HDDS-3350 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Client >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Blocker > Labels: Triaged, pull-request-available > Fix For: 0.6.0 > > Attachments: Retry Behaviour in Ozone Client.pdf, Retry Behaviour in > Ozone Client_Updated.pdf, Retry Behaviour in Ozone Client_Updated_2.pdf, > Retry Policy Results - Teragen 100GB.pdf > > > Currently any ozone client request can spend a huge amount of time in retries > and ozone client can retry its requests very aggressively. The waiting time > can thus be very high before a client request fails. Further aggressive > retries by ratis client used by ozone can bog down a ratis pipeline leader. > The Jira aims to make changes to the current retry behavior in Ozone client. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3810) Add the logic to distribute open containers among the pipelines of a datanode
[ https://issues.apache.org/jira/browse/HDDS-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-3810: -- Summary: Add the logic to distribute open containers among the pipelines of a datanode (was: Add the logic to distribute open containers among the pipleines of a datanode) > Add the logic to distribute open containers among the pipelines of a datanode > -- > > Key: HDDS-3810 > URL: https://issues.apache.org/jira/browse/HDDS-3810 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Shashikant Banerjee >Priority: Major > > A datanode can participate in multiple pipelines based on the number of raft log > disks as well as the disk type. SCM should distribute open > containers evenly among this set of pipelines. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-3811) Add tests to verify all the disks of a datanode are utilized for write
Shashikant Banerjee created HDDS-3811: - Summary: Add tests to verify all the disks of a datanode are utilized for write Key: HDDS-3811 URL: https://issues.apache.org/jira/browse/HDDS-3811 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Shashikant Banerjee -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-3810) Add the logic to distribute open containers among the pipelines of a datanode
Shashikant Banerjee created HDDS-3810: - Summary: Add the logic to distribute open containers among the pipelines of a datanode Key: HDDS-3810 URL: https://issues.apache.org/jira/browse/HDDS-3810 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Shashikant Banerjee A datanode can participate in multiple pipelines based on the number of raft log disks as well as the disk type. SCM should distribute open containers evenly among this set of pipelines. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
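One simple way to achieve the even spread HDDS-3810 asks for is least-loaded placement: each new open container goes to the pipeline currently holding the fewest. The sketch below is an illustration of that idea only, not SCM's actual placement code, and all names in it are hypothetical:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of one way SCM could spread open containers evenly across the
// pipelines of a datanode: always place the next container on the pipeline
// with the fewest open containers so far.
public class PipelineSpread {

    static Map<String, Integer> distribute(List<String> pipelines, int containers) {
        Map<String, Integer> load = new HashMap<>();
        for (String p : pipelines) {
            load.put(p, 0);
        }
        for (int i = 0; i < containers; i++) {
            // Pick the least-loaded pipeline for the next open container.
            String target = pipelines.get(0);
            for (String p : pipelines) {
                if (load.get(p) < load.get(target)) {
                    target = p;
                }
            }
            load.merge(target, 1, Integer::sum);
        }
        return load;
    }

    public static void main(String[] args) {
        List<String> pipelines = List.of("p1", "p2", "p3");
        // 9 open containers over 3 pipelines -> 3 each.
        System.out.println(distribute(pipelines, 9));
    }
}
```

With a static count the spread degenerates when pipelines sit on disks of different speeds; weighting the choice by disk type, as the description suggests, would be a natural extension.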
[jira] [Created] (HDDS-3809) Make number of open containers on a datanode a function of the number of volumes reported by it
Shashikant Banerjee created HDDS-3809: - Summary: Make number of open containers on a datanode a function of the number of volumes reported by it Key: HDDS-3809 URL: https://issues.apache.org/jira/browse/HDDS-3809 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Shashikant Banerjee The number of open containers on a datanode should be driven by the number of data disks available multiplied by the number of open containers per disk. The aim here is to add this logic and verify it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-3808) Ensure volume info on a datanode is propagated to SCM
Shashikant Banerjee created HDDS-3808: - Summary: Ensure volume info on a datanode is propagated to SCM Key: HDDS-3808 URL: https://issues.apache.org/jira/browse/HDDS-3808 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Shashikant Banerjee Fix For: 0.6.0 The aim here is to verify whether volume-level info of a datanode is propagated to SCM and, if not, add the support here. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-3807) Propagate raft log disks info to SCM from datanode
Shashikant Banerjee created HDDS-3807: - Summary: Propagate raft log disks info to SCM from datanode Key: HDDS-3807 URL: https://issues.apache.org/jira/browse/HDDS-3807 Project: Hadoop Distributed Data Store Issue Type: Sub-task Components: Ozone Datanode, SCM Reporter: Shashikant Banerjee The number of pipelines to be created on a datanode should be driven by the number of raft log disks configured on the datanode. The Jira here is to add support for propagation of raft log volume info to SCM. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3807) Propagate raft log disks info to SCM from datanode
[ https://issues.apache.org/jira/browse/HDDS-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-3807: -- Description: No of pipelines to be created on a datanode is to be driven by the no of raft log disks configured on datanode. The Jira here is to add support for propagation of raft log volume info to SCM. (was: No of pipelines to be created on a datnode is to be driven by the no of raft log disks configured on datanode. The Jira here is to add support for propagation of raft log volume info to SCM.) Summary: Propagate raft log disks info to SCM from datanode (was: Propagate raft log diks info to SCM from datanode) > Propagate raft log disks info to SCM from datanode > -- > > Key: HDDS-3807 > URL: https://issues.apache.org/jira/browse/HDDS-3807 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Datanode, SCM >Reporter: Shashikant Banerjee >Priority: Major > > No of pipelines to be created on a datanode is to be driven by the no of raft > log disks configured on datanode. The Jira here is to add support for > propagation of raft log volume info to SCM. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3700) Number of open containers per pipeline should be tuned as per the number of disks on datanode
[ https://issues.apache.org/jira/browse/HDDS-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-3700: -- Attachment: Load Distribution Across disks in Ozone.pdf > Number of open containers per pipeline should be tuned as per the number of > disks on datanode > - > > Key: HDDS-3700 > URL: https://issues.apache.org/jira/browse/HDDS-3700 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.7.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: Performance > Attachments: Load Distribution Across disks in Ozone.pdf, Screenshot > 2020-06-02 at 12.44.14 PM.png > > > Currently, "ozone.scm.pipeline.owner.container.count" is configured by > default to 3. The default should ideally be a function of the number of disks on > a datanode. A static value may lead to uneven utilisation during active IO. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-3700) Number of open containers per pipeline should be tuned as per the number of disks on datanode
[ https://issues.apache.org/jira/browse/HDDS-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-3700: - Assignee: Shashikant Banerjee > Number of open containers per pipeline should be tuned as per the number of > disks on datanode > - > > Key: HDDS-3700 > URL: https://issues.apache.org/jira/browse/HDDS-3700 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.7.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: Performance > Attachments: Screenshot 2020-06-02 at 12.44.14 PM.png > > > Currently, "ozone.scm.pipeline.owner.container.count" is configured by > default to 3. The default should ideally be a function of the number of disks on > a datanode. A static value may lead to uneven utilisation during active IO. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3789) Fix TestOzoneRpcClientAbstract#testDeletedKeyForGDPR
[ https://issues.apache.org/jira/browse/HDDS-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-3789: -- Parent: HDDS-2964 Issue Type: Sub-task (was: Bug) > Fix TestOzoneRpcClientAbstract#testDeletedKeyForGDPR > > > Key: HDDS-3789 > URL: https://issues.apache.org/jira/browse/HDDS-3789 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Shashikant Banerjee >Priority: Major > > {code:java} > [ERROR] Tests run: 67, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: > 36.615 s <<< FAILURE! - in > org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis > 3053[ERROR] > testDeletedKeyForGDPR(org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis) > Time elapsed: 0.165 s <<< ERROR! > 3054java.lang.NullPointerException > 3055 at > org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientAbstract.testDeletedKeyForGDPR(TestOzoneRpcClientAbstract.java:2730) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-3789) Fix TestOzoneRpcClientAbstract#testDeletedKeyForGDPR
Shashikant Banerjee created HDDS-3789: - Summary: Fix TestOzoneRpcClientAbstract#testDeletedKeyForGDPR Key: HDDS-3789 URL: https://issues.apache.org/jira/browse/HDDS-3789 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Shashikant Banerjee {code:java} [ERROR] Tests run: 67, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 36.615 s <<< FAILURE! - in org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis 3053[ERROR] testDeletedKeyForGDPR(org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis) Time elapsed: 0.165 s <<< ERROR! 3054java.lang.NullPointerException 3055at org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientAbstract.testDeletedKeyForGDPR(TestOzoneRpcClientAbstract.java:2730) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2384) Large chunks during write can have memory pressure on DN with multiple clients
[ https://issues.apache.org/jira/browse/HDDS-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-2384: -- Target Version/s: 0.7.0 (was: 0.6.0) > Large chunks during write can have memory pressure on DN with multiple clients > -- > > Key: HDDS-2384 > URL: https://issues.apache.org/jira/browse/HDDS-2384 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Datanode >Reporter: Rajesh Balamohan >Assignee: Shashikant Banerjee >Priority: Major > Labels: Triaged, performance > > During large file writes, it ends up writing {{16 MB}} chunks. > https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java#L691 > In large clusters, 100s of clients may connect to DN. In such cases, > depending on the incoming write workload mem load on DN can increase > significantly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
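A rough back-of-the-envelope sketch of the memory concern in HDDS-2384: with a fixed 16 MB chunk size, the worst-case bytes buffered on a datanode grow linearly with the number of concurrently writing clients. The class name and the one-chunk-in-flight-per-client model are illustrative assumptions, not the actual datanode accounting:

```java
// Hypothetical sketch: estimate worst-case chunk-buffer memory on a
// datanode as a function of the number of concurrently writing clients.
public class ChunkMemoryPressure {

    static final long CHUNK_SIZE_BYTES = 16L * 1024 * 1024; // 16 MB chunks

    /** Worst case assumed here: every client has one full chunk in flight. */
    public static long worstCaseBufferedBytes(int concurrentClients) {
        return concurrentClients * CHUNK_SIZE_BYTES;
    }

    public static void main(String[] args) {
        // 200 concurrent writers -> 3200 MB of chunk buffers in the worst case.
        System.out.println(worstCaseBufferedBytes(200) / (1024 * 1024) + " MB");
    }
}
```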
[jira] [Assigned] (HDDS-3262) Fix TestOzoneRpcClientWithRatis.java
[ https://issues.apache.org/jira/browse/HDDS-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-3262: - Assignee: Shashikant Banerjee > Fix TestOzoneRpcClientWithRatis.java > > > Key: HDDS-3262 > URL: https://issues.apache.org/jira/browse/HDDS-3262 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-3778) Block distribution in a pipeline among open containers is not uniform
[ https://issues.apache.org/jira/browse/HDDS-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133110#comment-17133110 ] Shashikant Banerjee commented on HDDS-3778: --- One solution is to keep block allocation synchronised on the pipeline, but that leads to a ~50% performance degradation for allocate block, as indicated by the Genesis benchmark. Thanks [~nanda] for the benchmark results. > Block distribution in a pipeline among open containers is not uniform > - > > Key: HDDS-3778 > URL: https://issues.apache.org/jira/browse/HDDS-3778 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Blocker > Fix For: 0.7.0 > > Attachments: With-fully-synchronized-getMatchingContainer.png, > Without-fully-synchronized-getMatchingContainer.png > > > Currently, with concurrent allocate-block calls, block allocation among > the open containers of a pipeline is not uniform: under concurrency, the > last-used selection logic no longer holds up. The idea here > is to address this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
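The trade-off in the comment above can be illustrated with a lock-free round-robin selector: an atomic counter keeps the distribution across open containers uniform under concurrent callers without serialising the whole selection the way a fully synchronized getMatchingContainer would. This is an illustrative sketch, not the actual SCM implementation:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: pick the next open container of a pipeline with an
// atomic round-robin index. Distribution stays uniform under concurrency
// and no lock is held across the selection.
public class ContainerSelector {

    private final AtomicLong counter = new AtomicLong();
    private final int openContainerCount;

    public ContainerSelector(int openContainerCount) {
        this.openContainerCount = openContainerCount;
    }

    /** Lock-free round-robin choice of the next open container slot. */
    public int nextContainerIndex() {
        return (int) (counter.getAndIncrement() % openContainerCount);
    }

    public static void main(String[] args) {
        ContainerSelector s = new ContainerSelector(3);
        for (int i = 0; i < 6; i++) {
            System.out.print(s.nextContainerIndex() + " "); // 0 1 2 0 1 2
        }
    }
}
```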
[jira] [Updated] (HDDS-3778) Block distribution in a pipeline among open containers is not uniform
[ https://issues.apache.org/jira/browse/HDDS-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-3778: -- Attachment: Without-fully-synchronized-getMatchingContainer.png With-fully-synchronized-getMatchingContainer.png > Block distribution in a pipeline among open containers is not uniform > - > > Key: HDDS-3778 > URL: https://issues.apache.org/jira/browse/HDDS-3778 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Blocker > Fix For: 0.7.0 > > Attachments: With-fully-synchronized-getMatchingContainer.png, > Without-fully-synchronized-getMatchingContainer.png > > > Currently, with concurrent allocate-block calls, block allocation among > the open containers of a pipeline is not uniform: under concurrency, the > last-used selection logic no longer holds up. The idea here > is to address this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-3778) Block distribution in a pipeline among open containers is not uniform
[ https://issues.apache.org/jira/browse/HDDS-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-3778: - Assignee: Shashikant Banerjee (was: Nanda kumar) > Block distribution in a pipeline among open containers is not uniform > - > > Key: HDDS-3778 > URL: https://issues.apache.org/jira/browse/HDDS-3778 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Blocker > Fix For: 0.7.0 > > > Currently, with concurrent allocate-block calls, block allocation among > the open containers of a pipeline is not uniform: under concurrency, the > last-used selection logic no longer holds up. The idea here > is to address this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-3778) Block distribution in a pipeline among open containers is not uniform
[ https://issues.apache.org/jira/browse/HDDS-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-3778: - Assignee: Nanda kumar (was: Shashikant Banerjee) > Block distribution in a pipeline among open containers is not uniform > - > > Key: HDDS-3778 > URL: https://issues.apache.org/jira/browse/HDDS-3778 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Reporter: Shashikant Banerjee >Assignee: Nanda kumar >Priority: Blocker > Fix For: 0.7.0 > > > Currently, with concurrent allocate-block calls, block allocation among > the open containers of a pipeline is not uniform: under concurrency, the > last-used selection logic no longer holds up. The idea here > is to address this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-3778) Block distribution in a pipeline among open containers is not uniform
Shashikant Banerjee created HDDS-3778: - Summary: Block distribution in a pipeline among open containers is not uniform Key: HDDS-3778 URL: https://issues.apache.org/jira/browse/HDDS-3778 Project: Hadoop Distributed Data Store Issue Type: Bug Components: SCM Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee Fix For: 0.7.0 Currently, with concurrent allocate-block calls, block allocation among the open containers of a pipeline is not uniform: under concurrency, the last-used selection logic no longer holds up. The idea here is to address this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2887) Add config to tune replication level of watch requests in Ozone client
[ https://issues.apache.org/jira/browse/HDDS-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-2887: -- Labels: Triaged (was: ) > Add config to tune replication level of watch requests in Ozone client > -- > > Key: HDDS-2887 > URL: https://issues.apache.org/jira/browse/HDDS-2887 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: Triaged > > Currently, the Ozone client sends watch > requests with the Ratis replication level set to ALL_COMMITTED and, if that > fails, resends the request with MAJORITY_COMMITTED semantics. The idea is > to make the replication level for watch requests configurable so that its > performance impact can be measured. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
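The fallback behaviour described in HDDS-2887 can be sketched as below; the enum and callback names are illustrative stand-ins, not the real Ratis client API:

```java
// Hypothetical sketch: try a watch with ALL_COMMITTED first, and fall
// back to MAJORITY_COMMITTED semantics if it fails, rather than failing
// the write.
public class WatchWithFallback {

    enum ReplicationLevel { ALL_COMMITTED, MAJORITY_COMMITTED }

    interface WatchCall {
        void watch(ReplicationLevel level);
    }

    /** Returns the replication level the watch finally succeeded with. */
    public static ReplicationLevel watchForCommit(WatchCall call) {
        try {
            // First attempt: wait until every replica has committed.
            call.watch(ReplicationLevel.ALL_COMMITTED);
            return ReplicationLevel.ALL_COMMITTED;
        } catch (RuntimeException e) {
            // Degrade to majority semantics instead of failing the write.
            call.watch(ReplicationLevel.MAJORITY_COMMITTED);
            return ReplicationLevel.MAJORITY_COMMITTED;
        }
    }

    public static void main(String[] args) {
        // Simulate a cluster where one follower lags: ALL_COMMITTED fails.
        ReplicationLevel used = watchForCommit(level -> {
            if (level == ReplicationLevel.ALL_COMMITTED) {
                throw new RuntimeException("watch timed out on a lagging follower");
            }
        });
        System.out.println(used); // MAJORITY_COMMITTED
    }
}
```

A config key, as the issue proposes, would simply select which level is tried first.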
[jira] [Updated] (HDDS-770) ozonefs client warning exception logs should not be displayed on console
[ https://issues.apache.org/jira/browse/HDDS-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-770: - Target Version/s: 0.7.0 (was: 0.6.0) > ozonefs client warning exception logs should not be displayed on console > > > Key: HDDS-770 > URL: https://issues.apache.org/jira/browse/HDDS-770 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.3.0 >Reporter: Nilotpal Nandi >Priority: Major > Labels: Triaged > > steps taken : > - > # ran ozonefs cp command - "ozone fs -cp /testdir2/2GB /testdir2/2GB_111" > # command execution was successful and file was successfully copied. > But , the warning logs/exceptions are displayed on console : > > {noformat} > [root@ctr-e138-1518143905142-53-01-03 ~]# ozone fs -cp /testdir2/2GB > /testdir2/2GB_111 > 2018-10-31 09:12:35,052 WARN scm.XceiverClientGrpc: Failed to execute command > cmdType: GetBlock > traceID: "b73d7d2d-232a-40d7-b0b6-478e3d40ed6a" > containerID: 17 > datanodeUuid: "ce0084c2-97cd-4c97-9378-e5175daad18b" > getBlock { > blockID { > containerID: 17 > localID: 100989077200109583 > } > blockCommitSequenceId: 60 > } > on datanode 9fab9937-fbcd-4196-8014-cb165045724b > java.util.concurrent.ExecutionException: > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io > exception > at > java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) > at > org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:167) > at > org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:146) > at > org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:105) > at > org.apache.hadoop.ozone.client.io.ChunkGroupInputStream.getFromOmKeyInfo(ChunkGroupInputStream.java:301) > at 
org.apache.hadoop.ozone.client.rpc.RpcClient.getKey(RpcClient.java:493) > at org.apache.hadoop.ozone.client.OzoneBucket.readKey(OzoneBucket.java:272) > at org.apache.hadoop.fs.ozone.OzoneFileSystem.open(OzoneFileSystem.java:178) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:950) > at > org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:341) > at > org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:277) > at > org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:262) > at org.apache.hadoop.fs.shell.Command.processPathInternal(Command.java:367) > at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:331) > at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:304) > at > org.apache.hadoop.fs.shell.CommandWithDestination.processPathArgument(CommandWithDestination.java:257) > at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:286) > at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:270) > at > org.apache.hadoop.fs.shell.CommandWithDestination.processArguments(CommandWithDestination.java:228) > at > org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:120) > at org.apache.hadoop.fs.shell.Command.run(Command.java:177) > at org.apache.hadoop.fs.FsShell.run(FsShell.java:327) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > at org.apache.hadoop.fs.FsShell.main(FsShell.java:390) > Caused by: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: > UNAVAILABLE: io exception > at > org.apache.ratis.thirdparty.io.grpc.Status.asRuntimeException(Status.java:526) > at > org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:420) > at > 
org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) > at > org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) > at > org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) > at > org.apache.ratis.thirdparty.io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:684) > at > org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) > at > org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) > at > org.apache.ratis.thirdparty.io.grpc.Forwar
[jira] [Updated] (HDDS-1079) java.lang.RuntimeException: ManagedChannel allocation site exception seen on client cli when datanode restarted in one of the pipelines
[ https://issues.apache.org/jira/browse/HDDS-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1079: -- Target Version/s: 0.7.0 (was: 0.6.0) > java.lang.RuntimeException: ManagedChannel allocation site exception seen on > client cli when datanode restarted in one of the pipelines > --- > > Key: HDDS-1079 > URL: https://issues.apache.org/jira/browse/HDDS-1079 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Nilotpal Nandi >Priority: Major > Labels: TriagePending > Attachments: nodes-ozone-logs-1549879783.tar.gz > > > steps taken : > > # created 12 datanode cluster. > # started put key operation with size 100GB. > # Restarted one of the datanodes from one of the pipelines. > exception seen on cli : > > > {noformat} > [root@ctr-e139-1542663976389-62237-01-06 ~]# time ozone sh key put > volume1/bucket1/key1 /root/100G > Feb 11, 2019 9:12:49 AM > org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference > cleanQueue > SEVERE: *~*~*~ Channel ManagedChannelImpl{logId=61, > target=172.27.10.133:9858} was not shutdown properly!!! ~*~*~* > Make sure to call shutdown()/shutdownNow() and wait until awaitTermination() > returns true. 
> java.lang.RuntimeException: ManagedChannel allocation site > at > org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.(ManagedChannelOrphanWrapper.java:103) > at > org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:53) > at > org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:44) > at > org.apache.ratis.thirdparty.io.grpc.internal.AbstractManagedChannelImplBuilder.build(AbstractManagedChannelImplBuilder.java:411) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.(GrpcClientProtocolClient.java:116) > at > org.apache.ratis.grpc.client.GrpcClientRpc.lambda$new$0(GrpcClientRpc.java:54) > at > org.apache.ratis.util.PeerProxyMap$PeerAndProxy.lambda$getProxy$0(PeerProxyMap.java:60) > at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:191) > at > org.apache.ratis.util.PeerProxyMap$PeerAndProxy.getProxy(PeerProxyMap.java:59) > at org.apache.ratis.util.PeerProxyMap.getProxy(PeerProxyMap.java:106) > at > org.apache.ratis.grpc.client.GrpcClientRpc.sendRequestAsync(GrpcClientRpc.java:69) > at > org.apache.ratis.client.impl.RaftClientImpl.sendRequestAsync(RaftClientImpl.java:324) > at > org.apache.ratis.client.impl.RaftClientImpl.sendRequestWithRetryAsync(RaftClientImpl.java:286) > at > org.apache.ratis.util.SlidingWindow$Client.sendOrDelayRequest(SlidingWindow.java:243) > at org.apache.ratis.util.SlidingWindow$Client.retry(SlidingWindow.java:259) > at > org.apache.ratis.client.impl.RaftClientImpl.lambda$null$10(RaftClientImpl.java:293) > at > org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$0(TimeoutScheduler.java:85) > at > org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$1(TimeoutScheduler.java:104) > at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:50) > at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:91) > at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Feb 11, 2019 9:12:49 AM > org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference > cleanQueue > SEVERE: *~*~*~ Channel ManagedChannelImpl{logId=29, > target=172.27.10.133:9858} was not shutdown properly!!! ~*~*~* > Make sure to call shutdown()/shutdownNow() and wait until awaitTermination() > returns true. > java.lang.RuntimeException: ManagedChannel allocation site > at > org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.(ManagedChannelOrphanWrapper.java:103) > at > org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:53) > at > org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(Ma
[jira] [Updated] (HDDS-1854) Print intuitive error message at client when the pipeline returned by SCM has no datanode
[ https://issues.apache.org/jira/browse/HDDS-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1854: -- Target Version/s: 0.7.0 (was: 0.6.0) > Print intuitive error message at client when the pipeline returned by SCM has > no datanode > - > > Key: HDDS-1854 > URL: https://issues.apache.org/jira/browse/HDDS-1854 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Client >Reporter: Nanda kumar >Priority: Major > Labels: Triaged > > We are throwing {{IllegalArgumentException}} in OzoneClient when the pipeline > returned by SCM doesn't have any datanode information. Instead of throwing > {{IllegalArgumentException}}, we can throw custom user friendly exception > which is easy to understand. > Existing exception trace: > {noformat} > AssertionError: Ozone get Key failed with > output=[java.lang.IllegalArgumentException > at > com.google.common.base.Preconditions.checkArgument(Preconditions.java:72) > at > org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClient(XceiverClientManager.java:150) > at > org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClientForReadData(XceiverClientManager.java:143) > at > org.apache.hadoop.hdds.scm.storage.BlockInputStream.getChunkInfos(BlockInputStream.java:154) > at > org.apache.hadoop.hdds.scm.storage.BlockInputStream.initialize(BlockInputStream.java:118) > at > org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:222) > at > org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171) > at > org.apache.hadoop.ozone.client.io.OzoneInputStream.read(OzoneInputStream.java:47) > at java.base/java.io.InputStream.read(InputStream.java:205) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:94) > at > org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:98) > at > org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:48) > at 
picocli.CommandLine.execute(CommandLine.java:1173) > at picocli.CommandLine.access$800(CommandLine.java:141) > at picocli.CommandLine$RunLast.handle(CommandLine.java:1367) > at picocli.CommandLine$RunLast.handle(CommandLine.java:1335) > at > picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243) > at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526) > at picocli.CommandLine.parseWithHandler(CommandLine.java:1465) > at > org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65) > at > org.apache.hadoop.ozone.web.ozShell.OzoneShell.execute(OzoneShell.java:60) > at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:56) > at > org.apache.hadoop.ozone.web.ozShell.OzoneShell.main(OzoneShell.java:53)] > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
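The user-friendly exception proposed in HDDS-1854 might look like the following sketch; the class names are hypothetical, not the actual Ozone client types:

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: replace the bare Preconditions.checkArgument with a
// named, self-explanatory exception when SCM returns a pipeline with no
// datanode information.
public class PipelineValidation {

    static class EmptyPipelineException extends RuntimeException {
        EmptyPipelineException(String pipelineId) {
            super("Pipeline " + pipelineId + " returned by SCM has no datanodes; "
                + "the pipeline may still be initializing or may be unhealthy.");
        }
    }

    public static void validate(String pipelineId, List<String> datanodes) {
        if (datanodes == null || datanodes.isEmpty()) {
            throw new EmptyPipelineException(pipelineId);
        }
    }

    public static void main(String[] args) {
        try {
            validate("pipeline-42", Collections.emptyList());
        } catch (EmptyPipelineException e) {
            System.out.println(e.getMessage());
        }
    }
}
```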
[jira] [Updated] (HDDS-2284) XceiverClientMetrics should be initialised as part of XceiverClientManager constructor
[ https://issues.apache.org/jira/browse/HDDS-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-2284: -- Target Version/s: 0.7.0 (was: 0.6.0) > XceiverClientMetrics should be initialised as part of XceiverClientManager > constructor > -- > > Key: HDDS-2284 > URL: https://issues.apache.org/jira/browse/HDDS-2284 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Labels: TriagePending > Time Spent: 20m > Remaining Estimate: 0h > > XceiverClientMetrics is currently initialized in the read write path, the > metric should be initialized while creating XceiverClientManager -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1206) Handle Datanode volume out of space
[ https://issues.apache.org/jira/browse/HDDS-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1206: -- Target Version/s: 0.7.0 (was: 0.6.0) > Handle Datanode volume out of space > --- > > Key: HDDS-1206 > URL: https://issues.apache.org/jira/browse/HDDS-1206 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Nilotpal Nandi >Assignee: Hanisha Koneru >Priority: Major > Labels: Triaged > > steps taken : > > # create 40 datanode cluster. > # one of the datanodes has less than 5 GB space. > # Started writing key of size 600MB. > operation failed: > Error on the client: > > {noformat} > Fri Mar 1 09:05:28 UTC 2019 Ruuning > /root/hadoop_trunk/ozone-0.4.0-SNAPSHOT/bin/ozone sh key put > testvol172275910-1551431122-1/testbuck172275910-1551431122-1/test_file24 > /root/test_files/test_file24 > original md5sum a6de00c9284708585f5a99b0490b0b23 > 2019-03-01 09:05:39,142 ERROR storage.BlockOutputStream: Unexpected Storage > Container Exception: > org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: > ContainerID 79 creation failed > at > org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613) > at > java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602) > at > java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577) > at > java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 
2019-03-01 09:05:39,578 ERROR storage.BlockOutputStream: Unexpected Storage > Container Exception: > org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: > ContainerID 79 creation failed > at > org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613) > at > java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602) > at > java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577) > at > java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-03-01 09:05:40,368 ERROR storage.BlockOutputStream: Unexpected Storage > Container Exception: > org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: > ContainerID 79 creation failed > at > org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613) > at > java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602) > at > java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577) > at > java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-03-01 09:05:40,450 ERROR storage.BlockOutputStream: Unexpected Storage > Container Exception: > org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: > ContainerID 79 creation failed > at > org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613) > at > java.util.concurrent.CompletableFutu
[jira] [Updated] (HDDS-1286) Add more unit tests to validate exception path during write for Ozone client
[ https://issues.apache.org/jira/browse/HDDS-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1286: -- Target Version/s: 0.7.0 (was: 0.6.0) > Add more unit tests to validate exception path during write for Ozone client > > > Key: HDDS-1286 > URL: https://issues.apache.org/jira/browse/HDDS-1286 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Client >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: Triaged > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-800) Avoid ByteString to byte array conversion cost by using ByteBuffer
[ https://issues.apache.org/jira/browse/HDDS-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-800: - Target Version/s: 0.7.0 (was: 0.6.0) > Avoid ByteString to byte array conversion cost by using ByteBuffer > -- > > Key: HDDS-800 > URL: https://issues.apache.org/jira/browse/HDDS-800 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Labels: TriagePending, performance > > As noticed in HDDS-799, protobuf ByteString to byte[] conversion has > significant performance overhead in the read and write paths. This Jira > proposes to use ByteBuffer in place of byte[] to avoid the conversion > overhead.
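The copy-versus-wrap difference described above can be sketched with plain JDK types. This is an illustrative sketch, not Ozone code: `copyOut` mimics the per-call allocation cost of a ByteString-to-byte[] conversion (as in protobuf's `ByteString.toByteArray()`), while `wrapReadOnly` mimics the zero-copy view that protobuf's `ByteString.asReadOnlyByteBuffer()` provides.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class ZeroCopyDemo {
    // Copying path: materializes a fresh byte[] on every call, which is the
    // cost the Jira wants to avoid on each read/write.
    public static byte[] copyOut(byte[] backing) {
        return Arrays.copyOf(backing, backing.length);
    }

    // Zero-copy path: wraps the existing storage in a read-only ByteBuffer
    // (analogous to protobuf's ByteString.asReadOnlyByteBuffer()).
    public static ByteBuffer wrapReadOnly(byte[] backing) {
        return ByteBuffer.wrap(backing).asReadOnlyBuffer();
    }

    public static void main(String[] args) {
        byte[] chunk = "chunk-data".getBytes();
        ByteBuffer view = wrapReadOnly(chunk);
        System.out.println(view.remaining() + " bytes visible without copying");
    }
}
```

The same bytes are visible through the buffer, but no second allocation is made on the hot path.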
[jira] [Resolved] (HDDS-371) Add RetriableException class in Ozone
[ https://issues.apache.org/jira/browse/HDDS-371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-371. -- Fix Version/s: 0.6.0 Resolution: Won't Do > Add RetriableException class in Ozone > - > > Key: HDDS-371 > URL: https://issues.apache.org/jira/browse/HDDS-371 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: TriagePending > Fix For: 0.6.0 > > > A server may throw certain exceptions because it is temporarily in a state > where the request cannot be processed. The Ozone client may retry such a > request; once the service recovers, the server may be able to process the > retried request. This Jira aims to introduce the notion of a > RetriableException in Ozone.
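The proposed class could look roughly like the following. This is a hypothetical sketch based only on the issue description (the issue was ultimately resolved as Won't Do); the name and shape are assumptions, not a committed Ozone API.

```java
// Hypothetical sketch of the proposed marker exception; the name follows
// the Jira title, not shipped Ozone code.
public class RetriableException extends java.io.IOException {
    public RetriableException(String message) {
        super(message);
    }

    public RetriableException(String message, Throwable cause) {
        super(message, cause);
    }

    // A client retry loop could key off the exception type instead of
    // parsing message text to decide whether a request is worth retrying.
    public static boolean shouldRetry(Throwable t) {
        return t instanceof RetriableException;
    }
}
```

The benefit of a distinct type is that retry policy decisions become a type check rather than string matching on server error messages.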
[jira] [Resolved] (HDDS-986) Stack overflow in TestFailureHandlingByClient
[ https://issues.apache.org/jira/browse/HDDS-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-986. -- Fix Version/s: 0.6.0 Resolution: Cannot Reproduce > Stack overflow in TestFailureHandlingByClient > - > > Key: HDDS-986 > URL: https://issues.apache.org/jira/browse/HDDS-986 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Priority: Major > Labels: TriagePending > Fix For: 0.6.0 > > > Stackoverflow observed with > TestFailureHandlingByClient#testMultiBlockWritesWithIntermittentDnFailures > https://builds.apache.org/job/PreCommit-HDDS-Build/2063/testReport/org.apache.hadoop.ozone.client.rpc/TestFailureHandlingByClient/testMultiBlockWritesWithIntermittentDnFailures/ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-1438) TestOzoneClientRetriesOnException#testGroupMismatchExceptionHandling is failing because of allocate block failures
[ https://issues.apache.org/jira/browse/HDDS-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-1438. --- Fix Version/s: 0.6.0 Resolution: Fixed > TestOzoneClientRetriesOnException#testGroupMismatchExceptionHandling is > failing because of allocate block failures > -- > > Key: HDDS-1438 > URL: https://issues.apache.org/jira/browse/HDDS-1438 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Priority: Major > Labels: TriagePending > Fix For: 0.6.0 > > > The test is failing with the allocate block failure assertion. > https://ci.anzix.net/job/ozone-nightly/61/testReport/org.apache.hadoop.ozone.client.rpc/TestOzoneClientRetriesOnException/testGroupMismatchExceptionHandling/ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1325) Exception thrown while initializing ozoneClientAdapter
[ https://issues.apache.org/jira/browse/HDDS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1325: -- Target Version/s: 0.7.0 (was: 0.6.0) > Exception thrown while initializing ozoneClientAdapter > --- > > Key: HDDS-1325 > URL: https://issues.apache.org/jira/browse/HDDS-1325 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Nilotpal Nandi >Priority: Major > Labels: TriagePending > > ozone version : > > > {noformat} > Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r > 568d3ab8b65d1348dec9c971feffe200e6cba2ef > Compiled by nnandi on 2019-03-19T03:54Z > Compiled with protoc 2.5.0 > From source with checksum c44d339e20094d3054754078afbf4c > Using HDDS 0.5.0-SNAPSHOT > Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r > 568d3ab8b65d1348dec9c971feffe200e6cba2ef > Compiled by nnandi on 2019-03-19T03:53Z > Compiled with protoc 2.5.0 > From source with checksum b354934fb1352f4d5425114bf8dce11 > {noformat} > > > steps taken : > --- > # Add ozone libs in hadoop classpath. 
> # Tried to run s3dupdo workload ([https://github.com/t3rmin4t0r/s3dupdo]) > Here is the exception thrown : > > {noformat} > java.lang.reflect.InvocationTargetException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.fs.ozone.OzoneClientAdapterFactory.lambda$createAdapter$1(OzoneClientAdapterFactory.java:65) > at > org.apache.hadoop.fs.ozone.OzoneClientAdapterFactory.createAdapter(OzoneClientAdapterFactory.java:105) > at > org.apache.hadoop.fs.ozone.OzoneClientAdapterFactory.createAdapter(OzoneClientAdapterFactory.java:61) > at > org.apache.hadoop.fs.ozone.OzoneFileSystem.initialize(OzoneFileSystem.java:167) > at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303) > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124) > at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352) > at org.apache.hadoop.fs.FileSystem$Cache.getUnique(FileSystem.java:3326) > at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:532) > at org.notmysock.repl.Works$CopyWorker.run(Works.java:243) > at org.notmysock.repl.Works$CopyWorker.call(Works.java:279) > at org.notmysock.repl.Works$CopyWorker.call(Works.java:204) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.LinkageError: loader constraint violation: loader > (instance of org/apache/hadoop/fs/ozone/FilteredClassLoader) previously > initiated loading for a different type with name > 
"org/apache/hadoop/security/token/Token" > at java.lang.ClassLoader.defineClass1(Native Method) > at java.lang.ClassLoader.defineClass(ClassLoader.java:763) > at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) > at java.net.URLClassLoader.defineClass(URLClassLoader.java:468) > at java.net.URLClassLoader.access$100(URLClassLoader.java:74) > at java.net.URLClassLoader$1.run(URLClassLoader.java:369) > at java.net.URLClassLoader$1.run(URLClassLoader.java:363) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:362) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at > org.apache.hadoop.fs.ozone.FilteredClassLoader.loadClass(FilteredClassLoader.java:71) > at java.lang.Class.getDeclaredMethods0(Native Method) > at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) > at java.lang.Class.privateGetPublicMethods(Class.java:2902) > at java.lang.Class.getMethods(Class.java:1615) > at sun.misc.ProxyGenerator.generateClassFile(ProxyGenerator.java:451) > at sun.misc.ProxyGenerator.generateProxyClass(ProxyGenerator.java:339) > at java.lang.reflect.Proxy$ProxyClassFactory.apply(Proxy.java:639) > at java.lang.reflect.Proxy$ProxyClassFactory.apply(Proxy.java:557) > at java.lang.reflect.WeakCache$Factory.get(WeakCache.java:230) > at java.lang.reflect.WeakCache.get(WeakCache.java:127) > at java.lang.reflect.Proxy.getProxyClass0(Proxy.java:419) > at java.lang.reflect.Proxy.newProxyInstance(Proxy.java:719) > a
[jira] [Updated] (HDDS-1446) Grpc channels are leaked in XceiverClientGrpc
[ https://issues.apache.org/jira/browse/HDDS-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1446: -- Target Version/s: 0.7.0 (was: 0.6.0) > Grpc channels are leaked in XceiverClientGrpc > - > > Key: HDDS-1446 > URL: https://issues.apache.org/jira/browse/HDDS-1446 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.3.0 >Reporter: Mukul Kumar Singh >Priority: Major > Labels: Triaged > > Grpc Channels are leaked in MiniOzoneChaosCluster runs. > {code} > SEVERE: *~*~*~ Channel ManagedChannelImpl{logId=522, > target=10.200.4.160:52415} was not shutdown properly!!! ~*~*~* > Make sure to call shutdown()/shutdownNow() and wait until > awaitTermination() returns true. > java.lang.RuntimeException: ManagedChannel allocation site > at > org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.(ManagedChannelOrphanWrapper.java:103) > at > org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:53) > at > org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:44) > at > org.apache.ratis.thirdparty.io.grpc.internal.AbstractManagedChannelImplBuilder.build(AbstractManagedChannelImplBuilder.java:411) > at > org.apache.hadoop.hdds.scm.XceiverClientGrpc.connectToDatanode(XceiverClientGrpc.java:165) > at > org.apache.hadoop.hdds.scm.XceiverClientGrpc.reconnect(XceiverClientGrpc.java:389) > at > org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandAsync(XceiverClientGrpc.java:340) > at > org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:268) > at > org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:236) > at > org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:210) > at > 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:119) > at > org.apache.hadoop.ozone.client.io.KeyInputStream.getFromOmKeyInfo(KeyInputStream.java:302) > at > org.apache.hadoop.ozone.client.rpc.RpcClient.createInputStream(RpcClient.java:993) > at > org.apache.hadoop.ozone.client.rpc.RpcClient.getKey(RpcClient.java:653) > at > org.apache.hadoop.ozone.client.OzoneBucket.readKey(OzoneBucket.java:325) > at > org.apache.hadoop.ozone.MiniOzoneLoadGenerator.load(MiniOzoneLoadGenerator.java:112) > at > org.apache.hadoop.ozone.MiniOzoneLoadGenerator.lambda$startIO$0(MiniOzoneLoadGenerator.java:147) > at > java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
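The leak pattern above, where channels are created on reconnect but never shut down, is commonly fixed by reference-counting the client so the channel is closed exactly once, when its last user releases it. A minimal stdlib sketch follows; all names are hypothetical, and the boolean flag stands in for what real code would do with gRPC's `ManagedChannel.shutdown()` and `awaitTermination()`.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reference-counted wrapper: the underlying channel is torn
// down only when the last user releases it, so reconnect paths cannot
// leave an orphaned channel behind.
public class RefCountedChannel implements AutoCloseable {
    private final AtomicInteger refs = new AtomicInteger(1);
    private volatile boolean shutdown = false;

    public RefCountedChannel acquire() {
        refs.incrementAndGet();
        return this;
    }

    @Override
    public void close() {
        if (refs.decrementAndGet() == 0) {
            shutdown = true; // real code: channel.shutdown(); awaitTermination(...)
        }
    }

    public boolean isShutdown() {
        return shutdown;
    }
}
```

Pairing every `acquire()` with a `close()` (ideally via try-with-resources) guarantees the "Make sure to call shutdown()/shutdownNow()" advice from the gRPC orphan-channel warning is always honored.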
[jira] [Updated] (HDDS-2145) Optimize client read path by reading multiple chunks along with block info in a single rpc call.
[ https://issues.apache.org/jira/browse/HDDS-2145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-2145: -- Labels: (was: TriagePending) > Optimize client read path by reading multiple chunks along with block info in > a single rpc call. > > > Key: HDDS-2145 > URL: https://issues.apache.org/jira/browse/HDDS-2145 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client, Ozone Datanode >Reporter: Shashikant Banerjee >Assignee: Hanisha Koneru >Priority: Major > > Currently, the Ozone client issues a getBlock call to read the metadata from > RocksDB on the datanode to get the chunk info, and the chunks are then read > one by one in separate rpc calls in the read path. This can be optimized by > piggybacking readChunk calls along with getBlock in a single rpc call to the > datanode. This Jira aims to address this.
[jira] [Updated] (HDDS-2145) Optimize client read path by reading multiple chunks along with block info in a single rpc call.
[ https://issues.apache.org/jira/browse/HDDS-2145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-2145: -- Target Version/s: 0.7.0 (was: 0.6.0) > Optimize client read path by reading multiple chunks along with block info in > a single rpc call. > > > Key: HDDS-2145 > URL: https://issues.apache.org/jira/browse/HDDS-2145 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client, Ozone Datanode >Reporter: Shashikant Banerjee >Assignee: Hanisha Koneru >Priority: Major > Labels: TriagePending > > Currently, the Ozone client issues a getBlock call to read the metadata from > RocksDB on the datanode to get the chunk info, and the chunks are then read > one by one in separate rpc calls in the read path. This can be optimized by > piggybacking readChunk calls along with getBlock in a single rpc call to the > datanode. This Jira aims to address this.
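The round-trip saving in HDDS-2145 can be made concrete with a small model. This is an illustrative sketch, not the Ozone wire protocol; the chunk counts and the idea of a per-response prefetch budget are assumptions made for the arithmetic.

```java
public class BatchedReadDemo {
    // Old read path: 1 getBlock RPC for block metadata, then 1 readChunk
    // RPC per chunk.
    public static int roundTripsSeparate(int numChunks) {
        return 1 + numChunks;
    }

    // Proposed path: getBlock piggybacks as many chunk payloads as fit in
    // the response, so only chunks that were not prefetched need their own
    // readChunk RPC.
    public static int roundTripsBatched(int numChunks, int prefetched) {
        return 1 + Math.max(0, numChunks - prefetched);
    }

    public static void main(String[] args) {
        // e.g. a 256 MB block with 4 MB chunks is 64 chunks:
        System.out.println("separate RPCs: " + roundTripsSeparate(64));
        System.out.println("fully batched: " + roundTripsBatched(64, 64));
    }
}
```

For small blocks whose chunks all fit in one response, the read collapses to a single round-trip to the datanode.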
[jira] [Updated] (HDDS-2146) Optimize block write path performance by reducing no of watchForCommit calls
[ https://issues.apache.org/jira/browse/HDDS-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-2146: -- Target Version/s: 0.7.0 (was: 0.6.0) > Optimize block write path performance by reducing no of watchForCommit calls > > > Key: HDDS-2146 > URL: https://issues.apache.org/jira/browse/HDDS-2146 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: TriagePending > > Currently, the watchForCommit calls from the client to the Ratis server for > all-replicated semantics happen when the max buffer limit is reached, which > with the default configs can occur up to 4 times for a single full block > write. The idea here is to inspect and add optimizations to reduce the > number of watchForCommit calls.
[jira] [Resolved] (HDDS-2702) Client failed to recover from ratis AlreadyClosedException exception
[ https://issues.apache.org/jira/browse/HDDS-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-2702. --- Fix Version/s: 0.6.0 Resolution: Fixed > Client failed to recover from ratis AlreadyClosedException exception > > > Key: HDDS-2702 > URL: https://issues.apache.org/jira/browse/HDDS-2702 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Sammi Chen >Assignee: Shashikant Banerjee >Priority: Blocker > Labels: TriagePending > Fix For: 0.6.0 > > > Ran teragen; it failed to enter the mapreduce stage and printed the > following warning message on the console endlessly. > {noformat} > 19/12/10 19:23:54 WARN io.KeyOutputStream: Encountered exception > java.io.IOException: Unexpected Storage Container Exception: > java.util.concurrent.CompletionException: > java.util.concurrent.CompletionException: > org.apache.ratis.protocol.AlreadyClosedException: > SlidingWindow$Client:client-FBD45C9313A5->RAFT is closed. on the pipeline > Pipeline[ Id: 90deb863-e511-4a5e-ae86-dc8035e8fa7d, Nodes: > ed90869c-317e-4303-8922-9fa83a3983cb{ip: 10.120.113.172, host: host172, > networkLocation: /rack2, certSerialId: > null}1da74a1d-f64d-4ad4-b04c-85f26687e683{ip: 10.121.124.44, host: host044, > networkLocation: /rack2, certSerialId: > null}515cab4b-39b5-4439-b1a8-a7b725f5784a{ip: 10.120.139.122, host: host122, > networkLocation: /rack1, certSerialId: null}, Type:RATIS, Factor:THREE, > State:OPEN, leaderId:515cab4b-39b5-4439-b1a8-a7b725f5784a ]. 
The last > committed block length is 0, uncommitted data length is 295833 retry count 0 > 19/12/10 19:23:54 INFO io.BlockOutputStreamEntryPool: Allocating block with > ExcludeList {datanodes = [], containerIds = [], pipelineIds = > [PipelineID=90deb863-e511-4a5e-ae86-dc8035e8fa7d]} > 19/12/10 19:26:16 WARN io.KeyOutputStream: Encountered exception > java.io.IOException: Unexpected Storage Container Exception: > java.util.concurrent.CompletionException: > java.util.concurrent.CompletionException: > org.apache.ratis.protocol.AlreadyClosedException: > SlidingWindow$Client:client-7C5A7B5CFC31->RAFT is closed. on the pipeline > Pipeline[ Id: 90deb863-e511-4a5e-ae86-dc8035e8fa7d, Nodes: > ed90869c-317e-4303-8922-9fa83a3983cb{ip: 10.120.113.172, host: host172, > networkLocation: /rack2, certSerialId: > null}1da74a1d-f64d-4ad4-b04c-85f26687e683{ip: 10.121.124.44, host: host044, > networkLocation: /rack2, certSerialId: > null}515cab4b-39b5-4439-b1a8-a7b725f5784a{ip: 10.120.139.122, host: host122, > networkLocation: /rack1, certSerialId: null}, Type:RATIS, Factor:THREE, > State:OPEN, leaderId:515cab4b-39b5-4439-b1a8-a7b725f5784a ]. The last > committed block length is 0, uncommitted data length is 295833 retry count 0 > 19/12/10 19:26:16 INFO io.BlockOutputStreamEntryPool: Allocating block with > ExcludeList {datanodes = [], containerIds = [], pipelineIds = > [PipelineID=90deb863-e511-4a5e-ae86-dc8035e8fa7d]} > 19/12/10 19:28:38 WARN io.KeyOutputStream: Encountered exception > java.io.IOException: Unexpected Storage Container Exception: > java.util.concurrent.CompletionException: > java.util.concurrent.CompletionException: > org.apache.ratis.protocol.AlreadyClosedException: > SlidingWindow$Client:client-B3D8C0052C4E->RAFT is closed. 
on the pipeline > Pipeline[ Id: 90deb863-e511-4a5e-ae86-dc8035e8fa7d, Nodes: > ed90869c-317e-4303-8922-9fa83a3983cb{ip: 10.120.113.172, host: host172, > networkLocation: /rack2, certSerialId: > null}1da74a1d-f64d-4ad4-b04c-85f26687e683{ip: 10.121.124.44, host: host044, > networkLocation: /rack2, certSerialId: > null}515cab4b-39b5-4439-b1a8-a7b725f5784a{ip: 10.120.139.122, host: host122, > networkLocation: /rack1, certSerialId: null}, Type:RATIS, Factor:THREE, > State:OPEN, leaderId:515cab4b-39b5-4439-b1a8-a7b725f5784a ]. The last > committed block length is 0, uncommitted data length is 295833 retry count 0 > 19/12/10 19:28:38 INFO io.BlockOutputStreamEntryPool: Allocating block with > ExcludeList {datanodes = [], containerIds = [], pipelineIds = > [PipelineID=90deb863-e511-4a5e-ae86-dc8035e8fa7d]} > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-3268) CommitWatcher#watchForCommit does not timeout
[ https://issues.apache.org/jira/browse/HDDS-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-3268. --- Fix Version/s: 0.6.0 Resolution: Fixed > CommitWatcher#watchForCommit does not timeout > - > > Key: HDDS-3268 > URL: https://issues.apache.org/jira/browse/HDDS-3268 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.5.0 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Major > Labels: TriagePending > Fix For: 0.6.0 > > > It seems the property *ozone.client.watch.request.timeout* was removed by > HDDS-2920. Note this is a client-side property that bounds the wait for the > future to return. Without it, the client may wait on the future forever in > certain cases.
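The fix amounts to bounding the client's wait on the reply future. A minimal sketch with plain `java.util.concurrent` types (not the actual CommitWatcher code) shows the difference between an unbounded `get()` and a timed one:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class WatchTimeoutDemo {
    // Bounded wait on a future; returns true if the watch timed out, so
    // the caller can fail over or retry instead of blocking forever.
    public static boolean timedOut(CompletableFuture<Long> watch, long millis)
            throws Exception {
        try {
            watch.get(millis, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // A future that never completes stands in for a watchForCommit
        // reply from a stuck pipeline.
        CompletableFuture<Long> watch = new CompletableFuture<>();
        System.out.println("timed out: " + timedOut(watch, 100));
    }
}
```

With a plain `watch.get()` the thread would park indefinitely, which is exactly the hang this Jira describes.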
[jira] [Resolved] (HDDS-2306) Fix TestWatchForCommit failure
[ https://issues.apache.org/jira/browse/HDDS-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-2306. --- Fix Version/s: 0.6.0 Resolution: Fixed > Fix TestWatchForCommit failure > -- > > Key: HDDS-2306 > URL: https://issues.apache.org/jira/browse/HDDS-2306 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.4.1 >Reporter: Mukul Kumar Singh >Priority: Major > Labels: TriagePending > Fix For: 0.6.0 > > > {code} > [ERROR] Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 203.385 s <<< FAILURE! - in > org.apache.hadoop.ozone.client.rpc.TestWatchForCommit > [ERROR] > test2WayCommitForTimeoutException(org.apache.hadoop.ozone.client.rpc.TestWatchForCommit) > Time elapsed: 27.093 s <<< ERROR! > java.util.concurrent.TimeoutException > at > java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771) > at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) > at > org.apache.hadoop.hdds.scm.XceiverClientRatis.watchForCommit(XceiverClientRatis.java:283) > at > org.apache.hadoop.ozone.client.rpc.TestWatchForCommit.test2WayCommitForTimeoutException(TestWatchForCommit.java:391) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-2332) BlockOutputStream#waitOnFlushFutures blocks on putBlock combined future
[ https://issues.apache.org/jira/browse/HDDS-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-2332. --- Fix Version/s: 0.6.0 Resolution: Fixed > BlockOutputStream#waitOnFlushFutures blocks on putBlock combined future > --- > > Key: HDDS-2332 > URL: https://issues.apache.org/jira/browse/HDDS-2332 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Lokesh Jain >Priority: Major > Labels: TriagePending > Fix For: 0.6.0 > > > BlockOutputStream blocks on waitOnFlushFutures call. Two jstacks show that > the thread is blocked on the same condition. > {code:java} > 2019-10-18 06:30:38 > Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode): > "main" #1 prio=5 os_prio=0 tid=0x7fbea001a800 nid=0x2a56 waiting on > condition [0x7fbea96d6000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0xe4739888> (a > java.util.concurrent.CompletableFuture$Signaller) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693) > at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) > at > java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729) > at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.waitOnFlushFutures(BlockOutputStream.java:518) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:481) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:496) > at > org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:143) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:439) > at > 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:232) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:190) > at > org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:46) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57) > at java.io.DataOutputStream.write(DataOutputStream.java:107) > - locked <0xa6a75930> (a > org.apache.hadoop.fs.FSDataOutputStream) > at > org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:77) > - locked <0xa6a75918> (a > org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter) > at > org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:64) > at > org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:670) > at > org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89) > at > org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112) > at > org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230) > at > org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:203) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168) > 2019-10-18 07:02:50 > Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode): > "main" #1 prio=5 os_prio=0 tid=0x7fbea001a800 nid=0x2a56 waiting on > condition 
[0x7fbea96d6000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0xe4739888> (a > java.util.concurrent.CompletableFuture$Signaller) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693) > at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) > at > java.util.concurrent.CompletableFuture.wai
[jira] [Resolved] (HDDS-2963) Use RequestDependentRetry Policy along with ExceptionDependentRetry Policy in OzoneClient
[ https://issues.apache.org/jira/browse/HDDS-2963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-2963. --- Fix Version/s: 0.6.0 Resolution: Fixed > Use RequestDependentRetry Policy along with ExceptionDependentRetry Policy in > OzoneClient > - > > Key: HDDS-2963 > URL: https://issues.apache.org/jira/browse/HDDS-2963 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Bharat Viswanadham >Assignee: Mukul Kumar Singh >Priority: Major > Labels: TriagePending > Fix For: 0.6.0 > > > This Jira is to use RequestDependentRetry Policy with ExceptionDependentRetry > Policy so that for different exceptions and for different kinds of requests > we can use different RetryPolicies. > Dependent on RATIS-799 > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2189) Datanode should send PipelineAction on RaftServer failure
[ https://issues.apache.org/jira/browse/HDDS-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-2189: -- Target Version/s: 0.7.0 (was: 0.6.0) > Datanode should send PipelineAction on RaftServer failure > - > > Key: HDDS-2189 > URL: https://issues.apache.org/jira/browse/HDDS-2189 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Labels: TriagePending > > {code:java} > 2019-09-26 08:03:07,152 ERROR > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: > 664c4e90-08f3-46c9-a073-c93ef2a55da3@group-93F633896F08-SegmentedRaftLogWorker > hit exception > java.lang.OutOfMemoryError: Direct buffer memory > at java.nio.Bits.reserveMemory(Bits.java:694) > at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) > at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) > at > org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.(BufferedWriteChannel.java:41) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.(SegmentedRaftLogOutputStream.java:72) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:566) > at > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:289) > at java.lang.Thread.run(Thread.java:748) > 2019-09-26 08:03:07,155 INFO org.apache.ratis.server.impl.RaftServerImpl: > 664c4e90-08f3-46c9-a073-c93ef2a55da3@group-93F633896F08: shutdown > {code} > On RaftServer shutdown datanode should send a PipelineAction denoting that > the pipeline has been closed exceptionally in the datanode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2342) ContainerStateMachine$chunkExecutor threads hold onto native memory
[ https://issues.apache.org/jira/browse/HDDS-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-2342: -- Target Version/s: 0.7.0 (was: 0.6.0)
> ContainerStateMachine$chunkExecutor threads hold onto native memory
> ---
>
> Key: HDDS-2342
> URL: https://issues.apache.org/jira/browse/HDDS-2342
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Datanode
> Reporter: Lokesh Jain
> Assignee: Lokesh Jain
> Priority: Major
> Labels: TriagePending
>
> In a heap dump, many threads in ContainerStateMachine$chunkExecutor hold onto
> native memory in their ThreadLocal maps. Every such thread holds a chunk's
> worth of DirectByteBuffer: since these threads perform write and read chunk
> operations, the JVM allocates a chunk-sized (16 MB) DirectByteBuffer in the
> ThreadLocalMap of every thread involved in IO, and that native memory is not
> GC'ed as long as the thread is alive.
> It would be better to reduce the default number of chunk executor threads and
> keep them in proportion to the number of disks on the datanode. We should also
> use DirectByteBuffers for the IO on the datanode: currently we allocate a
> HeapByteBuffer which needs to be backed by a DirectByteBuffer, so using a
> DirectByteBuffer directly would avoid a buffer copy.
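The leak pattern described in HDDS-2342 can be sketched in a few lines. This is an illustration of the mechanism, not Ozone's actual code; the class name, constant, and helper are hypothetical, but the arithmetic shows why native usage scales with the thread count rather than with concurrency.

```java
// Sketch of the per-thread direct-buffer pattern: each IO thread that ever
// touches BUFFER pins one chunk-sized native allocation until the thread
// dies, because ThreadLocal values live as long as their owning thread.
import java.nio.ByteBuffer;

public class ChunkBufferSketch {
    static final int CHUNK_SIZE = 16 * 1024 * 1024; // 16 MB, one chunk

    // Lazily allocates a direct buffer per thread on first use.
    private static final ThreadLocal<ByteBuffer> BUFFER =
        ThreadLocal.withInitial(() -> ByteBuffer.allocateDirect(CHUNK_SIZE));

    static ByteBuffer buffer() {
        ByteBuffer b = BUFFER.get();
        b.clear(); // reuse the same native allocation for every IO
        return b;
    }

    // Worst-case native memory pinned by N such IO threads.
    static long pinnedBytes(int ioThreads) {
        return (long) ioThreads * CHUNK_SIZE;
    }

    public static void main(String[] args) {
        // 60 executor threads pin 960 MB; 12 threads (one per disk) pin 192 MB.
        System.out.println(pinnedBytes(60) + " vs " + pinnedBytes(12));
    }
}
```

Sizing the executor to the number of disks bounds `pinnedBytes` by the hardware parallelism, which is the first remedy the issue proposes.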
[jira] [Updated] (HDDS-2660) Create insight point for datanode container protocol
[ https://issues.apache.org/jira/browse/HDDS-2660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-2660: -- Target Version/s: 0.7.0 (was: 0.6.0) > Create insight point for datanode container protocol > > > Key: HDDS-2660 > URL: https://issues.apache.org/jira/browse/HDDS-2660 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Datanode >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > > The goal of this task is to create a new insight point for the datanode > container protocol ({{HddsDispatcher}}) to be able to debug > {{client<->datanode}} communication. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3022) Datanode unable to close Pipeline after disk out of space
[ https://issues.apache.org/jira/browse/HDDS-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-3022: -- Target Version/s: 0.7.0 (was: 0.6.0) > Datanode unable to close Pipeline after disk out of space > - > > Key: HDDS-3022 > URL: https://issues.apache.org/jira/browse/HDDS-3022 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.5.0 >Reporter: Vivek Ratnavel Subramanian >Assignee: Shashikant Banerjee >Priority: Critical > Labels: TriagePending > Attachments: ozone_logs.zip > > > Datanode gets into a loop and keeps throwing errors while trying to close > pipeline > {code:java} > 2020-02-14 00:25:10,208 INFO org.apache.ratis.server.impl.RaftServerImpl: > 285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07: changes role from > FOLLOWER to CANDIDATE at term 6240 for changeToCandidate > 2020-02-14 00:25:10,208 ERROR > org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis: > pipeline Action CLOSE on pipeline > PipelineID=02e7e10e-2d50-4ace-a18b-701265ec9f07.Reason : > 285cac09-7622-45e6-be02-b3c68ebf8b10 is in candidate state for 31898494ms > 2020-02-14 00:25:10,208 INFO org.apache.ratis.server.impl.RoleInfo: > 285cac09-7622-45e6-be02-b3c68ebf8b10: start LeaderElection > 2020-02-14 00:25:10,223 INFO org.apache.ratis.server.impl.LeaderElection: > 285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07-LeaderElection37032: > begin an election at term 6241 for 0: > [d432c890-5ec4-4cf1-9078-28497a08ab85:10.65.6.227:9858, > 285cac09-7622-45e6-be02-b3c68ebf8b10:10.65.24.80:9858, > cabbdef8-ed6c-4fc7-b7b2-d1ddd07da47e:10.65.8.165:9858], old=null > 2020-02-14 00:25:10,259 INFO org.apache.ratis.server.impl.LeaderElection: > 285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07-LeaderElection37032 > got exception when requesting votes: java.util.concurrent.ExecutionException: > 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: > d432c890-5ec4-4cf1-9078-28497a08ab85: group-701265EC9F07 not found. > 2020-02-14 00:25:10,270 INFO org.apache.ratis.server.impl.LeaderElection: > 285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07-LeaderElection37032 > got exception when requesting votes: java.util.concurrent.ExecutionException: > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: > cabbdef8-ed6c-4fc7-b7b2-d1ddd07da47e: group-701265EC9F07 not found. > 2020-02-14 00:25:10,270 INFO org.apache.ratis.server.impl.LeaderElection: > 285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07-LeaderElection37032: > Election REJECTED; received 0 response(s) [] and 2 exception(s); > 285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07:t6241, leader=null, > voted=285cac09-7622-45e6-be02-b3c68ebf8b10, > raftlog=285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07-SegmentedRaftLog:OPENED:c4,f4,i14, > conf=0: [d432c890-5ec4-4cf1-9078-28497a08ab85:10.65.6.227:9858, > 285cac09-7622-45e6-be02-b3c68ebf8b10:10.65.24.80:9858, > cabbdef8-ed6c-4fc7-b7b2-d1ddd07da47e:10.65.8.165:9858], old=null > 2020-02-14 00:25:10,270 INFO org.apache.ratis.server.impl.LeaderElection: > Exception 0: java.util.concurrent.ExecutionException: > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: > d432c890-5ec4-4cf1-9078-28497a08ab85: group-701265EC9F07 not found. > 2020-02-14 00:25:10,270 INFO org.apache.ratis.server.impl.LeaderElection: > Exception 1: java.util.concurrent.ExecutionException: > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: > cabbdef8-ed6c-4fc7-b7b2-d1ddd07da47e: group-701265EC9F07 not found. 
> 2020-02-14 00:25:10,270 INFO org.apache.ratis.server.impl.RaftServerImpl: > 285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07: changes role from > CANDIDATE to FOLLOWER at term 6241 for DISCOVERED_A_NEW_TERM > 2020-02-14 00:25:10,270 INFO org.apache.ratis.server.impl.RoleInfo: > 285cac09-7622-45e6-be02-b3c68ebf8b10: shutdown LeaderElection > 2020-02-14 00:25:10,270 INFO org.apache.ratis.server.impl.RoleInfo: > 285cac09-7622-45e6-be02-b3c68ebf8b10: start FollowerState > 2020-02-14 00:25:10,680 WARN org.apache.ratis.grpc.server.GrpcLogAppender: > 285cac09-7622-45e6-be02-b3c68ebf8b10@group-DD847EC75388->d432c890-5ec4-4cf1-9078-28497a08ab85-GrpcLogAppender: > HEARTBEAT appendEntries Timeout, > request=AppendEntriesRequest:cid=12669,entriesCount=0,lastEntry=null > 2020-02-14 00:25:10,752 ERROR > org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis: > pipeline Action CLOSE on pipeline > PipelineID=7ad5ce51-d3fa-4e71-99f2-dd847ec75388.Reason : > 285cac09-7622-45e6-be02-b3c68ebf8b10 has not seen follower/s > d432c890-5ec4-4cf1
[jira] [Updated] (HDDS-2476) Share more code between metadata and data scanners
[ https://issues.apache.org/jira/browse/HDDS-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-2476: -- Target Version/s: 0.7.0 (was: 0.6.0) > Share more code between metadata and data scanners > -- > > Key: HDDS-2476 > URL: https://issues.apache.org/jira/browse/HDDS-2476 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Datanode >Reporter: Attila Doroszlai >Assignee: YiSheng Lien >Priority: Major > > There are several duplicated / similar pieces of code in metadata and data > scanners. More code should be reused. > Examples: > # ContainerDataScrubberMetrics and ContainerMetadataScrubberMetrics have 3 > common metrics > # lifecycle of ContainerMetadataScanner and ContainerDataScanner (main loop, > iteration, metrics processing, shutdown) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
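The refactoring HDDS-2476 suggests — one shared lifecycle, two scan implementations — can be sketched as an abstract base class. The class and method names below are hypothetical, not the actual ContainerMetadataScanner/ContainerDataScanner code; the sketch only shows where the shared loop, metrics, and shutdown handling would live.

```java
// Hypothetical base class: the common lifecycle (iteration loop, metrics,
// shutdown flag) is written once, and the metadata and data scanners differ
// only in how they scan a single container.
public abstract class AbstractContainerScannerSketch {
    private volatile boolean stopping = false;
    private long iterations = 0; // stands in for the 3 shared metrics

    // The only part that differs between the metadata and data scanners.
    protected abstract void scanContainer(long containerId);

    public void runOneIteration(long[] containerIds) {
        for (long id : containerIds) {
            if (stopping) {
                return; // shared shutdown handling
            }
            scanContainer(id);
        }
        iterations++; // shared metrics processing
    }

    public long getIterations() {
        return iterations;
    }

    public void shutdown() {
        stopping = true;
    }
}
```

A concrete scanner would subclass this and override only `scanContainer`, so the duplicated loop/metrics/shutdown code in the two current classes collapses into one place.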
[jira] [Resolved] (HDDS-3163) write Key is hung when write delay is injected in datanode dir
[ https://issues.apache.org/jira/browse/HDDS-3163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-3163. --- Fix Version/s: 0.6.0 Resolution: Fixed > write Key is hung when write delay is injected in datanode dir > -- > > Key: HDDS-3163 > URL: https://issues.apache.org/jira/browse/HDDS-3163 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Nilotpal Nandi >Assignee: Lokesh Jain >Priority: Major > Labels: TriagePending, fault_injection > Fix For: 0.6.0 > > > steps taken : > - > 1. Mounted noise injection FUSE on all datanodes. > 2. Select one datanode from each open pipeline > 3. Inject delay of 120 seconds on chunk file path of selected datanodes > 4. Start PUT key operation. > PUT Key operation is stuck and does not return any success/error . -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3498) Address already in use Should shutdown the datanode with FATAL log and point out the port and configure key
[ https://issues.apache.org/jira/browse/HDDS-3498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-3498: -- Target Version/s: 0.7.0 (was: 0.6.0)
> Address already in use Should shutdown the datanode with FATAL log and point
> out the port and configure key
> ---
>
> Key: HDDS-3498
> URL: https://issues.apache.org/jira/browse/HDDS-3498
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Datanode
> Affects Versions: 0.6.0
> Reporter: maobaolong
> Priority: Minor
> Labels: Triaged
>
> Right now the datanode process cannot work because the port is in use, but the
> process stays alive.
> Furthermore, I guessed the port in use was 9861, but it isn't; after looking at
> the source code, I found it is `dfs.container.ipc`, whose default port is 9859,
> and that port should appear with the following exception. This error should be
> logged at FATAL level, and we should terminate the datanode process.
> {code:java}
> 2020-04-21 15:53:05,436 [Datanode State Machine Thread - 0] WARN
> org.apache.hadoop.ozone.container.common.statemachine.EndpointStateMachine:
> Unable to communicate to SCM server at 127.0.0.1:9861 for past 300 seconds.
> java.io.IOException: Failed to bind
> at org.apache.ratis.thirdparty.io.grpc.netty.NettyServer.start(NettyServer.java:246)
> at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:184)
> at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:90)
> at org.apache.hadoop.ozone.container.common.transport.server.XceiverServerGrpc.start(XceiverServerGrpc.java:141)
> at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:235)
> at org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:113)
> at org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.net.BindException: Address already in use
> at sun.nio.ch.Net.bind0(Native Method)
> at sun.nio.ch.Net.bind(Net.java:433)
> at sun.nio.ch.Net.bind(Net.java:425)
> at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
> at org.apache.ratis.thirdparty.io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:132)
> at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:551)
> at org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1345)
> at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:503)
> at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:488)
> at org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:984)
> at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel.bind(AbstractChannel.java:247)
> at org.apache.ratis.thirdparty.io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:355)
> at org.apache.ratis.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
> at org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:416)
> at org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:515)
> at org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
> at org.apache.ratis.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> ... 1 more
> {code}
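The handling HDDS-3498 asks for can be sketched as follows. The class and method names are hypothetical (not the actual Ozone datanode code): the idea is to unwrap the cause chain for the underlying `BindException`, log a FATAL-style message naming both the port and the configuration key that controls it, and then terminate rather than leave a half-dead process.

```java
// Hypothetical sketch: detect a bind failure and build the FATAL message the
// reporter asks for, naming the port and its configuration key.
import java.net.BindException;

public class BindFailureSketch {

    // Returns the FATAL message if t was caused by a BindException,
    // or null so that normal error handling proceeds.
    static String fatalMessage(Throwable t, int port, String configKey) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof BindException) {
                return "FATAL: port " + port + " (config key " + configKey
                    + ") is already in use; terminating datanode";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Mirrors the stack trace above: an IOException wrapping a BindException.
        Exception e = new java.io.IOException("Failed to bind",
            new BindException("Address already in use"));
        String msg = fatalMessage(e, 9859, "dfs.container.ipc");
        if (msg != null) {
            System.err.println(msg);
            // A real datanode would now call its terminate/exit helper.
        }
    }
}
```

Because the message carries the configuration key, an operator hitting "Address already in use" no longer has to read the source to learn which port setting is at fault.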
[jira] [Updated] (HDDS-2696) Document recovery from RATIS-677
[ https://issues.apache.org/jira/browse/HDDS-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-2696: -- Target Version/s: 0.7.0 (was: 0.6.0)
> Document recovery from RATIS-677
> ---
>
> Key: HDDS-2696
> URL: https://issues.apache.org/jira/browse/HDDS-2696
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: Ozone Datanode
> Reporter: Istvan Fajth
> Priority: Critical
> Labels: Triaged
>
> RATIS-677 is solved in a way where a setting needs to be changed for the
> RatisServer implementation to ignore the corruption, and at the moment, due to
> HDDS-2647, we do not have a clear recovery path from a Ratis corruption in the
> pipeline data.
> We should document how this can be recovered. One idea is to close the
> pipeline in SCM and remove the Ratis metadata for the pipeline on the
> DataNodes, which effectively clears the corrupted pipeline out of the system.
> There are two problems with finding and documenting a recovery path:
> - I am not sure we have strong enough guarantees that the writes happened
> properly if the Ratis metadata could become corrupt, so this needs to be
> investigated.
> - At the moment I cannot validate this approach: if I do the steps (stop the
> 3 DNs, move out the Ratis data for the pipeline, close the pipeline with
> scmcli, then restart the DNs), the pipeline is not closed properly and SCM
> fails as described in HDDS-2695.
[jira] [Resolved] (HDDS-3514) Fix Memory leak of RaftServerImpl
[ https://issues.apache.org/jira/browse/HDDS-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-3514. --- Fix Version/s: 0.6.0 Resolution: Fixed > Fix Memory leak of RaftServerImpl > - > > Key: HDDS-3514 > URL: https://issues.apache.org/jira/browse/HDDS-3514 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Labels: Triaged, pull-request-available > Fix For: 0.6.0 > > > This depends on [RATIS-845|https://issues.apache.org/jira/browse/RATIS-845], > find the details in RATIS-845. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2701) Avoid read from temporary chunk file in datanode
[ https://issues.apache.org/jira/browse/HDDS-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-2701: -- Target Version/s: 0.7.0 (was: 0.6.0) > Avoid read from temporary chunk file in datanode > > > Key: HDDS-2701 > URL: https://issues.apache.org/jira/browse/HDDS-2701 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Labels: TriagePending > > Currently we try reading chunk data from the temp file if chunk file does not > exist. The fix was added in HDDS-2372 due to race condition between > readStateMachineData and writeStateMachineData in ContainerStateMachine. > After HDDS-2542 is fixed the read from the temp file can be avoided by making > sure that chunk data remains in cache until the chunk file is generated. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
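The approach HDDS-2701 describes — keep chunk data cached from write until the final chunk file exists, so reads never touch the temporary file — can be sketched with a small in-memory model. All names here are hypothetical and the second map merely stands in for on-disk chunk files; this is not the ContainerStateMachine code itself.

```java
// Sketch: the chunk stays in the cache from writeChunk until commitChunk
// "renames" it into place, so readChunk never needs the temporary file.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ChunkCacheSketch {
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
    // Stand-in for committed on-disk chunk files.
    private final Map<String, byte[]> committedFiles = new ConcurrentHashMap<>();

    public void writeChunk(String chunkName, byte[] data) {
        cache.put(chunkName, data); // visible to readers immediately
    }

    public void commitChunk(String chunkName) {
        byte[] data = cache.get(chunkName);
        if (data != null) {
            committedFiles.put(chunkName, data); // "rename" temp -> final file
            cache.remove(chunkName); // drop only after the final file exists
        }
    }

    public byte[] readChunk(String chunkName) {
        byte[] data = cache.get(chunkName); // in-flight chunks served from cache
        return data != null ? data : committedFiles.get(chunkName);
    }
}
```

The ordering in `commitChunk` is the crux: the cache entry is dropped only after the committed copy exists, so there is never a window in which `readChunk` would have to fall back to a temporary file.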