[jira] [Updated] (HDDS-4400) Make raft log directory deletion configurable during pipeline remove

2020-10-28 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-4400:
--
Description: The idea here is to add a config to make raft log directory 
removal configurable during pipeline remove.  (was: The idea here is to add a 
hidden config to make raft log directory removal configurable during pipeline 
remove.)

> Make raft log directory deletion configurable during pipeline remove
> 
>
> Key: HDDS-4400
> URL: https://issues.apache.org/jira/browse/HDDS-4400
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 1.1.0
>
>
> The idea here is to add a config to make raft log directory removal 
> configurable during pipeline remove.
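A minimal sketch of how the datanode might consult such a switch before purging the raft log directory on pipeline remove. The property key and default below are hypothetical; the real key and default are whatever the HDDS-4400 patch defines.

```java
import java.util.Properties;

public class PipelineRemoveHandler {
    // Hypothetical key; the actual config name comes from the HDDS-4400 patch.
    static final String PURGE_LOG_ON_REMOVE_KEY =
        "hdds.datanode.ratis.log.purge.on.pipeline.remove";

    // Delete the raft log directory only when the switch is enabled
    // (assumed default: true, preserving the old behaviour).
    static boolean shouldDeleteRaftLogDir(Properties conf) {
        return Boolean.parseBoolean(
            conf.getProperty(PURGE_LOG_ON_REMOVE_KEY, "true"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(PURGE_LOG_ON_REMOVE_KEY, "false");
        System.out.println(shouldDeleteRaftLogDir(conf)); // prints false
    }
}
```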



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-4400) Make raft log directory deletion configurable during pipeline remove

2020-10-28 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created HDDS-4400:
-

 Summary: Make raft log directory deletion configurable during 
pipeline remove
 Key: HDDS-4400
 URL: https://issues.apache.org/jira/browse/HDDS-4400
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 1.1.0


The idea here is to add a hidden config to make raft log directory removal 
configurable during pipeline remove.






[jira] [Created] (HDDS-4399) Safe mode rule for pipelines should only consider open pipelines

2020-10-28 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created HDDS-4399:
-

 Summary: Safe mode rule for pipelines should only consider open 
pipelines
 Key: HDDS-4399
 URL: https://issues.apache.org/jira/browse/HDDS-4399
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 1.1.0


Currently, for safe mode, we consider all pipelines existing in the DB for the 
safe mode exit criteria. It may happen that SCM has the pipelines created, but 
none of the participant datanodes ever created these pipelines. In such cases, 
SCM fails to come out of safe mode, as these pipelines are never reported back to SCM.

 

The idea here is to consider only pipelines which are marked open during SCM startup.
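A minimal sketch of the proposed rule, using illustrative types rather than the actual SCM classes: only pipelines already in the OPEN state count toward the safe-mode exit threshold, so never-created pipelines cannot block safe-mode exit.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SafeModePipelineRule {
    enum PipelineState { ALLOCATED, OPEN, CLOSED }

    static class Pipeline {
        final String id;
        final PipelineState state;
        Pipeline(String id, PipelineState state) { this.id = id; this.state = state; }
    }

    // Consider only OPEN pipelines for the safe-mode exit criteria; ALLOCATED
    // pipelines may never have been created on the datanodes and would
    // otherwise keep SCM in safe mode forever.
    static List<Pipeline> pipelinesForSafeMode(List<Pipeline> fromDb) {
        return fromDb.stream()
                .filter(p -> p.state == PipelineState.OPEN)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Pipeline> db = Arrays.asList(
                new Pipeline("p1", PipelineState.OPEN),
                new Pipeline("p2", PipelineState.ALLOCATED), // never reported back
                new Pipeline("p3", PipelineState.OPEN));
        System.out.println(pipelinesForSafeMode(db).size()); // prints 2
    }
}
```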






[jira] [Resolved] (HDDS-3700) Number of open containers per pipeline should be tuned as per the number of disks on datanode

2020-10-28 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-3700.
---
Fix Version/s: 1.1.0
   Resolution: Fixed

> Number of open containers per pipeline should be tuned as per the number of 
> disks on datanode
> -
>
> Key: HDDS-3700
> URL: https://issues.apache.org/jira/browse/HDDS-3700
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 1.1.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: Performance
> Fix For: 1.1.0
>
> Attachments: Load Distribution Across disks in Ozone.pdf, Screenshot 
> 2020-06-02 at 12.44.14 PM.png
>
>
> Currently, "ozone.scm.pipeline.owner.container.count" is configured by 
> default to 3. The default should ideally be a function of the number of disks 
> on a datanode. A static value may lead to uneven utilisation during active IO.
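One possible heuristic, purely illustrative and not the formula any patch ships: size the per-pipeline open-container count so every data disk can receive writes, never dropping below the old default of 3.

```java
public class OwnerContainerCount {
    // Hypothetical heuristic: enough open containers per pipeline that each
    // data disk on the datanode can be written to, floored at the old
    // static default of 3.
    static int ownerContainerCount(int dataDisks, int pipelinesOnDatanode) {
        int perPipeline = (int) Math.ceil(
            (double) dataDisks / Math.max(1, pipelinesOnDatanode));
        return Math.max(3, perPipeline);
    }

    public static void main(String[] args) {
        // 12 data disks shared by 2 pipelines -> 6 open containers per pipeline
        System.out.println(ownerContainerCount(12, 2)); // prints 6
    }
}
```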






[jira] [Updated] (HDDS-4388) Make writeStateMachineTimeout retry count proportional to node failure timeout

2020-10-23 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-4388:
--
Description: Currently, in Ratis, the writeStateMachine call gets retried 
indefinitely in the event of a timeout. In cases where disks are slow/overloaded, 
or chunk writer threads are unavailable for a period of 10s, the 
writeStateMachine call times out in 10s. In such cases, the same write chunk 
keeps getting retried, causing the same chunk of data to be overwritten. The 
idea here is to abort the request once the node failure timeout is 
reached.  (was: Currently, in ratis "writeStateMachinecall" gets 
retried indefinitely in event of a timeout. In case, where disks are 
slow/overloaded or number of chunk writer threads are not available for a 
period of 10s, writeStateMachine call times out in 10s. In cases like these, 
the same write chunk keeps on getting retried causing the same chink of data to 
be overwritten. The idea here is to abort the request once the node failure 
timeout reaches.)

> Make writeStateMachineTimeout retry count proportional to node failure timeout
> --
>
> Key: HDDS-4388
> URL: https://issues.apache.org/jira/browse/HDDS-4388
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.1.0
>
>
> Currently, in Ratis, the writeStateMachine call gets retried indefinitely in 
> the event of a timeout. In cases where disks are slow/overloaded, or chunk 
> writer threads are unavailable for a period of 10s, the writeStateMachine 
> call times out in 10s. In such cases, the same write chunk keeps getting 
> retried, causing the same chunk of data to be overwritten. The idea here is 
> to abort the request once the node failure timeout is reached.






[jira] [Created] (HDDS-4388) Make writeStateMachineTimeout retry count proportional to node failure timeout

2020-10-23 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created HDDS-4388:
-

 Summary: Make writeStateMachineTimeout retry count proportional to 
node failure timeout
 Key: HDDS-4388
 URL: https://issues.apache.org/jira/browse/HDDS-4388
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 1.1.0


Currently, in Ratis, the writeStateMachine call gets retried indefinitely in the 
event of a timeout. In cases where disks are slow/overloaded, or chunk writer 
threads are unavailable for a period of 10s, the writeStateMachine call times 
out in 10s. In such cases, the same write chunk keeps getting retried, causing 
the same chunk of data to be overwritten. The idea here is to abort the request 
once the node failure timeout is reached.
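The proposed cap can be sketched as simple arithmetic (names are illustrative; the actual Ratis configuration keys differ): bound the retry count by the node failure timeout divided by the per-call writeStateMachine timeout, so retries stop roughly when the node would be declared failed anyway.

```java
public class WriteStateMachineRetryBudget {
    // Illustrative only: derive a finite retry cap from the two timeouts
    // instead of retrying the writeStateMachine call indefinitely.
    static int maxRetries(long nodeFailureTimeoutMs, long writeStateMachineTimeoutMs) {
        if (writeStateMachineTimeoutMs <= 0) {
            throw new IllegalArgumentException("timeout must be positive");
        }
        return (int) (nodeFailureTimeoutMs / writeStateMachineTimeoutMs);
    }

    public static void main(String[] args) {
        // e.g. a 300s node failure timeout with the 10s per-call timeout from
        // the description allows at most 30 attempts before aborting.
        System.out.println(maxRetries(300_000L, 10_000L)); // prints 30
    }
}
```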






[jira] [Commented] (HDDS-3103) Have multi-raft pipeline calculator to recommend best pipeline number per datanode

2020-10-13 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213047#comment-17213047
 ] 

Shashikant Banerjee commented on HDDS-3103:
---

This should have been addressed by HDDS-3700.

> Have multi-raft pipeline calculator to recommend best pipeline number per 
> datanode
> --
>
> Key: HDDS-3103
> URL: https://issues.apache.org/jira/browse/HDDS-3103
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Li Cheng
>Priority: Critical
>
> PipelinePlacementPolicy should have a calculator method to recommend a better 
> pipeline count per node. The number used comes from 
> ozone.datanode.pipeline.limit in the config. SCM should be able to consider 
> the number of Ratis directories and the Ratis retry timeout to recommend the 
> best pipeline count for every node.






[jira] [Created] (HDDS-4335) No user access checks in Ozone FS

2020-10-12 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created HDDS-4335:
-

 Summary: No user access checks in Ozone FS
 Key: HDDS-4335
 URL: https://issues.apache.org/jira/browse/HDDS-4335
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Shashikant Banerjee


Currently, a dir/file created by the hdfs user can be deleted by any user.
{code:java}
[sbanerjee@vd1308 MapReduce-Performance_Testing-master]$ sudo -u hdfs ozone fs 
-mkdir o3fs://bucket1.vol1.ozone1/data/sandbox/poc/teragen
[sbanerjee@vd1308 MapReduce-Performance_Testing-master]$ sudo -u hdfs ozone fs 
-ls o3fs://bucket1.vol1.ozone1/data/sandbox/poc/teragen
[sbanerjee@vd1308 MapReduce-Performance_Testing-master]$ sudo -u hdfs ozone fs 
-ls o3fs://bucket1.vol1.ozone1/data/sandbox/poc/
Found 1 items
drwxrwxrwx   - hdfs hdfs          0 2020-10-12 02:51 
o3fs://bucket1.vol1.ozone1/data/sandbox/poc/teragen
[sbanerjee@vd1308 MapReduce-Performance_Testing-master]$ 
[sbanerjee@vd1308 MapReduce-Performance_Testing-master]$ 
[sbanerjee@vd1308 MapReduce-Performance_Testing-master]$ 
[sbanerjee@vd1308 MapReduce-Performance_Testing-master]$ ozone fs -rm -r 
o3fs://bucket1.vol1.ozone1/data/sandbox/poc/
20/10/12 02:52:16 INFO Configuration.deprecation: io.bytes.per.checksum is 
deprecated. Instead, use dfs.bytes-per-checksum
20/10/12 02:52:16 INFO ozone.BasicOzoneFileSystem: Move to trash is disabled 
for o3fs, deleting instead: o3fs://bucket1.vol1.ozone1/data/sandbox/poc. Files 
or directories will NOT be retained in trash. Ignore the following 
TrashPolicyDefault message, if any.
20/10/12 02:52:16 INFO fs.TrashPolicyDefault: Moved: 
'o3fs://bucket1.vol1.ozone1/data/sandbox/poc' to trash at: 
/.Trash/sbanerjee/Current/data/sandbox/poc1602496336480
[sbanerjee@vd1308 MapReduce-Performance_Testing-master]$ sudo -u hdfs ozone fs 
-ls o3fs://bucket1.vol1.ozone1/data/sandbox/poc/
ls: `o3fs://bucket1.vol1.ozone1/data/sandbox/poc/': No such file or directory
{code}
The same sequence, however, fails with a permission denied error in HDFS.






[jira] [Assigned] (HDDS-4318) Disable single node pipeline creation by default in Ozone

2020-10-07 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-4318:
-

Assignee: Aryan Gupta

> Disable single node pipeline creation by default in Ozone
> -
>
> Key: HDDS-4318
> URL: https://issues.apache.org/jira/browse/HDDS-4318
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Shashikant Banerjee
>Assignee: Aryan Gupta
>Priority: Major
>
> Currently, single node pipeline creation is ON by default in Ozone, though 
> it is not used by default in the Ozone write path. It would be good to disable 
> this by turning off the config "ozone.scm.pipeline.creation.auto.factor.one" 
> by default. This may lead to some unit test failures, and for those tests the 
> config needs to be explicitly set to true.






[jira] [Created] (HDDS-4318) Disable single node pipeline creation by default in Ozone

2020-10-07 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created HDDS-4318:
-

 Summary: Disable single node pipeline creation by default in Ozone
 Key: HDDS-4318
 URL: https://issues.apache.org/jira/browse/HDDS-4318
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Shashikant Banerjee


Currently, single node pipeline creation is ON by default in Ozone, though it 
is not used by default in the Ozone write path. It would be good to disable this 
by turning off the config "ozone.scm.pipeline.creation.auto.factor.one" by 
default. This may lead to some unit test failures, and for those tests the 
config needs to be explicitly set to true.






[jira] [Assigned] (HDDS-3811) Add tests to verify all the disks of a datanode are utilized for write

2020-10-07 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-3811:
-

Assignee: (was: Shashikant Banerjee)

> Add tests to verify all the disks of a datanode are utilized for write 
> ---
>
> Key: HDDS-3811
> URL: https://issues.apache.org/jira/browse/HDDS-3811
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Shashikant Banerjee
>Priority: Major
> Fix For: 1.1.0
>
>







[jira] [Resolved] (HDDS-3811) Add tests to verify all the disks of a datanode are utilized for write

2020-10-07 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-3811.
---
Fix Version/s: 1.1.0
   Resolution: Fixed

> Add tests to verify all the disks of a datanode are utilized for write 
> ---
>
> Key: HDDS-3811
> URL: https://issues.apache.org/jira/browse/HDDS-3811
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 1.1.0
>
>







[jira] [Resolved] (HDDS-4072) Pipeline creation on a datanodes should account for raft log disks reported

2020-10-07 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-4072.
---
Fix Version/s: 1.1.0
   Resolution: Fixed

> Pipeline creation on a datanodes should account for raft log disks reported
> ---
>
> Key: HDDS-4072
> URL: https://issues.apache.org/jira/browse/HDDS-4072
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Datanode, SCM
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 1.1.0
>
>
> Currently, how many pipelines a datanode will be a part of is controlled by 
> the config ozone.datanode.pipeline.limit. Now, with the number of raft log 
> disks reported by the datanode to SCM, we can potentially set the limit on 
> pipeline capacity based on the raft log disks reported instead.
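A sketch of the idea under assumed semantics (not the committed code): derive the per-datanode pipeline limit from the reported raft log disks instead of a static config, here simply as a fixed number of pipelines per raft log disk.

```java
public class PipelineLimitFromDisks {
    // Illustrative replacement for a static ozone.datanode.pipeline.limit:
    // cap the pipelines a datanode joins by the raft log disks it reports.
    static int pipelineLimit(int reportedRaftLogDisks, int pipelinesPerRaftDisk) {
        // Treat a datanode that reports no raft log disks as having one.
        return Math.max(1, reportedRaftLogDisks) * pipelinesPerRaftDisk;
    }

    public static void main(String[] args) {
        // a datanode reporting 2 raft log disks, 2 pipelines per disk -> 4
        System.out.println(pipelineLimit(2, 2)); // prints 4
    }
}
```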






[jira] [Resolved] (HDDS-4298) Use an interface in Ozone client instead of XceiverClientManager

2020-10-05 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-4298.
---
Fix Version/s: 1.1.0
   Resolution: Fixed

> Use an interface in Ozone client instead of XceiverClientManager
> 
>
> Key: HDDS-4298
> URL: https://issues.apache.org/jira/browse/HDDS-4298
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Marton Elek
>Assignee: Marton Elek
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.1.0
>
>
> XceiverClientManager is used everywhere in the ozone client (Key/Block 
> Input/OutputStream) to get a client when required.
> To make it easier to create genesis/real unit tests, it would be better to 
> use a generic interface instead of XceiverClientManager which can make it 
> easy to replace the manager with a mock implementation.






[jira] [Resolved] (HDDS-3297) TestOzoneClientKeyGenerator is flaky

2020-09-24 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-3297.
---
Fix Version/s: 1.1.0
 Assignee: Aryan Gupta
   Resolution: Fixed

> TestOzoneClientKeyGenerator is flaky
> 
>
> Key: HDDS-3297
> URL: https://issues.apache.org/jira/browse/HDDS-3297
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: test
>Reporter: Marton Elek
>Assignee: Aryan Gupta
>Priority: Critical
>  Labels: TriagePending, flaky-test, ozone-flaky-test, 
> pull-request-available
> Fix For: 1.1.0
>
> Attachments: 
> org.apache.hadoop.ozone.freon.TestOzoneClientKeyGenerator-output.txt
>
>
> Sometimes it hangs and is stopped after a timeout.






[jira] [Resolved] (HDDS-4218) Remove test TestRatisManager

2020-09-11 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-4218.
---
Fix Version/s: 1.1.0
   Resolution: Fixed

> Remove test TestRatisManager
> 
>
> Key: HDDS-4218
> URL: https://issues.apache.org/jira/browse/HDDS-4218
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Sadanand Shenoy
>Assignee: Sadanand Shenoy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.1.0
>
>
> Delete this test, as RatisManager is no longer present and this test has been 
> disabled for a long time.






[jira] [Resolved] (HDDS-4217) Remove test TestOzoneContainerRatis

2020-09-11 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-4217.
---
Fix Version/s: 1.1.0
   Resolution: Fixed

> Remove test TestOzoneContainerRatis
> ---
>
> Key: HDDS-4217
> URL: https://issues.apache.org/jira/browse/HDDS-4217
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Sadanand Shenoy
>Assignee: Sadanand Shenoy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.1.0
>
>
> Delete TestOzoneContainerRatis as it has been disabled for a long time and is 
> no longer relevant.






[jira] [Resolved] (HDDS-3151) Intermittent timeout in TestCloseContainerHandlingByClient#testMultiBlockWrites3#testDiscardPreallocatedBlocks

2020-09-07 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-3151.
---
Fix Version/s: 1.1.0
   Resolution: Fixed

> Intermittent timeout in 
> TestCloseContainerHandlingByClient#testMultiBlockWrites3#testDiscardPreallocatedBlocks
> --
>
> Key: HDDS-3151
> URL: https://issues.apache.org/jira/browse/HDDS-3151
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Aryan Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.1.0
>
> Attachments: 
> org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient-output.txt,
>  org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient.txt
>
>
> {code:title=https://github.com/apache/hadoop-ozone/runs/495906854}
> Tests run: 8, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 210.963 s <<< 
> FAILURE! - in 
> org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient
> testMultiBlockWrites3(org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient)
>   Time elapsed: 108.777 s  <<< ERROR!
> java.util.concurrent.TimeoutException:
> ...
>   at 
> org.apache.hadoop.ozone.container.TestHelper.waitForContainerClose(TestHelper.java:251)
>   at 
> org.apache.hadoop.ozone.container.TestHelper.waitForContainerClose(TestHelper.java:151)
>   at 
> org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient.waitForContainerClose(TestCloseContainerHandlingByClient.java:342)
>   at 
> org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient.testMultiBlockWrites3(TestCloseContainerHandlingByClient.java:310)
> {code}






[jira] [Resolved] (HDDS-3762) Intermittent failure in TestDeleteWithSlowFollower

2020-09-02 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-3762.
---
Fix Version/s: 1.1.0
   Resolution: Fixed

> Intermittent failure in TestDeleteWithSlowFollower
> --
>
> Key: HDDS-3762
> URL: https://issues.apache.org/jira/browse/HDDS-3762
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 1.0.0
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.1.0
>
>
> TestDeleteWithSlowFollower failed soon after it was re-enabled in HDDS-3330.
> {code:title=https://github.com/apache/hadoop-ozone/runs/753363338}
> [INFO] Running org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower
> [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 28.647 s <<< FAILURE! - in 
> org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower
> [ERROR] 
> testDeleteKeyWithSlowFollower(org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower)
>   Time elapsed: 0.163 s  <<< FAILURE!
> java.lang.AssertionError
>   ...
>   at org.junit.Assert.assertNotNull(Assert.java:631)
>   at 
> org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower.testDeleteKeyWithSlowFollower(TestDeleteWithSlowFollower.java:225)
> {code}
> CC [~shashikant] [~elek]






[jira] [Updated] (HDDS-4077) Incomplete OzoneFileSystem statistics

2020-08-30 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-4077:
--
Fix Version/s: 1.1
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Incomplete OzoneFileSystem statistics
> -
>
> Key: HDDS-4077
> URL: https://issues.apache.org/jira/browse/HDDS-4077
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.1
>
>
> OzoneFileSystem does not record some of the operations that are defined in 
> [Statistic|https://github.com/apache/hadoop-ozone/blob/d7ea4966656cfdb0b53a368eac52d71adb717104/hadoop-ozone/ozonefs-common/src/main/java/org/apache/hadoop/fs/ozone/Statistic.java#L44-L75].






[jira] [Updated] (HDDS-4149) Implement OzoneFileStatus#toString

2020-08-27 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-4149:
--
Fix Version/s: 0.6.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Implement OzoneFileStatus#toString
> --
>
> Key: HDDS-4149
> URL: https://issues.apache.org/jira/browse/HDDS-4149
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Filesystem
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
> {{OzoneFileStatus}} should implement {{toString}} for debug purposes.






[jira] [Resolved] (HDDS-4048) Show more information while SCM version info mismatch

2020-08-11 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-4048.
---
Fix Version/s: 0.7.0
   Resolution: Fixed

> Show more information while SCM version info mismatch
> -
>
> Key: HDDS-4048
> URL: https://issues.apache.org/jira/browse/HDDS-4048
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager
>Affects Versions: 0.6.0
>Reporter: maobaolong
>Assignee: maobaolong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.7.0
>
>







[jira] [Updated] (HDDS-4078) Use HDDS InterfaceAudience/Stability annotations

2020-08-11 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-4078:
--
Fix Version/s: 0.7.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Use HDDS InterfaceAudience/Stability annotations
> 
>
> Key: HDDS-4078
> URL: https://issues.apache.org/jira/browse/HDDS-4078
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.7.0
>
>
> HDDS-3028 added Ozone-private versions of {{InterfaceAudience}} and 
> {{InterfaceStability}} annotations.  Some recent changes re-introduced usage 
> of their Hadoop Common versions.
> {code}
> hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/response/CleanupTableInfo.java
> 19:import org.apache.hadoop.classification.InterfaceStability;
> hadoop-ozone/ozonefs-common/src/main/java/org/apache/hadoop/fs/ozone/BasicOzoneClientAdapterImpl.java
> 28:import org.apache.hadoop.classification.InterfaceAudience;
> hadoop-ozone/ozonefs-common/src/main/java/org/apache/hadoop/fs/ozone/BasicRootedOzoneFileSystem.java
> 21:import org.apache.hadoop.classification.InterfaceAudience;
> 22:import org.apache.hadoop.classification.InterfaceStability;
> hadoop-ozone/ozonefs-common/src/main/java/org/apache/hadoop/fs/ozone/BasicRootedOzoneClientAdapterImpl.java
> 33:import org.apache.hadoop.classification.InterfaceAudience;
> {code}






[jira] [Resolved] (HDDS-4034) Add Unit Test for HadoopNestedDirGenerator

2020-08-11 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-4034.
---
Fix Version/s: 0.7.0
   Resolution: Fixed

> Add Unit Test for HadoopNestedDirGenerator
> --
>
> Key: HDDS-4034
> URL: https://issues.apache.org/jira/browse/HDDS-4034
> Project: Hadoop Distributed Data Store
>  Issue Type: Test
>Reporter: Aryan Gupta
>Assignee: Aryan Gupta
>Priority: Major
>  Labels: https://github.com/apache/hadoop-ozone/pull/1266, 
> pull-request-available
> Fix For: 0.7.0
>
>
> Unit test - It checks the span and depth of nested directories created by the 
> HadoopNestedDirGenerator Tool.






[jira] [Created] (HDDS-4072) Pipeline creation on a datanodes should account for raft log disks reported

2020-08-05 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created HDDS-4072:
-

 Summary: Pipeline creation on a datanodes should account for raft 
log disks reported
 Key: HDDS-4072
 URL: https://issues.apache.org/jira/browse/HDDS-4072
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
  Components: Ozone Datanode, SCM
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee


Currently, how many pipelines a datanode will be a part of is controlled by the 
config ozone.datanode.pipeline.limit. Now, with the number of raft log disks 
reported by the datanode to SCM, we can potentially set the limit on pipeline 
capacity based on the raft log disks reported instead.






[jira] [Updated] (HDDS-3810) Add the logic to distribute open containers among the pipelines of a datanode

2020-07-30 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-3810:
--
Summary: Add the logic to distribute open containers among the pipelines of 
a datanode  (was: Add the logic to distribute open containers among the 
piplelines of a datanode)

> Add the logic to distribute open containers among the pipelines of a datanode
> -
>
> Key: HDDS-3810
> URL: https://issues.apache.org/jira/browse/HDDS-3810
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>
> A datanode can participate in multiple pipelines based on the number of raft log 
> disks as well as the disk type. SCM should distribute open 
> containers evenly among this set of pipelines.






[jira] [Created] (HDDS-3963) Remove XceiverServerSpi interface

2020-07-15 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created HDDS-3963:
-

 Summary: Remove XceiverServerSpi interface
 Key: HDDS-3963
 URL: https://issues.apache.org/jira/browse/HDDS-3963
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Reporter: Shashikant Banerjee
 Fix For: 0.7.0


As per [~elek]  here 
[https://github.com/apache/hadoop-ozone/pull/1107#discussion_r447545366:]
{code:java}
It seems to be a good time the remove XceiverServerSpi interface. Originally we 
had two separated implementation to connect to the datanode. Today we have only 
one. One interface is used between the client and the datanode, and the other 
one between datanode and ratis (datanode). As this example shows, the two 
interface shouldn't be the same.
{code}






[jira] [Resolved] (HDDS-3861) Fix handlePipelineFailure throw exception if role is follower

2020-07-13 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-3861.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> Fix handlePipelineFailure throw exception if role is follower
> -
>
> Key: HDDS-3861
> URL: https://issues.apache.org/jira/browse/HDDS-3861
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
> Attachments: screenshot-1.png
>
>
>  !screenshot-1.png! 






[jira] [Resolved] (HDDS-3808) Ensure volume info on a datanode is propagated to SCM

2020-06-25 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-3808.
---
Fix Version/s: 0.6.0
   Resolution: Implemented

> Ensure volume info on a datanode is propagated to SCM
> 
>
> Key: HDDS-3808
> URL: https://issues.apache.org/jira/browse/HDDS-3808
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.6.0
>
>
> The aim here is to verify whether the volume-level info of a datanode is 
> propagated to SCM and, if not, add the support here.






[jira] [Assigned] (HDDS-3864) Add a tool to display containers on a datanode

2020-06-24 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-3864:
-

Assignee: Sadanand Shenoy  (was: Shashikant Banerjee)

> Add a tool to display containers on a datanode
> --
>
> Key: HDDS-3864
> URL: https://issues.apache.org/jira/browse/HDDS-3864
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Sadanand Shenoy
>Priority: Major
>
> The idea here is to add a utility to dump containers displaying containerIDs 
> and BCSID on a datanode. 






[jira] [Updated] (HDDS-3864) Add a tool to display containers on a datanode

2020-06-24 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-3864:
--
Description: The idea here is to add a utility to dump containers 
displaying containerIDs and BCSID on a datanode.   (was: The idea here is to 
add a utility to dump containers displying containerIDs and BCSID on a 
datanode. )

> Add a tool to display containers on a datanode
> --
>
> Key: HDDS-3864
> URL: https://issues.apache.org/jira/browse/HDDS-3864
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>
> The idea here is to add a utility to dump containers displaying containerIDs 
> and BCSID on a datanode. 






[jira] [Created] (HDDS-3864) Add a tool to display containers on a datanode

2020-06-24 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created HDDS-3864:
-

 Summary: Add a tool to display containers on a datanode
 Key: HDDS-3864
 URL: https://issues.apache.org/jira/browse/HDDS-3864
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee


The idea here is to add a utility to dump containers displying containerIDs and 
BCSID on a datanode. 






[jira] [Assigned] (HDDS-3263) Fix TestCloseContainerByPipeline.java

2020-06-24 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-3263:
-

Assignee: Shashikant Banerjee

> Fix TestCloseContainerByPipeline.java
> -
>
> Key: HDDS-3263
> URL: https://issues.apache.org/jira/browse/HDDS-3263
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>







[jira] [Assigned] (HDDS-3422) Enable TestCloseContainerHandlingByClient test cases

2020-06-24 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-3422:
-

Assignee: Shashikant Banerjee  (was: Prashant Pogde)

> Enable TestCloseContainerHandlingByClient test cases
> 
>
> Key: HDDS-3422
> URL: https://issues.apache.org/jira/browse/HDDS-3422
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 0.5.0
>Reporter: Nanda kumar
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.6.0
>
>
> Fix and enable TestCloseContainerHandlingByClient test cases






[jira] [Assigned] (HDDS-3424) Enable TestContainerStateMachineFailures test cases

2020-06-24 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-3424:
-

Assignee: Shashikant Banerjee  (was: Prashant Pogde)

> Enable TestContainerStateMachineFailures test cases
> ---
>
> Key: HDDS-3424
> URL: https://issues.apache.org/jira/browse/HDDS-3424
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 0.5.0
>Reporter: Nanda kumar
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.6.0
>
>
> Fix and enable TestContainerStateMachineFailures test cases






[jira] [Assigned] (HDDS-3853) Container marked as missing on datanode while container directory does exist

2020-06-24 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-3853:
-

Assignee: Shashikant Banerjee

> Container marked as missing on datanode while container directory does exist
> --
>
> Key: HDDS-3853
> URL: https://issues.apache.org/jira/browse/HDDS-3853
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Sammi Chen
>Assignee: Shashikant Banerjee
>Priority: Major
>
> INFO org.apache.hadoop.ozone.container.common.impl.HddsDispatcher: Operation: 
> PutBlock , Trace ID: 487c959563e884b9:509a3386ba37abc6:487c959563e884b9:0 , 
> Message: ContainerID 1744 has been lost and and cannot be recreated on this 
> DataNode , Result: CONTAINER_MISSING , StorageContainerException Occurred.
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  ContainerID 1744 has been lost and and cannot be recreated on this DataNode
> at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:238)
> at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:166)
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:395)
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:405)
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$applyTransaction$6(ContainerStateMachine.java:749)
> at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
>  ERROR 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine:
>  gid group-1376E41FD581 : ApplyTransaction failed. cmd PutBlock logIndex 
> 40079 msg : ContainerID 1744 has been lost and and cannot be recreated on 
> this DataNode Container Result: CONTAINER_MISSING
>  ERROR 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
>  pipeline Action CLOSE on pipeline 
> PipelineID=de21dfcf-415c-4901-84ca-1376e41fd581.Reason : Ratis Transaction 
> failure in datanode 33b49c34-caa2-4b4f-894e-dce7db4f97b9 with role FOLLOWER 
> .Triggering pipeline close action
>  






[jira] [Assigned] (HDDS-3853) Container marked as missing on datanode while container directory does exist

2020-06-23 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-3853:
-

Assignee: (was: Shashikant Banerjee)

> Container marked as missing on datanode while container directory does exist
> --
>
> Key: HDDS-3853
> URL: https://issues.apache.org/jira/browse/HDDS-3853
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Sammi Chen
>Priority: Major
>
> INFO org.apache.hadoop.ozone.container.common.impl.HddsDispatcher: Operation: 
> PutBlock , Trace ID: 487c959563e884b9:509a3386ba37abc6:487c959563e884b9:0 , 
> Message: ContainerID 1744 has been lost and and cannot be recreated on this 
> DataNode , Result: CONTAINER_MISSING , StorageContainerException Occurred.
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  ContainerID 1744 has been lost and and cannot be recreated on this DataNode
> at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:238)
> at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:166)
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:395)
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:405)
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$applyTransaction$6(ContainerStateMachine.java:749)
> at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
>  ERROR 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine:
>  gid group-1376E41FD581 : ApplyTransaction failed. cmd PutBlock logIndex 
> 40079 msg : ContainerID 1744 has been lost and and cannot be recreated on 
> this DataNode Container Result: CONTAINER_MISSING
>  ERROR 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
>  pipeline Action CLOSE on pipeline 
> PipelineID=de21dfcf-415c-4901-84ca-1376e41fd581.Reason : Ratis Transaction 
> failure in datanode 33b49c34-caa2-4b4f-894e-dce7db4f97b9 with role FOLLOWER 
> .Triggering pipeline close action
>  






[jira] [Assigned] (HDDS-3853) Container marked as missing on datanode while container directory does exist

2020-06-23 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-3853:
-

Assignee: Shashikant Banerjee

> Container marked as missing on datanode while container directory does exist
> --
>
> Key: HDDS-3853
> URL: https://issues.apache.org/jira/browse/HDDS-3853
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Sammi Chen
>Assignee: Shashikant Banerjee
>Priority: Major
>
> INFO org.apache.hadoop.ozone.container.common.impl.HddsDispatcher: Operation: 
> PutBlock , Trace ID: 487c959563e884b9:509a3386ba37abc6:487c959563e884b9:0 , 
> Message: ContainerID 1744 has been lost and and cannot be recreated on this 
> DataNode , Result: CONTAINER_MISSING , StorageContainerException Occurred.
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  ContainerID 1744 has been lost and and cannot be recreated on this DataNode
> at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:238)
> at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:166)
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:395)
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:405)
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$applyTransaction$6(ContainerStateMachine.java:749)
> at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
>  ERROR 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine:
>  gid group-1376E41FD581 : ApplyTransaction failed. cmd PutBlock logIndex 
> 40079 msg : ContainerID 1744 has been lost and and cannot be recreated on 
> this DataNode Container Result: CONTAINER_MISSING
>  ERROR 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
>  pipeline Action CLOSE on pipeline 
> PipelineID=de21dfcf-415c-4901-84ca-1376e41fd581.Reason : Ratis Transaction 
> failure in datanode 33b49c34-caa2-4b4f-894e-dce7db4f97b9 with role FOLLOWER 
> .Triggering pipeline close action
>  






[jira] [Assigned] (HDDS-3430) Enable TestWatchForCommit test cases

2020-06-23 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-3430:
-

Assignee: Shashikant Banerjee

> Enable TestWatchForCommit test cases
> 
>
> Key: HDDS-3430
> URL: https://issues.apache.org/jira/browse/HDDS-3430
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 0.5.0
>Reporter: Nanda kumar
>Assignee: Shashikant Banerjee
>Priority: Major
>
> Fix and enable TestWatchForCommit test cases






[jira] [Resolved] (HDDS-3424) Enable TestContainerStateMachineFailures test cases

2020-06-23 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-3424.
---
Fix Version/s: 0.6.0
   Resolution: Duplicate

> Enable TestContainerStateMachineFailures test cases
> ---
>
> Key: HDDS-3424
> URL: https://issues.apache.org/jira/browse/HDDS-3424
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 0.5.0
>Reporter: Nanda kumar
>Assignee: Prashant Pogde
>Priority: Major
> Fix For: 0.6.0
>
>
> Fix and enable TestContainerStateMachineFailures test cases






[jira] [Resolved] (HDDS-3422) Enable TestCloseContainerHandlingByClient test cases

2020-06-23 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-3422.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

Tests have been enabled with https://issues.apache.org/jira/browse/HDDS-2936.

> Enable TestCloseContainerHandlingByClient test cases
> 
>
> Key: HDDS-3422
> URL: https://issues.apache.org/jira/browse/HDDS-3422
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 0.5.0
>Reporter: Nanda kumar
>Assignee: Prashant Pogde
>Priority: Major
> Fix For: 0.6.0
>
>
> Fix and enable TestCloseContainerHandlingByClient test cases






[jira] [Resolved] (HDDS-3802) Incorrect data returned by reading a FILE_PER_CHUNK block

2020-06-18 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-3802.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> Incorrect data returned by reading a FILE_PER_CHUNK block
> -
>
> Key: HDDS-3802
> URL: https://issues.apache.org/jira/browse/HDDS-3802
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
> A summary of the S3 big-file download results with the April 22nd master branch code:
> 1. download with the AWS S3 SDK: the md5sum is different
> 2. download with "ozone fs -get o3fs://": the md5sum is different
> 3. download with "ozone sh key get": the md5sum is the same as the local file
> So the issue appears to be in the read path. The md5sum results of steps 
> 1 and 2 also differ from each other.
> The differing behaviors are caused by the different read buffer sizes of the 
> different interfaces. If the read buffer size equals the chunk size, reads are fine. 
> If the read buffer size is smaller than the chunk size, the content returned is 
> incorrect, because the datanode-side read ignores the offset in the request and 
> uses 0 as the offset to read the data.
> FilePerChunkStrategy#readChunk 
> {code:java}
> // use offset only if file written by old datanode
> long offset;
> if (file.exists() && file.length() == info.getOffset() + len) {
>   offset = info.getOffset();
> } else {
>   offset = 0;   ---> this line causes the trouble. 
> }
> {code}
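The effect of that forced-zero offset can be shown with a minimal standalone sketch (hypothetical names, not the actual FilePerChunkStrategy code): a mid-chunk read whose length check fails silently returns the chunk's first bytes instead of the requested range.

```java
import java.nio.charset.StandardCharsets;

// Minimal illustration of the bug quoted above (hypothetical names): the
// requested offset is honored only when fileLength == requestedOffset + len
// (a file written by an old datanode); otherwise it is forced to 0, so a
// partial read with a buffer smaller than the chunk returns data from the
// start of the chunk file instead of the requested range.
public class Main {

    static String buggyRead(byte[] chunkFile, int requestedOffset, int len) {
        int offset = (chunkFile.length == requestedOffset + len)
            ? requestedOffset
            : 0; // the forced-zero fallback that corrupts partial reads
        return new String(chunkFile, offset, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] chunk = "abcdefgh".getBytes(StandardCharsets.UTF_8);
        // Reading 4 bytes at offset 2 should return "cdef", but since
        // 8 != 2 + 4 the offset is reset to 0 and "abcd" comes back.
        System.out.println(buggyRead(chunk, 2, 4));
        // When offset + len happens to equal the file length, the check
        // passes and the correct range "efgh" is returned.
        System.out.println(buggyRead(chunk, 4, 4));
    }
}
```

This is why only reads whose buffer size lines up with the chunk boundaries (step 3 above) produced the correct md5sum.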






[jira] [Resolved] (HDDS-3262) Fix TestOzoneRpcClientWithRatis.java

2020-06-16 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-3262.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> Fix TestOzoneRpcClientWithRatis.java
> 
>
> Key: HDDS-3262
> URL: https://issues.apache.org/jira/browse/HDDS-3262
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>







[jira] [Assigned] (HDDS-3811) Add tests to verify all the disks of a datanode are utilized for write

2020-06-15 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-3811:
-

Assignee: Shashikant Banerjee

> Add tests to verify all the disks of a datanode are utilized for write 
> ---
>
> Key: HDDS-3811
> URL: https://issues.apache.org/jira/browse/HDDS-3811
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>







[jira] [Assigned] (HDDS-3808) Ensure volume info on a datanode is propagated to SCM

2020-06-15 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-3808:
-

Assignee: Shashikant Banerjee

> Ensure volume info on a datanode is propagated to SCM
> 
>
> Key: HDDS-3808
> URL: https://issues.apache.org/jira/browse/HDDS-3808
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.6.0
>
>
> The aim here is to verify whether the volume-level info of a datanode is 
> propagated to SCM and, if not, add the support here.






[jira] [Updated] (HDDS-3809) Make number of open containers on a datanode a function of no of volumes reported by it

2020-06-15 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-3809:
--
Fix Version/s: 0.6.0

> Make number of open containers on a datanode a function of no of volumes 
> reported by it
> ---
>
> Key: HDDS-3809
> URL: https://issues.apache.org/jira/browse/HDDS-3809
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.6.0
>
>
> The number of open containers on a datanode is to be driven by the number of 
> data disks available multiplied by the number of open containers per disk. The aim 
> here is to add this logic and verify it.
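As a small hedged sketch of that arithmetic (illustrative names, not the actual SCM code):

```java
// Illustrative sketch: a datanode's open-container budget as a function of
// its reported data disks. Hypothetical names, not the real SCM logic.
public class Main {

    static int openContainerLimit(int reportedDataDisks, int openContainersPerDisk) {
        return reportedDataDisks * openContainersPerDisk;
    }

    public static void main(String[] args) {
        // A datanode reporting 12 data disks, with 3 open containers allowed
        // per disk, gets a budget of 36 open containers.
        System.out.println(openContainerLimit(12, 3));
    }
}
```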






[jira] [Assigned] (HDDS-3810) Add the logic to distribute open containers among the piplelines of a datanode

2020-06-15 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-3810:
-

Assignee: Shashikant Banerjee

> Add the logic to distribute open containers among the piplelines of a datanode
> --
>
> Key: HDDS-3810
> URL: https://issues.apache.org/jira/browse/HDDS-3810
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>
> A datanode can participate in multiple pipelines based on the number of raft log 
> disks as well as the disk type. SCM should distribute open 
> containers evenly among this set of pipelines.






[jira] [Assigned] (HDDS-3807) Propagate raft log disks info to SCM from datanode

2020-06-15 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-3807:
-

Assignee: Shashikant Banerjee

> Propagate raft log disks info to SCM from datanode
> --
>
> Key: HDDS-3807
> URL: https://issues.apache.org/jira/browse/HDDS-3807
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Datanode, SCM
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>
> The number of pipelines to be created on a datanode is to be driven by the number 
> of raft log disks configured on the datanode. This Jira adds support for 
> propagating the raft log volume info to SCM.






[jira] [Assigned] (HDDS-3809) Make number of open containers on a datanode a function of no of volumes reported by it

2020-06-15 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-3809:
-

Assignee: Shashikant Banerjee

> Make number of open containers on a datanode a function of no of volumes 
> reported by it
> ---
>
> Key: HDDS-3809
> URL: https://issues.apache.org/jira/browse/HDDS-3809
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>
> The number of open containers on a datanode is to be driven by the number of 
> data disks available multiplied by the number of open containers per disk. The aim 
> here is to add this logic and verify it.






[jira] [Resolved] (HDDS-3136) retry timeout is large while writing key

2020-06-15 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-3136.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> retry timeout is large while writing key
> 
>
> Key: HDDS-3136
> URL: https://issues.apache.org/jira/browse/HDDS-3136
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Client
>Reporter: Nilotpal Nandi
>Priority: Major
>  Labels: TriagePending, fault_injection
> Fix For: 0.6.0
>
>
> Steps :
>  # Mounted noise injection FUSE on all datanodes.
>  # Injected WRITE delay of 5 seconds on one of the datanodes from each open 
> pipeline
>  # Write a key of 180 MB
> Write operation took more than 10 minutes to complete






[jira] [Resolved] (HDDS-3350) Ozone Retry Policy Improvements

2020-06-15 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-3350.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> Ozone Retry Policy Improvements
> ---
>
> Key: HDDS-3350
> URL: https://issues.apache.org/jira/browse/HDDS-3350
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Blocker
>  Labels: Triaged, pull-request-available
> Fix For: 0.6.0
>
> Attachments: Retry Behaviour in Ozone Client.pdf, Retry Behaviour in 
> Ozone Client_Updated.pdf, Retry Behaviour in Ozone Client_Updated_2.pdf, 
> Retry Policy Results - Teragen 100GB.pdf
>
>
> Currently any ozone client request can spend a huge amount of time in retries 
> and ozone client can retry its requests very aggressively. The waiting time 
> can thus be very high before a client request fails. Further aggressive 
> retries by ratis client used by ozone can bog down a ratis pipeline leader. 
> The Jira aims to make changes to the current retry behavior in Ozone client. 






[jira] [Updated] (HDDS-3810) Add the logic to distribute open containers among the piplelines of a datanode

2020-06-15 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-3810:
--
Summary: Add the logic to distribute open containers among the piplelines 
of a datanode  (was: Add the logic to distribute open containers among the 
pipleines of a datanode)

> Add the logic to distribute open containers among the piplelines of a datanode
> --
>
> Key: HDDS-3810
> URL: https://issues.apache.org/jira/browse/HDDS-3810
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Shashikant Banerjee
>Priority: Major
>
> A datanode can participate in multiple pipelines based on the number of raft 
> log disks as well as the disk type. SCM should distribute open containers 
> evenly among this set of pipelines.






[jira] [Created] (HDDS-3811) Add tests to verify all the disks of a datanode are utilized for write

2020-06-15 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created HDDS-3811:
-

 Summary: Add tests to verify all the disks of a datanode are 
utilized for write 
 Key: HDDS-3811
 URL: https://issues.apache.org/jira/browse/HDDS-3811
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Shashikant Banerjee









[jira] [Created] (HDDS-3810) Add the logic to distribute open containers among the pipleines of a datanode

2020-06-15 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created HDDS-3810:
-

 Summary: Add the logic to distribute open containers among the 
pipleines of a datanode
 Key: HDDS-3810
 URL: https://issues.apache.org/jira/browse/HDDS-3810
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Shashikant Banerjee


A datanode can participate in multiple pipelines based on the number of raft log 
disks as well as the disk type. SCM should distribute open containers evenly 
among this set of pipelines.
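Even distribution can be sketched as least-loaded selection: each new open container goes to whichever pipeline of the datanode currently owns the fewest. A sketch with hypothetical names, not the SCM placement code:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: assign each new open container to the pipeline with the fewest containers. */
class ContainerPlacement {
    private final Map<String, Integer> openContainersPerPipeline = new HashMap<>();

    ContainerPlacement(List<String> pipelineIds) {
        pipelineIds.forEach(id -> openContainersPerPipeline.put(id, 0));
    }

    /** Pick the least-loaded pipeline and record the new container on it. */
    String assignContainer() {
        String best = null;
        for (Map.Entry<String, Integer> e : openContainersPerPipeline.entrySet()) {
            if (best == null || e.getValue() < openContainersPerPipeline.get(best)) {
                best = e.getKey();
            }
        }
        openContainersPerPipeline.merge(best, 1, Integer::sum);
        return best;
    }

    Map<String, Integer> counts() { return openContainersPerPipeline; }

    public static void main(String[] args) {
        ContainerPlacement p = new ContainerPlacement(List.of("p1", "p2", "p3"));
        for (int i = 0; i < 9; i++) p.assignContainer();
        System.out.println(p.counts()); // each of the three pipelines ends up with 3
    }
}
```

Least-loaded selection keeps counts within one of each other at all times, which is the "even distribution" property the description asks for.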






[jira] [Created] (HDDS-3809) Make number of open containers on a datanode a function of no of volumes reported by it

2020-06-15 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created HDDS-3809:
-

 Summary: Make number of open containers on a datanode a function 
of no of volumes reported by it
 Key: HDDS-3809
 URL: https://issues.apache.org/jira/browse/HDDS-3809
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Shashikant Banerjee


The number of open containers on a datanode should be driven by the number of 
data disks available multiplied by the number of open containers per disk. The 
aim here is to add this logic and verify it.
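The proposed formula is a simple product; a sketch (hypothetical names, not the actual SCM code) of how the count could be derived from the reported volumes:

```java
/** Sketch: derive the open-container count from reported volumes rather than a static default. */
class OpenContainerCount {
    static int openContainersFor(int dataDisks, int containersPerDisk) {
        // Scale with the hardware instead of using a fixed cluster-wide constant.
        return Math.max(1, dataDisks * containersPerDisk);
    }

    public static void main(String[] args) {
        // A 12-disk datanode with 3 containers per disk -> 36 open containers.
        System.out.println(openContainersFor(12, 3));
    }
}
```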






[jira] [Created] (HDDS-3808) Ensure volume info on a datanode is propagated to SCM

2020-06-15 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created HDDS-3808:
-

 Summary: Ensure volume info on a datanode is propagated to SCM
 Key: HDDS-3808
 URL: https://issues.apache.org/jira/browse/HDDS-3808
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Shashikant Banerjee
 Fix For: 0.6.0


The aim here is to verify whether the volume-level info of a datanode is 
propagated to SCM and, if not, to add the support.






[jira] [Created] (HDDS-3807) Propagate raft log diks info to SCM from datanode

2020-06-15 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created HDDS-3807:
-

 Summary: Propagate raft log diks info to SCM from datanode
 Key: HDDS-3807
 URL: https://issues.apache.org/jira/browse/HDDS-3807
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
  Components: Ozone Datanode, SCM
Reporter: Shashikant Banerjee


No of pipelines to be created on a datnode is to be driven by the no of raft 
log disks configured on datanode. The Jira here is to add support for 
propagation of raft log volume info to SCM.






[jira] [Updated] (HDDS-3807) Propagate raft log disks info to SCM from datanode

2020-06-15 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-3807:
--
Description: No of pipelines to be created on a datanode is to be driven by 
the no of raft log disks configured on datanode. The Jira here is to add 
support for propagation of raft log volume info to SCM.  (was: No of pipelines 
to be created on a datnode is to be driven by the no of raft log disks 
configured on datanode. The Jira here is to add support for propagation of raft 
log volume info to SCM.)
Summary: Propagate raft log disks info to SCM from datanode  (was: 
Propagate raft log diks info to SCM from datanode)

> Propagate raft log disks info to SCM from datanode
> --
>
> Key: HDDS-3807
> URL: https://issues.apache.org/jira/browse/HDDS-3807
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Datanode, SCM
>Reporter: Shashikant Banerjee
>Priority: Major
>
> No of pipelines to be created on a datanode is to be driven by the no of raft 
> log disks configured on datanode. The Jira here is to add support for 
> propagation of raft log volume info to SCM.
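A sketch of the idea, assuming a hypothetical report field for raft log disks (the real datanode report is a protobuf message with different field names):

```java
/** Sketch: datanode report carrying raft-log volume info, consumed by SCM. */
class NodeReport {
    final String datanodeId;
    final int raftLogDiskCount; // hypothetical field standing in for the protobuf report

    NodeReport(String datanodeId, int raftLogDiskCount) {
        this.datanodeId = datanodeId;
        this.raftLogDiskCount = raftLogDiskCount;
    }

    /** SCM side: cap pipelines on this node based on its raft log disks. */
    static int pipelineLimit(NodeReport report, int pipelinesPerRaftDisk) {
        return report.raftLogDiskCount * pipelinesPerRaftDisk;
    }

    public static void main(String[] args) {
        NodeReport r = new NodeReport("dn-1", 2);
        // 2 raft log disks, 2 pipelines per disk -> up to 4 pipelines on this node.
        System.out.println(pipelineLimit(r, 2));
    }
}
```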






[jira] [Updated] (HDDS-3700) Number of open containers per pipeline should be tuned as per the number of disks on datanode

2020-06-15 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-3700:
--
Attachment: Load Distribution Across disks in Ozone.pdf

> Number of open containers per pipeline should be tuned as per the number of 
> disks on datanode
> -
>
> Key: HDDS-3700
> URL: https://issues.apache.org/jira/browse/HDDS-3700
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.7.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: Performance
> Attachments: Load Distribution Across disks in Ozone.pdf, Screenshot 
> 2020-06-02 at 12.44.14 PM.png
>
>
> Currently, "ozone.scm.pipeline.owner.container.count" is configured by 
> default to 3. The default should ideally be a function of the number of disks 
> on a datanode. A static value may lead to uneven utilisation during active IO.






[jira] [Assigned] (HDDS-3700) Number of open containers per pipeline should be tuned as per the number of disks on datanode

2020-06-15 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-3700:
-

Assignee: Shashikant Banerjee

> Number of open containers per pipeline should be tuned as per the number of 
> disks on datanode
> -
>
> Key: HDDS-3700
> URL: https://issues.apache.org/jira/browse/HDDS-3700
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.7.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: Performance
> Attachments: Screenshot 2020-06-02 at 12.44.14 PM.png
>
>
> Currently, "ozone.scm.pipeline.owner.container.count" is configured by 
> default to 3. The default should ideally be a function of the number of disks 
> on a datanode. A static value may lead to uneven utilisation during active IO.






[jira] [Updated] (HDDS-3789) Fix TestOzoneRpcClientAbstract#testDeletedKeyForGDPR

2020-06-12 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-3789:
--
Parent: HDDS-2964
Issue Type: Sub-task  (was: Bug)

> Fix TestOzoneRpcClientAbstract#testDeletedKeyForGDPR
> 
>
> Key: HDDS-3789
> URL: https://issues.apache.org/jira/browse/HDDS-3789
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Shashikant Banerjee
>Priority: Major
>
> {code:java}
> [ERROR] Tests run: 67, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 
> 36.615 s <<< FAILURE! - in 
> org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis
> [ERROR] 
> testDeletedKeyForGDPR(org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis)
>   Time elapsed: 0.165 s  <<< ERROR!
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientAbstract.testDeletedKeyForGDPR(TestOzoneRpcClientAbstract.java:2730)
> {code}






[jira] [Created] (HDDS-3789) Fix TestOzoneRpcClientAbstract#testDeletedKeyForGDPR

2020-06-12 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created HDDS-3789:
-

 Summary: Fix TestOzoneRpcClientAbstract#testDeletedKeyForGDPR
 Key: HDDS-3789
 URL: https://issues.apache.org/jira/browse/HDDS-3789
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Shashikant Banerjee


{code:java}
[ERROR] Tests run: 67, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 36.615 
s <<< FAILURE! - in 
org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis
[ERROR] 
testDeletedKeyForGDPR(org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis)
  Time elapsed: 0.165 s  <<< ERROR!
java.lang.NullPointerException
  at 
org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientAbstract.testDeletedKeyForGDPR(TestOzoneRpcClientAbstract.java:2730)
{code}






[jira] [Updated] (HDDS-2384) Large chunks during write can have memory pressure on DN with multiple clients

2020-06-11 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-2384:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Large chunks during write can have memory pressure on DN with multiple clients
> --
>
> Key: HDDS-2384
> URL: https://issues.apache.org/jira/browse/HDDS-2384
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Rajesh Balamohan
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: Triaged, performance
>
> During large file writes, it ends up writing {{16 MB}} chunks.  
> https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java#L691
> In large clusters, hundreds of clients may connect to a DN. In such cases, 
> depending on the incoming write workload, the memory load on the DN can 
> increase significantly.
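A rough back-of-the-envelope for why this matters, assuming the worst case of one full chunk buffered per connected client at once:

```java
/** Sketch: estimate transient chunk-buffer memory on a DN across concurrent clients. */
class ChunkMemoryEstimate {
    static long estimateBytes(int concurrentClients, long chunkSizeBytes) {
        // Assumed worst case: every client has one full chunk buffered at once.
        return concurrentClients * chunkSizeBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        // 100 clients x 16 MB chunks -> 1600 MB of transient buffer memory.
        System.out.println(estimateBytes(100, 16 * mb) / mb + " MB");
    }
}
```

Shrinking the chunk size (or bounding concurrent in-flight chunks) directly scales this figure down.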






[jira] [Assigned] (HDDS-3262) Fix TestOzoneRpcClientWithRatis.java

2020-06-11 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-3262:
-

Assignee: Shashikant Banerjee

> Fix TestOzoneRpcClientWithRatis.java
> 
>
> Key: HDDS-3262
> URL: https://issues.apache.org/jira/browse/HDDS-3262
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>







[jira] [Commented] (HDDS-3778) Block distribution in a pipeline among open containers is not uniform

2020-06-11 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133110#comment-17133110
 ] 

Shashikant Banerjee commented on HDDS-3778:
---

The solution is to keep block allocation synchronized on the pipeline, but this 
leads to a performance degradation of about 50% for allocate block, as indicated 
by the Genesis benchmark.

Thanks [~nanda] for the benchmark results.
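One hypothetical middle ground between the unsynchronized last-used logic and a fully synchronized getMatchingContainer is a lock-free round-robin counter over the pipeline's open containers. A sketch, not the actual SCM implementation:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch: lock-free round-robin over a pipeline's open containers, as an
 *  alternative to fully synchronizing container selection. */
class RoundRobinAllocator {
    private final List<String> openContainers;
    private final AtomicLong next = new AtomicLong();

    RoundRobinAllocator(List<String> openContainers) {
        this.openContainers = openContainers;
    }

    /** Safe under concurrent callers: each getAndIncrement maps to exactly one slot. */
    String allocate() {
        long n = next.getAndIncrement();
        return openContainers.get((int) (n % openContainers.size()));
    }

    public static void main(String[] args) {
        RoundRobinAllocator a = new RoundRobinAllocator(List.of("c1", "c2", "c3"));
        for (int i = 0; i < 6; i++) {
            System.out.println(a.allocate()); // c1 c2 c3 c1 c2 c3: uniform by construction
        }
    }
}
```

Because the atomic increment serializes only the counter (not the whole selection path), distribution stays uniform without taking a global lock per allocation.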

> Block distribution in a pipeline among open containers is not uniform
> -
>
> Key: HDDS-3778
> URL: https://issues.apache.org/jira/browse/HDDS-3778
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Blocker
> Fix For: 0.7.0
>
> Attachments: With-fully-synchronized-getMatchingContainer.png, 
> Without-fully-synchronized-getMatchingContainer.png
>
>
> Currently, with concurrent allocate-block calls, block allocation among the 
> open containers of a pipeline is not uniform: under concurrency, the 
> last-used-container allocation logic does not hold up. The idea here is to 
> address this.






[jira] [Updated] (HDDS-3778) Block distribution in a pipeline among open containers is not uniform

2020-06-11 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-3778:
--
Attachment: Without-fully-synchronized-getMatchingContainer.png
With-fully-synchronized-getMatchingContainer.png

> Block distribution in a pipeline among open containers is not uniform
> -
>
> Key: HDDS-3778
> URL: https://issues.apache.org/jira/browse/HDDS-3778
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Blocker
> Fix For: 0.7.0
>
> Attachments: With-fully-synchronized-getMatchingContainer.png, 
> Without-fully-synchronized-getMatchingContainer.png
>
>
> Currently, with concurrent allocate-block calls, block allocation among the 
> open containers of a pipeline is not uniform: under concurrency, the 
> last-used-container allocation logic does not hold up. The idea here is to 
> address this.






[jira] [Assigned] (HDDS-3778) Block distribution in a pipeline among open containers is not uniform

2020-06-11 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-3778:
-

Assignee: Shashikant Banerjee  (was: Nanda kumar)

> Block distribution in a pipeline among open containers is not uniform
> -
>
> Key: HDDS-3778
> URL: https://issues.apache.org/jira/browse/HDDS-3778
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Blocker
> Fix For: 0.7.0
>
>
> Currently, with concurrent allocate-block calls, block allocation among the 
> open containers of a pipeline is not uniform: under concurrency, the 
> last-used-container allocation logic does not hold up. The idea here is to 
> address this.






[jira] [Assigned] (HDDS-3778) Block distribution in a pipeline among open containers is not uniform

2020-06-11 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-3778:
-

Assignee: Nanda kumar  (was: Shashikant Banerjee)

> Block distribution in a pipeline among open containers is not uniform
> -
>
> Key: HDDS-3778
> URL: https://issues.apache.org/jira/browse/HDDS-3778
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Shashikant Banerjee
>Assignee: Nanda kumar
>Priority: Blocker
> Fix For: 0.7.0
>
>
> Currently, with concurrent allocate-block calls, block allocation among the 
> open containers of a pipeline is not uniform: under concurrency, the 
> last-used-container allocation logic does not hold up. The idea here is to 
> address this.






[jira] [Created] (HDDS-3778) Block distribution in a pipeline among open containers is not uniform

2020-06-10 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created HDDS-3778:
-

 Summary: Block distribution in a pipeline among open containers is 
not uniform
 Key: HDDS-3778
 URL: https://issues.apache.org/jira/browse/HDDS-3778
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.7.0


Currently, with concurrent allocate-block calls, block allocation among the open 
containers of a pipeline is not uniform: under concurrency, the 
last-used-container allocation logic does not hold up. The idea here is to 
address this.






[jira] [Updated] (HDDS-2887) Add config to tune replication level of watch requests in Ozone client

2020-06-10 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-2887:
--
Labels: Triaged  (was: )

> Add config to tune replication level of watch requests in Ozone client
> --
>
> Key: HDDS-2887
> URL: https://issues.apache.org/jira/browse/HDDS-2887
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: Triaged
>
> Currently, the Ozone client sends watch requests with the Ratis replication 
> level set to ALL_COMMITTED, and if that fails, it resends the request with 
> MAJORITY_COMMITTED semantics. The idea is to make the replication level for 
> watch requests configurable so as to measure performance.
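The current behaviour and the proposed config knob can be sketched as follows. The names here are hypothetical; the real client uses Ratis's ReplicationLevel type and an Ozone configuration key:

```java
/** Sketch: configurable watch replication level with the existing fallback behaviour. */
class WatchConfig {
    enum ReplicationLevel { ALL_COMMITTED, MAJORITY_COMMITTED }

    /** Hypothetical config lookup; null means the key is unset. */
    static ReplicationLevel fromConfig(String configuredValue) {
        // Default preserves current behaviour; a config lets benchmarks try MAJORITY_COMMITTED.
        return configuredValue == null
                ? ReplicationLevel.ALL_COMMITTED
                : ReplicationLevel.valueOf(configuredValue);
    }

    /** Current fallback: retry a failed ALL_COMMITTED watch with MAJORITY_COMMITTED. */
    static ReplicationLevel fallback(ReplicationLevel failedLevel) {
        return failedLevel == ReplicationLevel.ALL_COMMITTED
                ? ReplicationLevel.MAJORITY_COMMITTED
                : failedLevel;
    }

    public static void main(String[] args) {
        System.out.println(fromConfig(null));                         // ALL_COMMITTED
        System.out.println(fallback(ReplicationLevel.ALL_COMMITTED)); // MAJORITY_COMMITTED
    }
}
```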






[jira] [Updated] (HDDS-770) ozonefs client warning exception logs should not be displayed on console

2020-06-10 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-770:
-
Target Version/s: 0.7.0  (was: 0.6.0)

> ozonefs client warning exception logs should not be displayed on console
> 
>
> Key: HDDS-770
> URL: https://issues.apache.org/jira/browse/HDDS-770
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.3.0
>Reporter: Nilotpal Nandi
>Priority: Major
>  Labels: Triaged
>
> steps taken :
> -
>  # ran ozonefs cp command  - "ozone fs -cp /testdir2/2GB /testdir2/2GB_111"
>  # command execution was successful and file was successfully copied.
> But, the warning logs/exceptions are displayed on the console:
>  
> {noformat}
> [root@ctr-e138-1518143905142-53-01-03 ~]# ozone fs -cp /testdir2/2GB 
> /testdir2/2GB_111
> 2018-10-31 09:12:35,052 WARN scm.XceiverClientGrpc: Failed to execute command 
> cmdType: GetBlock
> traceID: "b73d7d2d-232a-40d7-b0b6-478e3d40ed6a"
> containerID: 17
> datanodeUuid: "ce0084c2-97cd-4c97-9378-e5175daad18b"
> getBlock {
>  blockID {
>  containerID: 17
>  localID: 100989077200109583
>  }
>  blockCommitSequenceId: 60
> }
>  on datanode 9fab9937-fbcd-4196-8014-cb165045724b
> java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
>  at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:167)
>  at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:146)
>  at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:105)
>  at 
> org.apache.hadoop.ozone.client.io.ChunkGroupInputStream.getFromOmKeyInfo(ChunkGroupInputStream.java:301)
>  at org.apache.hadoop.ozone.client.rpc.RpcClient.getKey(RpcClient.java:493)
>  at org.apache.hadoop.ozone.client.OzoneBucket.readKey(OzoneBucket.java:272)
>  at org.apache.hadoop.fs.ozone.OzoneFileSystem.open(OzoneFileSystem.java:178)
>  at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:950)
>  at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:341)
>  at 
> org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:277)
>  at 
> org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:262)
>  at org.apache.hadoop.fs.shell.Command.processPathInternal(Command.java:367)
>  at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:331)
>  at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:304)
>  at 
> org.apache.hadoop.fs.shell.CommandWithDestination.processPathArgument(CommandWithDestination.java:257)
>  at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:286)
>  at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:270)
>  at 
> org.apache.hadoop.fs.shell.CommandWithDestination.processArguments(CommandWithDestination.java:228)
>  at 
> org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:120)
>  at org.apache.hadoop.fs.shell.Command.run(Command.java:177)
>  at org.apache.hadoop.fs.FsShell.run(FsShell.java:327)
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>  at org.apache.hadoop.fs.FsShell.main(FsShell.java:390)
> Caused by: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> UNAVAILABLE: io exception
>  at 
> org.apache.ratis.thirdparty.io.grpc.Status.asRuntimeException(Status.java:526)
>  at 
> org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:420)
>  at 
> org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
>  at 
> org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
>  at 
> org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
>  at 
> org.apache.ratis.thirdparty.io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:684)
>  at 
> org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
>  at 
> org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
>  at 
> org.apache.ratis.thirdparty.io.grpc.Forwar

[jira] [Updated] (HDDS-1079) java.lang.RuntimeException: ManagedChannel allocation site exception seen on client cli when datanode restarted in one of the pipelines

2020-06-10 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1079:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> java.lang.RuntimeException: ManagedChannel allocation site exception seen on 
> client cli when datanode restarted in one of the pipelines
> ---
>
> Key: HDDS-1079
> URL: https://issues.apache.org/jira/browse/HDDS-1079
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Nilotpal Nandi
>Priority: Major
>  Labels: TriagePending
> Attachments: nodes-ozone-logs-1549879783.tar.gz
>
>
> steps taken :
> 
>  # created 12 datanode cluster.
>  # started put key operation with size 100GB.
>  # Restarted one of the datanodes from one of the pipelines.
> Exception seen on CLI:
> 
>  
> {noformat}
> [root@ctr-e139-1542663976389-62237-01-06 ~]# time ozone sh key put 
> volume1/bucket1/key1 /root/100G
> Feb 11, 2019 9:12:49 AM 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference
>  cleanQueue
> SEVERE: *~*~*~ Channel ManagedChannelImpl{logId=61, 
> target=172.27.10.133:9858} was not shutdown properly!!! ~*~*~*
>  Make sure to call shutdown()/shutdownNow() and wait until awaitTermination() 
> returns true.
> java.lang.RuntimeException: ManagedChannel allocation site
>  at 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.(ManagedChannelOrphanWrapper.java:103)
>  at 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:53)
>  at 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:44)
>  at 
> org.apache.ratis.thirdparty.io.grpc.internal.AbstractManagedChannelImplBuilder.build(AbstractManagedChannelImplBuilder.java:411)
>  at 
> org.apache.ratis.grpc.client.GrpcClientProtocolClient.(GrpcClientProtocolClient.java:116)
>  at 
> org.apache.ratis.grpc.client.GrpcClientRpc.lambda$new$0(GrpcClientRpc.java:54)
>  at 
> org.apache.ratis.util.PeerProxyMap$PeerAndProxy.lambda$getProxy$0(PeerProxyMap.java:60)
>  at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:191)
>  at 
> org.apache.ratis.util.PeerProxyMap$PeerAndProxy.getProxy(PeerProxyMap.java:59)
>  at org.apache.ratis.util.PeerProxyMap.getProxy(PeerProxyMap.java:106)
>  at 
> org.apache.ratis.grpc.client.GrpcClientRpc.sendRequestAsync(GrpcClientRpc.java:69)
>  at 
> org.apache.ratis.client.impl.RaftClientImpl.sendRequestAsync(RaftClientImpl.java:324)
>  at 
> org.apache.ratis.client.impl.RaftClientImpl.sendRequestWithRetryAsync(RaftClientImpl.java:286)
>  at 
> org.apache.ratis.util.SlidingWindow$Client.sendOrDelayRequest(SlidingWindow.java:243)
>  at org.apache.ratis.util.SlidingWindow$Client.retry(SlidingWindow.java:259)
>  at 
> org.apache.ratis.client.impl.RaftClientImpl.lambda$null$10(RaftClientImpl.java:293)
>  at 
> org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$0(TimeoutScheduler.java:85)
>  at 
> org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$1(TimeoutScheduler.java:104)
>  at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:50)
>  at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:91)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Feb 11, 2019 9:12:49 AM 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference
>  cleanQueue
> SEVERE: *~*~*~ Channel ManagedChannelImpl{logId=29, 
> target=172.27.10.133:9858} was not shutdown properly!!! ~*~*~*
>  Make sure to call shutdown()/shutdownNow() and wait until awaitTermination() 
> returns true.
> java.lang.RuntimeException: ManagedChannel allocation site
>  at 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.(ManagedChannelOrphanWrapper.java:103)
>  at 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:53)
>  at 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(Ma

[jira] [Updated] (HDDS-1854) Print intuitive error message at client when the pipeline returned by SCM has no datanode

2020-06-10 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1854:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Print intuitive error message at client when the pipeline returned by SCM has 
> no datanode
> -
>
> Key: HDDS-1854
> URL: https://issues.apache.org/jira/browse/HDDS-1854
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Reporter: Nanda kumar
>Priority: Major
>  Labels: Triaged
>
> We are throwing {{IllegalArgumentException}} in OzoneClient when the pipeline 
> returned by SCM doesn't have any datanode information. Instead of throwing 
> {{IllegalArgumentException}}, we can throw a custom, user-friendly exception 
> that is easy to understand.
> Existing exception trace:
> {noformat}
> AssertionError: Ozone get Key failed with 
> output=[java.lang.IllegalArgumentException
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>   at 
> org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClient(XceiverClientManager.java:150)
>   at 
> org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClientForReadData(XceiverClientManager.java:143)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.getChunkInfos(BlockInputStream.java:154)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.initialize(BlockInputStream.java:118)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:222)
>   at 
> org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171)
>   at 
> org.apache.hadoop.ozone.client.io.OzoneInputStream.read(OzoneInputStream.java:47)
>   at java.base/java.io.InputStream.read(InputStream.java:205)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:94)
>   at 
> org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:98)
>   at 
> org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:48)
>   at picocli.CommandLine.execute(CommandLine.java:1173)
>   at picocli.CommandLine.access$800(CommandLine.java:141)
>   at picocli.CommandLine$RunLast.handle(CommandLine.java:1367)
>   at picocli.CommandLine$RunLast.handle(CommandLine.java:1335)
>   at 
> picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243)
>   at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526)
>   at picocli.CommandLine.parseWithHandler(CommandLine.java:1465)
>   at 
> org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65)
>   at 
> org.apache.hadoop.ozone.web.ozShell.OzoneShell.execute(OzoneShell.java:60)
>   at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:56)
>   at 
> org.apache.hadoop.ozone.web.ozShell.OzoneShell.main(OzoneShell.java:53)]
> {noformat}
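The improvement suggested above can be sketched as follows. The exception name `NoDatanodesInPipelineException` and the validation helper are hypothetical illustrations, not the actual Ozone client API; the real change would replace the bare `Preconditions.checkArgument` in `XceiverClientManager.acquireClient`.

```java
import java.util.Collections;
import java.util.List;

public class PipelineCheck {
    // Hypothetical user-friendly exception replacing the bare IllegalArgumentException.
    static class NoDatanodesInPipelineException extends RuntimeException {
        NoDatanodesInPipelineException(String pipelineId) {
            super("Pipeline " + pipelineId + " returned by SCM has no datanodes; "
                + "the key cannot be read until the pipeline is healthy.");
        }
    }

    // Instead of Preconditions.checkArgument(!nodes.isEmpty()), throw a
    // descriptive exception that tells the user what actually went wrong.
    static void validatePipeline(String pipelineId, List<String> nodes) {
        if (nodes == null || nodes.isEmpty()) {
            throw new NoDatanodesInPipelineException(pipelineId);
        }
    }

    public static void main(String[] args) {
        try {
            validatePipeline("pipeline-1", Collections.emptyList());
        } catch (NoDatanodesInPipelineException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With this, the shell command would surface one self-explanatory message instead of the 20-frame trace above.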



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2284) XceiverClientMetrics should be initialised as part of XceiverClientManager constructor

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-2284:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> XceiverClientMetrics should be initialised as part of XceiverClientManager 
> constructor
> --
>
> Key: HDDS-2284
> URL: https://issues.apache.org/jira/browse/HDDS-2284
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
>  Labels: TriagePending
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> XceiverClientMetrics is currently initialized in the read/write path; the 
> metrics should instead be initialized while creating the XceiverClientManager.
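A minimal sketch of the change direction: move metrics creation out of the I/O path and into the manager's constructor. The class and method names below are illustrative stand-ins, not the real `XceiverClientManager` code.

```java
public class MetricsInitSketch {
    static class ClientMetrics {
        long opsRecorded = 0;
        void recordOp() { opsRecorded++; }
    }

    static class ClientManager {
        private final ClientMetrics metrics;

        // Eager initialization: the metrics object exists as soon as the
        // manager is constructed, instead of being created lazily on the
        // first read/write (which adds a check, and a race, to the hot path).
        ClientManager() {
            this.metrics = new ClientMetrics();
        }

        void write() { metrics.recordOp(); }
        ClientMetrics getMetrics() { return metrics; }
    }

    public static void main(String[] args) {
        ClientManager manager = new ClientManager();
        // Metrics are registered before any I/O happens.
        System.out.println("ops=" + manager.getMetrics().opsRecorded);
        manager.write();
        System.out.println("ops=" + manager.getMetrics().opsRecorded);
    }
}
```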






[jira] [Updated] (HDDS-1206) Handle Datanode volume out of space

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1206:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Handle Datanode volume out of space
> ---
>
> Key: HDDS-1206
> URL: https://issues.apache.org/jira/browse/HDDS-1206
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Nilotpal Nandi
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: Triaged
>
> steps taken :
> 
>  # create 40 datanode cluster.
>  # one of the datanodes has less than 5 GB space.
>  # Started writing key of size 600MB.
> operation failed:
> Error on the client:
> 
> {noformat}
> Fri Mar 1 09:05:28 UTC 2019 Ruuning 
> /root/hadoop_trunk/ozone-0.4.0-SNAPSHOT/bin/ozone sh key put 
> testvol172275910-1551431122-1/testbuck172275910-1551431122-1/test_file24 
> /root/test_files/test_file24
> original md5sum a6de00c9284708585f5a99b0490b0b23
> 2019-03-01 09:05:39,142 ERROR storage.BlockOutputStream: Unexpected Storage 
> Container Exception:
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  ContainerID 79 creation failed
>  at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568)
>  at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535)
>  at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613)
>  at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
>  at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>  at 
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-03-01 09:05:39,578 ERROR storage.BlockOutputStream: Unexpected Storage 
> Container Exception:
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  ContainerID 79 creation failed
>  at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568)
>  at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535)
>  at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613)
>  at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
>  at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>  at 
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-03-01 09:05:40,368 ERROR storage.BlockOutputStream: Unexpected Storage 
> Container Exception:
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  ContainerID 79 creation failed
>  at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568)
>  at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535)
>  at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613)
>  at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
>  at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>  at 
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-03-01 09:05:40,450 ERROR storage.BlockOutputStream: Unexpected Storage 
> Container Exception:
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  ContainerID 79 creation failed
>  at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:568)
>  at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:535)
>  at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$5(BlockOutputStream.java:613)
>  at 
> java.util.concurrent.CompletableFutu

[jira] [Updated] (HDDS-1286) Add more unit tests to validate exception path during write for Ozone client

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1286:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Add more unit tests to validate exception path during write for Ozone client
> 
>
> Key: HDDS-1286
> URL: https://issues.apache.org/jira/browse/HDDS-1286
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: Triaged
>







[jira] [Updated] (HDDS-800) Avoid ByteString to byte array conversion cost by using ByteBuffer

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-800:
-
Target Version/s: 0.7.0  (was: 0.6.0)

> Avoid ByteString to byte array conversion cost by using ByteBuffer
> --
>
> Key: HDDS-800
> URL: https://issues.apache.org/jira/browse/HDDS-800
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
>  Labels: TriagePending, performance
>
> As noticed in HDDS-799, protobuf ByteString to byte[] conversion has 
> significant performance overhead in the read and write paths. This Jira 
> proposes to use ByteBuffer in place of byte[] to avoid that conversion 
> overhead.
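The cost being avoided is the defensive copy: converting to `byte[]` allocates a new array and copies every byte, while a `ByteBuffer` view shares the underlying memory. This stdlib-only sketch illustrates the principle; the actual change concerns protobuf's `ByteString` (roughly, preferring `asReadOnlyByteBuffer()` over `toByteArray()`).

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class CopyVsWrap {
    public static void main(String[] args) {
        byte[] chunk = new byte[]{1, 2, 3, 4};

        // toByteArray()-style: allocates a new array and copies every byte.
        byte[] copy = Arrays.copyOf(chunk, chunk.length);

        // ByteBuffer-style: wraps the existing memory, zero copy.
        ByteBuffer view = ByteBuffer.wrap(chunk);

        chunk[0] = 9;                     // mutate the original
        System.out.println(copy[0]);      // the copy is unaffected: prints 1
        System.out.println(view.get(0));  // the view sees the change: prints 9
    }
}
```

For multi-megabyte chunks on every read and write, skipping that copy is where the savings come from.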






[jira] [Resolved] (HDDS-371) Add RetriableException class in Ozone

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-371.
--
Fix Version/s: 0.6.0
   Resolution: Won't Do

> Add RetriableException class in Ozone
> -
>
> Key: HDDS-371
> URL: https://issues.apache.org/jira/browse/HDDS-371
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: TriagePending
> Fix For: 0.6.0
>
>
> Certain exceptions thrown by a server can occur because the server is 
> temporarily in a state where the request cannot be processed.
>  The Ozone client may retry such a request; once the service is up again, the 
> server may be able to process the retried request. This Jira aims to 
> introduce the notion of a RetriableException in Ozone.
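The notion described above can be sketched as a marker exception plus a retry loop that reacts only to it. This is a hypothetical illustration of the idea, not the Ozone implementation (the issue was ultimately resolved "Won't Do").

```java
public class RetriableSketch {
    // Hypothetical marker exception: the server is temporarily unable to
    // serve the request, so the client may safely retry it.
    static class RetriableException extends Exception {
        RetriableException(String msg) { super(msg); }
    }

    interface Call<T> { T run() throws Exception; }

    // Retry only when the failure is explicitly marked retriable;
    // anything else propagates immediately.
    static <T> T withRetries(Call<T> call, int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.run();
            } catch (RetriableException e) {
                last = e;
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        String result = withRetries(() -> {
            attempts[0]++;
            if (attempts[0] < 3) {
                throw new RetriableException("server not ready yet");
            }
            return "ok";
        }, 5);
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```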






[jira] [Resolved] (HDDS-986) Stack overflow in TestFailureHandlingByClient

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-986.
--
Fix Version/s: 0.6.0
   Resolution: Cannot Reproduce

> Stack overflow in TestFailureHandlingByClient
> -
>
> Key: HDDS-986
> URL: https://issues.apache.org/jira/browse/HDDS-986
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Priority: Major
>  Labels: TriagePending
> Fix For: 0.6.0
>
>
> Stackoverflow observed with 
> TestFailureHandlingByClient#testMultiBlockWritesWithIntermittentDnFailures
> https://builds.apache.org/job/PreCommit-HDDS-Build/2063/testReport/org.apache.hadoop.ozone.client.rpc/TestFailureHandlingByClient/testMultiBlockWritesWithIntermittentDnFailures/






[jira] [Resolved] (HDDS-1438) TestOzoneClientRetriesOnException#testGroupMismatchExceptionHandling is failing because of allocate block failures

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-1438.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> TestOzoneClientRetriesOnException#testGroupMismatchExceptionHandling is 
> failing because of allocate block failures
> --
>
> Key: HDDS-1438
> URL: https://issues.apache.org/jira/browse/HDDS-1438
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Priority: Major
>  Labels: TriagePending
> Fix For: 0.6.0
>
>
> The test is failing with the allocate block failure assertion.
> https://ci.anzix.net/job/ozone-nightly/61/testReport/org.apache.hadoop.ozone.client.rpc/TestOzoneClientRetriesOnException/testGroupMismatchExceptionHandling/






[jira] [Updated] (HDDS-1325) Exception thrown while initializing ozoneClientAdapter

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1325:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Exception thrown while initializing ozoneClientAdapter 
> ---
>
> Key: HDDS-1325
> URL: https://issues.apache.org/jira/browse/HDDS-1325
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Nilotpal Nandi
>Priority: Major
>  Labels: TriagePending
>
> ozone version :
> 
>  
> {noformat}
> Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r 
> 568d3ab8b65d1348dec9c971feffe200e6cba2ef
> Compiled by nnandi on 2019-03-19T03:54Z
> Compiled with protoc 2.5.0
> From source with checksum c44d339e20094d3054754078afbf4c
> Using HDDS 0.5.0-SNAPSHOT
> Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r 
> 568d3ab8b65d1348dec9c971feffe200e6cba2ef
> Compiled by nnandi on 2019-03-19T03:53Z
> Compiled with protoc 2.5.0
> From source with checksum b354934fb1352f4d5425114bf8dce11
> {noformat}
>  
>  
> steps taken :
> ---
>  # Add ozone libs in hadoop classpath.
>  # Tried to run s3dupdo workload ([https://github.com/t3rmin4t0r/s3dupdo])
> Here is the exception thrown :
>  
> {noformat}
> java.lang.reflect.InvocationTargetException
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  at 
> org.apache.hadoop.fs.ozone.OzoneClientAdapterFactory.lambda$createAdapter$1(OzoneClientAdapterFactory.java:65)
>  at 
> org.apache.hadoop.fs.ozone.OzoneClientAdapterFactory.createAdapter(OzoneClientAdapterFactory.java:105)
>  at 
> org.apache.hadoop.fs.ozone.OzoneClientAdapterFactory.createAdapter(OzoneClientAdapterFactory.java:61)
>  at 
> org.apache.hadoop.fs.ozone.OzoneFileSystem.initialize(OzoneFileSystem.java:167)
>  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
>  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
>  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
>  at org.apache.hadoop.fs.FileSystem$Cache.getUnique(FileSystem.java:3326)
>  at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:532)
>  at org.notmysock.repl.Works$CopyWorker.run(Works.java:243)
>  at org.notmysock.repl.Works$CopyWorker.call(Works.java:279)
>  at org.notmysock.repl.Works$CopyWorker.call(Works.java:204)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.LinkageError: loader constraint violation: loader 
> (instance of org/apache/hadoop/fs/ozone/FilteredClassLoader) previously 
> initiated loading for a different type with name 
> "org/apache/hadoop/security/token/Token"
>  at java.lang.ClassLoader.defineClass1(Native Method)
>  at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
>  at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>  at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
>  at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
>  at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
>  at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
>  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>  at 
> org.apache.hadoop.fs.ozone.FilteredClassLoader.loadClass(FilteredClassLoader.java:71)
>  at java.lang.Class.getDeclaredMethods0(Native Method)
>  at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
>  at java.lang.Class.privateGetPublicMethods(Class.java:2902)
>  at java.lang.Class.getMethods(Class.java:1615)
>  at sun.misc.ProxyGenerator.generateClassFile(ProxyGenerator.java:451)
>  at sun.misc.ProxyGenerator.generateProxyClass(ProxyGenerator.java:339)
>  at java.lang.reflect.Proxy$ProxyClassFactory.apply(Proxy.java:639)
>  at java.lang.reflect.Proxy$ProxyClassFactory.apply(Proxy.java:557)
>  at java.lang.reflect.WeakCache$Factory.get(WeakCache.java:230)
>  at java.lang.reflect.WeakCache.get(WeakCache.java:127)
>  at java.lang.reflect.Proxy.getProxyClass0(Proxy.java:419)
>  at java.lang.reflect.Proxy.newProxyInstance(Proxy.java:719)
>  a

[jira] [Updated] (HDDS-1446) Grpc channels are leaked in XceiverClientGrpc

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1446:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Grpc channels are leaked in XceiverClientGrpc
> -
>
> Key: HDDS-1446
> URL: https://issues.apache.org/jira/browse/HDDS-1446
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Priority: Major
>  Labels: Triaged
>
> Grpc Channels are leaked in MiniOzoneChaosCluster runs.
> {code}
> SEVERE: *~*~*~ Channel ManagedChannelImpl{logId=522, 
> target=10.200.4.160:52415} was not shutdown properly!!! ~*~*~*
> Make sure to call shutdown()/shutdownNow() and wait until 
> awaitTermination() returns true.
> java.lang.RuntimeException: ManagedChannel allocation site
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.<init>(ManagedChannelOrphanWrapper.java:103)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:53)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:44)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.AbstractManagedChannelImplBuilder.build(AbstractManagedChannelImplBuilder.java:411)
> at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.connectToDatanode(XceiverClientGrpc.java:165)
> at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.reconnect(XceiverClientGrpc.java:389)
> at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandAsync(XceiverClientGrpc.java:340)
> at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:268)
> at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:236)
> at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:210)
> at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:119)
> at 
> org.apache.hadoop.ozone.client.io.KeyInputStream.getFromOmKeyInfo(KeyInputStream.java:302)
> at 
> org.apache.hadoop.ozone.client.rpc.RpcClient.createInputStream(RpcClient.java:993)
> at 
> org.apache.hadoop.ozone.client.rpc.RpcClient.getKey(RpcClient.java:653)
> at 
> org.apache.hadoop.ozone.client.OzoneBucket.readKey(OzoneBucket.java:325)
> at 
> org.apache.hadoop.ozone.MiniOzoneLoadGenerator.load(MiniOzoneLoadGenerator.java:112)
> at 
> org.apache.hadoop.ozone.MiniOzoneLoadGenerator.lambda$startIO$0(MiniOzoneLoadGenerator.java:147)
> at 
> java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
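The leak pattern in the trace above is a channel that is built but never shut down. A defensive sketch of the lifecycle discipline is below; the names are stand-ins, and the real fix is to call `shutdown()`/`shutdownNow()` and await termination on the `ManagedChannel` when `XceiverClientGrpc` is closed.

```java
public class ChannelLifecycleSketch {
    // Stand-in for a grpc ManagedChannel: it must be shut down explicitly,
    // or the orphan detector logs the SEVERE message seen above.
    static class Channel implements AutoCloseable {
        static int open = 0;
        Channel() { open++; }
        @Override public void close() { open--; }  // shutdown() + awaitTermination()
    }

    public static void main(String[] args) {
        // try-with-resources guarantees shutdown even if the RPC throws,
        // which is exactly the path where channels tend to leak.
        try (Channel ch = new Channel()) {
            System.out.println("open channels during call = " + Channel.open);
        }
        System.out.println("open channels after close = " + Channel.open);
    }
}
```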






[jira] [Updated] (HDDS-2145) Optimize client read path by reading multiple chunks along with block info in a single rpc call.

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-2145:
--
Labels:   (was: TriagePending)

> Optimize client read path by reading multiple chunks along with block info in 
> a single rpc call.
> 
>
> Key: HDDS-2145
> URL: https://issues.apache.org/jira/browse/HDDS-2145
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client, Ozone Datanode
>Reporter: Shashikant Banerjee
>Assignee: Hanisha Koneru
>Priority: Major
>
> Currently, the Ozone client issues a getBlock call to read the metadata from 
> RocksDB on the datanode to get the chunk info, and then each chunk is read 
> in a separate rpc call in the read path. This can be optimized by 
> piggybacking readChunk calls along with getBlock in a single rpc call to the 
> datanode. This Jira aims to address this.
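The round-trip savings can be sketched with a toy datanode that counts RPCs. The method names and the combined call are hypothetical, not the real Ozone container protocol; the point is just that metadata plus N chunks collapses from N+1 round trips to one.

```java
import java.util.List;

public class CombinedReadSketch {
    // Toy datanode: one RPC per method call.
    static class Datanode {
        int rpcCalls = 0;
        List<String> getBlock(long blockId) { rpcCalls++; return List.of("c1", "c2", "c3"); }
        String readChunk(String chunk)      { rpcCalls++; return "data:" + chunk; }
        // Hypothetical combined call: block metadata and chunk data
        // returned in a single round trip.
        List<String> getBlockWithChunks(long blockId) {
            rpcCalls++;
            return List.of("data:c1", "data:c2", "data:c3");
        }
    }

    public static void main(String[] args) {
        Datanode separate = new Datanode();
        for (String chunk : separate.getBlock(1L)) {
            separate.readChunk(chunk);   // one RPC per chunk
        }
        System.out.println("separate RPCs = " + separate.rpcCalls);

        Datanode combined = new Datanode();
        combined.getBlockWithChunks(1L); // single RPC
        System.out.println("combined RPCs = " + combined.rpcCalls);
    }
}
```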






[jira] [Updated] (HDDS-2145) Optimize client read path by reading multiple chunks along with block info in a single rpc call.

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-2145:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Optimize client read path by reading multiple chunks along with block info in 
> a single rpc call.
> 
>
> Key: HDDS-2145
> URL: https://issues.apache.org/jira/browse/HDDS-2145
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client, Ozone Datanode
>Reporter: Shashikant Banerjee
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: TriagePending
>
> Currently, the Ozone client issues a getBlock call to read the metadata from 
> RocksDB on the datanode to get the chunk info, and then each chunk is read 
> in a separate rpc call in the read path. This can be optimized by 
> piggybacking readChunk calls along with getBlock in a single rpc call to the 
> datanode. This Jira aims to address this.






[jira] [Updated] (HDDS-2146) Optimize block write path performance by reducing no of watchForCommit calls

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-2146:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Optimize block write path performance by reducing no of watchForCommit calls
> 
>
> Key: HDDS-2146
> URL: https://issues.apache.org/jira/browse/HDDS-2146
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: TriagePending
>
> Currently, the watchForCommit calls from the client to the Ratis server for 
> all-replicated semantics happen whenever the max buffer limit is reached; 
> with the default configs this can occur up to 4 times for a single full 
> block write. The idea here is to inspect and add optimizations to reduce the 
> number of watchForCommit calls.
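The arithmetic behind "4 times per block" is just blockSize / bufferLimit. The sketch below uses illustrative numbers (not the real Ozone defaults) and assumes one batched watch can cover several flushes, since Raft commit indices are monotonic, so watching the highest pending index implies the lower ones.

```java
public class WatchForCommitSketch {
    static final long BUFFER_LIMIT = 64;   // illustrative units, not real defaults
    static final long BLOCK_SIZE = 256;

    public static void main(String[] args) {
        // Current behaviour: a watchForCommit round trip every time the
        // buffer fills, i.e. blockSize / bufferLimit calls per block.
        long perBufferWatches = BLOCK_SIZE / BUFFER_LIMIT;
        System.out.println("watch calls per block (per-buffer) = " + perBufferWatches);

        // Direction of the optimization: defer the watch and issue a single
        // call for the highest pending log index, covering all four flushes.
        long batchedWatches = 1;
        System.out.println("watch calls per block (batched) = " + batchedWatches);
    }
}
```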






[jira] [Resolved] (HDDS-2702) Client failed to recover from ratis AlreadyClosedException exception

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-2702.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> Client failed to recover from ratis AlreadyClosedException exception
> 
>
> Key: HDDS-2702
> URL: https://issues.apache.org/jira/browse/HDDS-2702
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Sammi Chen
>Assignee: Shashikant Banerjee
>Priority: Blocker
>  Labels: TriagePending
> Fix For: 0.6.0
>
>
> Ran teragen; it failed to enter the mapreduce stage and printed the 
> following warning message on the console endlessly.
> {noformat}
> 19/12/10 19:23:54 WARN io.KeyOutputStream: Encountered exception 
> java.io.IOException: Unexpected Storage Container Exception: 
> java.util.concurrent.CompletionException: 
> java.util.concurrent.CompletionException: 
> org.apache.ratis.protocol.AlreadyClosedException: 
> SlidingWindow$Client:client-FBD45C9313A5->RAFT is closed. on the pipeline 
> Pipeline[ Id: 90deb863-e511-4a5e-ae86-dc8035e8fa7d, Nodes: 
> ed90869c-317e-4303-8922-9fa83a3983cb{ip: 10.120.113.172, host: host172, 
> networkLocation: /rack2, certSerialId: 
> null}1da74a1d-f64d-4ad4-b04c-85f26687e683{ip: 10.121.124.44, host: host044, 
> networkLocation: /rack2, certSerialId: 
> null}515cab4b-39b5-4439-b1a8-a7b725f5784a{ip: 10.120.139.122, host: host122, 
> networkLocation: /rack1, certSerialId: null}, Type:RATIS, Factor:THREE, 
> State:OPEN, leaderId:515cab4b-39b5-4439-b1a8-a7b725f5784a ]. The last 
> committed block length is 0, uncommitted data length is 295833 retry count 0
> 19/12/10 19:23:54 INFO io.BlockOutputStreamEntryPool: Allocating block with 
> ExcludeList {datanodes = [], containerIds = [], pipelineIds = 
> [PipelineID=90deb863-e511-4a5e-ae86-dc8035e8fa7d]}
> 19/12/10 19:26:16 WARN io.KeyOutputStream: Encountered exception 
> java.io.IOException: Unexpected Storage Container Exception: 
> java.util.concurrent.CompletionException: 
> java.util.concurrent.CompletionException: 
> org.apache.ratis.protocol.AlreadyClosedException: 
> SlidingWindow$Client:client-7C5A7B5CFC31->RAFT is closed. on the pipeline 
> Pipeline[ Id: 90deb863-e511-4a5e-ae86-dc8035e8fa7d, Nodes: 
> ed90869c-317e-4303-8922-9fa83a3983cb{ip: 10.120.113.172, host: host172, 
> networkLocation: /rack2, certSerialId: 
> null}1da74a1d-f64d-4ad4-b04c-85f26687e683{ip: 10.121.124.44, host: host044, 
> networkLocation: /rack2, certSerialId: 
> null}515cab4b-39b5-4439-b1a8-a7b725f5784a{ip: 10.120.139.122, host: host122, 
> networkLocation: /rack1, certSerialId: null}, Type:RATIS, Factor:THREE, 
> State:OPEN, leaderId:515cab4b-39b5-4439-b1a8-a7b725f5784a ]. The last 
> committed block length is 0, uncommitted data length is 295833 retry count 0
> 19/12/10 19:26:16 INFO io.BlockOutputStreamEntryPool: Allocating block with 
> ExcludeList {datanodes = [], containerIds = [], pipelineIds = 
> [PipelineID=90deb863-e511-4a5e-ae86-dc8035e8fa7d]}
> 19/12/10 19:28:38 WARN io.KeyOutputStream: Encountered exception 
> java.io.IOException: Unexpected Storage Container Exception: 
> java.util.concurrent.CompletionException: 
> java.util.concurrent.CompletionException: 
> org.apache.ratis.protocol.AlreadyClosedException: 
> SlidingWindow$Client:client-B3D8C0052C4E->RAFT is closed. on the pipeline 
> Pipeline[ Id: 90deb863-e511-4a5e-ae86-dc8035e8fa7d, Nodes: 
> ed90869c-317e-4303-8922-9fa83a3983cb{ip: 10.120.113.172, host: host172, 
> networkLocation: /rack2, certSerialId: 
> null}1da74a1d-f64d-4ad4-b04c-85f26687e683{ip: 10.121.124.44, host: host044, 
> networkLocation: /rack2, certSerialId: 
> null}515cab4b-39b5-4439-b1a8-a7b725f5784a{ip: 10.120.139.122, host: host122, 
> networkLocation: /rack1, certSerialId: null}, Type:RATIS, Factor:THREE, 
> State:OPEN, leaderId:515cab4b-39b5-4439-b1a8-a7b725f5784a ]. The last 
> committed block length is 0, uncommitted data length is 295833 retry count 0
> 19/12/10 19:28:38 INFO io.BlockOutputStreamEntryPool: Allocating block with 
> ExcludeList {datanodes = [], containerIds = [], pipelineIds = 
> [PipelineID=90deb863-e511-4a5e-ae86-dc8035e8fa7d]}
> {noformat}






[jira] [Resolved] (HDDS-3268) CommitWatcher#watchForCommit does not timeout

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-3268.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> CommitWatcher#watchForCommit does not timeout
> -
>
> Key: HDDS-3268
> URL: https://issues.apache.org/jira/browse/HDDS-3268
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: TriagePending
> Fix For: 0.6.0
>
>
> It seems the property *ozone.client.watch.request.timeout* was removed by 
> HDDS-2920. Note this is a client-side property bounding the wait for the 
> future to return. Without it, the client may wait for the future forever in 
> certain cases.
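The failure mode described above can be shown with a future that never completes. Passing a timeout to `CompletableFuture.get` (which is what the removed client property effectively bounded) turns an indefinite hang into a recoverable `TimeoutException`; the 100 ms value here is illustrative.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class WatchTimeoutSketch {
    public static void main(String[] args) throws Exception {
        // A watch future that never completes, standing in for a
        // watchForCommit reply that is lost or indefinitely delayed.
        CompletableFuture<Long> watch = new CompletableFuture<>();

        try {
            // Without a timeout, watch.get() would block forever here.
            watch.get(100, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("watch timed out; client can retry or fail over");
        }
    }
}
```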






[jira] [Resolved] (HDDS-2306) Fix TestWatchForCommit failure

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-2306.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> Fix TestWatchForCommit failure
> --
>
> Key: HDDS-2306
> URL: https://issues.apache.org/jira/browse/HDDS-2306
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.4.1
>Reporter: Mukul Kumar Singh
>Priority: Major
>  Labels: TriagePending
> Fix For: 0.6.0
>
>
> {code}
> [ERROR] Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 203.385 s <<< FAILURE! - in 
> org.apache.hadoop.ozone.client.rpc.TestWatchForCommit
> [ERROR] 
> test2WayCommitForTimeoutException(org.apache.hadoop.ozone.client.rpc.TestWatchForCommit)
>   Time elapsed: 27.093 s  <<< ERROR!
> java.util.concurrent.TimeoutException
>   at 
> java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
>   at 
> org.apache.hadoop.hdds.scm.XceiverClientRatis.watchForCommit(XceiverClientRatis.java:283)
>   at 
> org.apache.hadoop.ozone.client.rpc.TestWatchForCommit.test2WayCommitForTimeoutException(TestWatchForCommit.java:391)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-2332) BlockOutputStream#waitOnFlushFutures blocks on putBlock combined future

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-2332.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> BlockOutputStream#waitOnFlushFutures blocks on putBlock combined future
> ---
>
> Key: HDDS-2332
> URL: https://issues.apache.org/jira/browse/HDDS-2332
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Lokesh Jain
>Priority: Major
>  Labels: TriagePending
> Fix For: 0.6.0
>
>
> BlockOutputStream blocks on the waitOnFlushFutures call. Two jstacks show 
> that the thread is blocked on the same condition.
> {code:java}
> 2019-10-18 06:30:38
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode):
> "main" #1 prio=5 os_prio=0 tid=0x7fbea001a800 nid=0x2a56 waiting on 
> condition [0x7fbea96d6000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xe4739888> (a 
> java.util.concurrent.CompletableFuture$Signaller)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>   at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.waitOnFlushFutures(BlockOutputStream.java:518)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:481)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:496)
>   at 
> org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:143)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:439)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:232)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:190)
>   at 
> org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:46)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
>   at java.io.DataOutputStream.write(DataOutputStream.java:107)
>   - locked <0xa6a75930> (a 
> org.apache.hadoop.fs.FSDataOutputStream)
>   at 
> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:77)
>   - locked <0xa6a75918> (a 
> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter)
>   at 
> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:64)
>   at 
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:670)
>   at 
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>   at 
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>   at 
> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>   at 
> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:203)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
> 2019-10-18 07:02:50
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode):
> "main" #1 prio=5 os_prio=0 tid=0x7fbea001a800 nid=0x2a56 waiting on 
> condition [0x7fbea96d6000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xe4739888> (a 
> java.util.concurrent.CompletableFuture$Signaller)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>   at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
> {code}

[jira] [Resolved] (HDDS-2963) Use RequestDependentRetry Policy along with ExceptionDependentRetry Policy in OzoneClient

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-2963.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> Use RequestDependentRetry Policy along with ExceptionDependentRetry Policy in 
> OzoneClient
> -
>
> Key: HDDS-2963
> URL: https://issues.apache.org/jira/browse/HDDS-2963
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Bharat Viswanadham
>Assignee: Mukul Kumar Singh
>Priority: Major
>  Labels: TriagePending
> Fix For: 0.6.0
>
>
> This Jira is to use the RequestDependentRetry policy together with the 
> ExceptionDependentRetry policy, so that different RetryPolicies can be 
> applied for different exceptions and for different kinds of requests.
> Dependent on RATIS-799
>  
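The combination described above can be sketched generically; the `RetryPolicy` interface and helper names below are hypothetical stand-ins for illustration, not the Ratis API.

```java
import java.util.Map;

public class RetryPolicySketch {
    // Hypothetical stand-in for a retry policy: the idea is to pick a retry
    // decision first by request type, then by the exception that occurred.
    interface RetryPolicy { boolean shouldRetry(Exception e); }

    // Decide based on the exception class (exception-dependent policy).
    static RetryPolicy exceptionDependent(Map<Class<? extends Exception>, Boolean> table) {
        return e -> table.getOrDefault(e.getClass(), false);
    }

    // Decide based on the request type (request-dependent policy), delegating
    // to a per-type exception-dependent policy.
    static RetryPolicy requestDependent(Map<String, RetryPolicy> byRequestType,
                                        String requestType) {
        return byRequestType.getOrDefault(requestType, ex -> false);
    }

    public static void main(String[] args) {
        RetryPolicy writePolicy = exceptionDependent(
            Map.of(java.io.IOException.class, true,
                   IllegalStateException.class, false));
        Map<String, RetryPolicy> byType = Map.of("WRITE", writePolicy);

        RetryPolicy chosen = requestDependent(byType, "WRITE");
        System.out.println(chosen.shouldRetry(new java.io.IOException()));   // true
        System.out.println(chosen.shouldRetry(new IllegalStateException())); // false
        // Unknown request types fall back to "do not retry".
        System.out.println(requestDependent(byType, "WATCH")
            .shouldRetry(new java.io.IOException()));                        // false
    }
}
```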



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2189) Datanode should send PipelineAction on RaftServer failure

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-2189:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Datanode should send PipelineAction on RaftServer failure
> -
>
> Key: HDDS-2189
> URL: https://issues.apache.org/jira/browse/HDDS-2189
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: TriagePending
>
> {code:java}
> 2019-09-26 08:03:07,152 ERROR 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: 
> 664c4e90-08f3-46c9-a073-c93ef2a55da3@group-93F633896F08-SegmentedRaftLogWorker
>  hit exception
> java.lang.OutOfMemoryError: Direct buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:694)
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
> at 
> org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.<init>(BufferedWriteChannel.java:41)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.<init>(SegmentedRaftLogOutputStream.java:72)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:566)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:289)
> at java.lang.Thread.run(Thread.java:748)
> 2019-09-26 08:03:07,155 INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 664c4e90-08f3-46c9-a073-c93ef2a55da3@group-93F633896F08: shutdown
> {code}
> On RaftServer shutdown, the datanode should send a PipelineAction denoting 
> that the pipeline has been closed exceptionally on the datanode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2342) ContainerStateMachine$chunkExecutor threads hold onto native memory

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-2342:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> ContainerStateMachine$chunkExecutor threads hold onto native memory
> ---
>
> Key: HDDS-2342
> URL: https://issues.apache.org/jira/browse/HDDS-2342
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: TriagePending
>
> In a heap dump, many threads in ContainerStateMachine$chunkExecutor hold onto 
> native memory in the ThreadLocal map. Every such thread holds onto a chunk's 
> worth of DirectByteBuffer. Since these threads are involved in write and read 
> chunk operations, the JVM allocates a chunk (16MB) worth of DirectByteBuffer 
> in the ThreadLocalMap for every thread involved in IO. Moreover, the native 
> memory is not GC'ed as long as the thread is alive.
> It would be better to reduce the default number of chunk executor threads and 
> keep them in proportion to the number of disks on the datanode. We should also 
> use DirectByteBuffers for the IO on the datanode. Currently we allocate a 
> HeapByteBuffer which needs to be backed by a DirectByteBuffer; if we can use a 
> DirectByteBuffer directly, we can avoid a buffer copy.
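The retention pattern described above can be sketched as follows. This is an illustrative stand-in, not datanode code: `CHUNK_SIZE` is scaled down from the real 16 MB chunk size so the sketch is cheap, and the class names are hypothetical.

```java
import java.nio.ByteBuffer;

public class ChunkBufferSketch {
    // Stand-in for the 16 MB chunk size (kept small for the sketch).
    static final int CHUNK_SIZE = 16 * 1024;

    // Each thread that touches IO ends up pinning one chunk-sized direct
    // buffer in its ThreadLocal map for as long as the thread lives.
    static final ThreadLocal<ByteBuffer> PER_THREAD_BUFFER =
        ThreadLocal.withInitial(() -> ByteBuffer.allocateDirect(CHUNK_SIZE));

    static ByteBuffer buffer() {
        // The same thread always gets the same direct buffer back; the native
        // memory is only released when the thread (and with it the ThreadLocal
        // map entry) becomes unreachable.
        return PER_THREAD_BUFFER.get();
    }

    public static void main(String[] args) {
        ByteBuffer first = buffer();
        ByteBuffer second = buffer();
        // One direct buffer per thread, reused across calls.
        System.out.println(first == second);                // true
        System.out.println(first.isDirect());               // true
        System.out.println(first.capacity() == CHUNK_SIZE); // true
    }
}
```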



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2660) Create insight point for datanode container protocol

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-2660:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Create insight point for datanode container protocol
> 
>
> Key: HDDS-2660
> URL: https://issues.apache.org/jira/browse/HDDS-2660
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Datanode
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>
> The goal of this task is to create a new insight point for the datanode 
> container protocol ({{HddsDispatcher}}) to be able to debug 
> {{client<->datanode}} communication.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3022) Datanode unable to close Pipeline after disk out of space

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-3022:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Datanode unable to close Pipeline after disk out of space
> -
>
> Key: HDDS-3022
> URL: https://issues.apache.org/jira/browse/HDDS-3022
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Shashikant Banerjee
>Priority: Critical
>  Labels: TriagePending
> Attachments: ozone_logs.zip
>
>
> Datanode gets into a loop and keeps throwing errors while trying to close 
> pipeline
> {code:java}
> 2020-02-14 00:25:10,208 INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07: changes role from  
> FOLLOWER to CANDIDATE at term 6240 for changeToCandidate
> 2020-02-14 00:25:10,208 ERROR 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
>  pipeline Action CLOSE  on pipeline 
> PipelineID=02e7e10e-2d50-4ace-a18b-701265ec9f07.Reason : 
> 285cac09-7622-45e6-be02-b3c68ebf8b10 is in candidate state for 31898494ms
> 2020-02-14 00:25:10,208 INFO org.apache.ratis.server.impl.RoleInfo: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10: start LeaderElection
> 2020-02-14 00:25:10,223 INFO org.apache.ratis.server.impl.LeaderElection: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07-LeaderElection37032: 
> begin an election at term 6241 for 0: 
> [d432c890-5ec4-4cf1-9078-28497a08ab85:10.65.6.227:9858, 
> 285cac09-7622-45e6-be02-b3c68ebf8b10:10.65.24.80:9858, 
> cabbdef8-ed6c-4fc7-b7b2-d1ddd07da47e:10.65.8.165:9858], old=null
> 2020-02-14 00:25:10,259 INFO org.apache.ratis.server.impl.LeaderElection: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07-LeaderElection37032 
> got exception when requesting votes: java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 
> d432c890-5ec4-4cf1-9078-28497a08ab85: group-701265EC9F07 not found.
> 2020-02-14 00:25:10,270 INFO org.apache.ratis.server.impl.LeaderElection: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07-LeaderElection37032 
> got exception when requesting votes: java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 
> cabbdef8-ed6c-4fc7-b7b2-d1ddd07da47e: group-701265EC9F07 not found.
> 2020-02-14 00:25:10,270 INFO org.apache.ratis.server.impl.LeaderElection: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07-LeaderElection37032: 
> Election REJECTED; received 0 response(s) [] and 2 exception(s); 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07:t6241, leader=null, 
> voted=285cac09-7622-45e6-be02-b3c68ebf8b10, 
> raftlog=285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07-SegmentedRaftLog:OPENED:c4,f4,i14,
>  conf=0: [d432c890-5ec4-4cf1-9078-28497a08ab85:10.65.6.227:9858, 
> 285cac09-7622-45e6-be02-b3c68ebf8b10:10.65.24.80:9858, 
> cabbdef8-ed6c-4fc7-b7b2-d1ddd07da47e:10.65.8.165:9858], old=null
> 2020-02-14 00:25:10,270 INFO org.apache.ratis.server.impl.LeaderElection:   
> Exception 0: java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 
> d432c890-5ec4-4cf1-9078-28497a08ab85: group-701265EC9F07 not found.
> 2020-02-14 00:25:10,270 INFO org.apache.ratis.server.impl.LeaderElection:   
> Exception 1: java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 
> cabbdef8-ed6c-4fc7-b7b2-d1ddd07da47e: group-701265EC9F07 not found.
> 2020-02-14 00:25:10,270 INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07: changes role from 
> CANDIDATE to FOLLOWER at term 6241 for DISCOVERED_A_NEW_TERM
> 2020-02-14 00:25:10,270 INFO org.apache.ratis.server.impl.RoleInfo: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10: shutdown LeaderElection
> 2020-02-14 00:25:10,270 INFO org.apache.ratis.server.impl.RoleInfo: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10: start FollowerState
> 2020-02-14 00:25:10,680 WARN org.apache.ratis.grpc.server.GrpcLogAppender: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-DD847EC75388->d432c890-5ec4-4cf1-9078-28497a08ab85-GrpcLogAppender:
>  HEARTBEAT appendEntries Timeout, 
> request=AppendEntriesRequest:cid=12669,entriesCount=0,lastEntry=null
> 2020-02-14 00:25:10,752 ERROR 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
>  pipeline Action CLOSE  on pipeline 
> PipelineID=7ad5ce51-d3fa-4e71-99f2-dd847ec75388.Reason : 
> 285cac09-7622-45e6-be02-b3c68ebf8b10 has not seen follower/s 
> d432c890-5ec4-4cf1
> {code}

[jira] [Updated] (HDDS-2476) Share more code between metadata and data scanners

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-2476:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Share more code between metadata and data scanners
> --
>
> Key: HDDS-2476
> URL: https://issues.apache.org/jira/browse/HDDS-2476
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Datanode
>Reporter: Attila Doroszlai
>Assignee: YiSheng Lien
>Priority: Major
>
> There are several duplicated / similar pieces of code in metadata and data 
> scanners.  More code should be reused.
> Examples:
> # ContainerDataScrubberMetrics and ContainerMetadataScrubberMetrics have 3 
> common metrics
> # lifecycle of ContainerMetadataScanner and ContainerDataScanner (main loop, 
> iteration, metrics processing, shutdown)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-3163) write Key is hung when write delay is injected in datanode dir

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-3163.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> write Key is hung when write delay is injected in datanode dir
> --
>
> Key: HDDS-3163
> URL: https://issues.apache.org/jira/browse/HDDS-3163
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Nilotpal Nandi
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: TriagePending, fault_injection
> Fix For: 0.6.0
>
>
> Steps taken:
> -
> 1. Mounted noise injection FUSE on all datanodes.
> 2. Select one datanode from each open pipeline
> 3. Inject delay of 120 seconds on chunk file path of selected datanodes
> 4. Start PUT key operation.
> The PUT key operation is stuck and does not return any success or error.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3498) Address already in use Should shutdown the datanode with FATAL log and point out the port and configure key

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-3498:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Address already in use Should shutdown the datanode with FATAL log and point 
> out the port and configure key
> ---
>
> Key: HDDS-3498
> URL: https://issues.apache.org/jira/browse/HDDS-3498
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.6.0
>Reporter: maobaolong
>Priority: Minor
>  Labels: Triaged
>
> Currently, the datanode process cannot work because the port is in use, but 
> the process stays alive.
> Furthermore, the log suggests the in-use port is 9861, but it isn't; after 
> looking at the source code, I found it is `dfs.container.ipc`, whose default 
> port is 9859, and this port should appear in the following exception. I think 
> this error should be logged at FATAL level, and we can then terminate the 
> datanode process.
> {code:java}
> 2020-04-21 15:53:05,436 [Datanode State Machine Thread - 0] WARN 
> org.apache.hadoop.ozone.container.common.statemachine.EndpointStateMachine: 
> Unable to communicate to SCM server at 127.0.0.1:9861 for past 300 seconds.
> java.io.IOException: Failed to bind
> at 
> org.apache.ratis.thirdparty.io.grpc.netty.NettyServer.start(NettyServer.java:246)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:184)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:90)
> at 
> org.apache.hadoop.ozone.container.common.transport.server.XceiverServerGrpc.start(XceiverServerGrpc.java:141)
> at 
> org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:235)
> at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:113)
> at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.net.BindException: Address already in use
> at sun.nio.ch.Net.bind0(Native Method)
> at sun.nio.ch.Net.bind(Net.java:433)
> at sun.nio.ch.Net.bind(Net.java:425)
> at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
> at 
> org.apache.ratis.thirdparty.io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:132)
> at 
> org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:551)
> at 
> org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1345)
> at 
> org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:503)
> at 
> org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:488)
> at 
> org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:984)
> at 
> org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel.bind(AbstractChannel.java:247)
> at 
> org.apache.ratis.thirdparty.io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:355)
> at 
> org.apache.ratis.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
> at 
> org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:416)
> at 
> org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:515)
> at 
> org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
> at 
> org.apache.ratis.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at 
> org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> ... 1 more
> {code}
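A hedged sketch of the behavior suggested above: detect the `BindException`, report the offending port together with the configuration key that controls it, and (in the real datanode) log at FATAL and terminate. The `tryBind` helper and the message format are assumptions for illustration, not the actual XceiverServerGrpc code.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

public class BindFailureSketch {
    // Config key and default port taken from the report above.
    static final String CONF_KEY = "dfs.container.ipc";
    static final int DEFAULT_PORT = 9859;

    static String tryBind(int port) {
        try (ServerSocket ignored = new ServerSocket(port)) {
            return "bound " + port;
        } catch (BindException e) {
            // In the datanode this would be a FATAL log followed by process
            // termination; here we just build the diagnostic message, which
            // names both the port and the config key controlling it.
            return "FATAL: port " + port + " (" + CONF_KEY
                + ") already in use: " + e.getMessage();
        } catch (IOException e) {
            return "FATAL: failed to bind " + port + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hold a port open, then try to bind it again to trigger BindException.
        try (ServerSocket holder = new ServerSocket(0)) {
            int busyPort = holder.getLocalPort();
            String msg = tryBind(busyPort);
            System.out.println(msg.startsWith("FATAL: port " + busyPort)); // true
        }
        // Port 0 asks the OS for a free port, so this bind succeeds.
        System.out.println(tryBind(0).startsWith("bound")); // true
    }
}
```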



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsu

[jira] [Updated] (HDDS-2696) Document recovery from RATIS-677

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-2696:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Document recovery from RATIS-677
> 
>
> Key: HDDS-2696
> URL: https://issues.apache.org/jira/browse/HDDS-2696
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Istvan Fajth
>Priority: Critical
>  Labels: Triaged
>
> RATIS-677 is solved in a way where a setting needs to be changed so that the 
> RatisServer implementation ignores the corruption, and at the moment, due to 
> HDDS-2647, we do not have a clear recovery path from a Ratis corruption in 
> the pipeline data.
> We should document how this can be recovered. I have an idea which involves 
> closing the pipeline in SCM and removing the Ratis metadata for the pipeline 
> on the DataNodes, which effectively clears the corrupted pipeline out of the 
> system.
> There are two problems I have with finding a recovery path and documenting it:
> - I am not sure we have strong enough guarantees that the writes happened 
> properly if the Ratis metadata could become corrupt, so this needs to be 
> investigated.
> - At the moment I cannot validate this approach: if I perform the steps (stop 
> the 3 DNs, move out the Ratis data for the pipeline, close the pipeline with 
> scmcli, then restart the DNs), the pipeline is not closed properly, and SCM 
> fails as described in HDDS-2695.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-3514) Fix Memory leak of RaftServerImpl

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-3514.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> Fix Memory leak of RaftServerImpl
> -
>
> Key: HDDS-3514
> URL: https://issues.apache.org/jira/browse/HDDS-3514
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
>  Labels: Triaged, pull-request-available
> Fix For: 0.6.0
>
>
> This depends on [RATIS-845|https://issues.apache.org/jira/browse/RATIS-845]; 
> see RATIS-845 for the details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2701) Avoid read from temporary chunk file in datanode

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-2701:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Avoid read from temporary chunk file in datanode
> 
>
> Key: HDDS-2701
> URL: https://issues.apache.org/jira/browse/HDDS-2701
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: TriagePending
>
> Currently we try reading chunk data from the temp file if the chunk file does 
> not exist. The fix was added in HDDS-2372 due to a race condition between 
> readStateMachineData and writeStateMachineData in ContainerStateMachine. 
> After HDDS-2542 is fixed, the read from the temp file can be avoided by 
> making sure that the chunk data remains in cache until the chunk file is 
> generated.
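The cache-until-committed idea can be sketched as below. The class and method names are illustrative stand-ins (only `readStateMachineData`/`writeStateMachineData` echo the description), not the actual ContainerStateMachine cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ChunkCacheSketch {
    // Reads are served from an in-memory cache that only drops an entry once
    // the real chunk file has been written, so reads never need to consult
    // the temporary (.tmp) chunk file.
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

    void writeStateMachineData(String chunkName, byte[] data) {
        cache.put(chunkName, data);        // cached before any file exists
    }

    byte[] readStateMachineData(String chunkName) {
        return cache.get(chunkName);       // hit until the file is committed
    }

    void onChunkFileCommitted(String chunkName) {
        cache.remove(chunkName);           // safe: file reads now succeed
    }

    boolean isCached(String chunkName) {
        return cache.containsKey(chunkName);
    }

    public static void main(String[] args) {
        ChunkCacheSketch c = new ChunkCacheSketch();
        c.writeStateMachineData("chunk-1", new byte[] {1, 2, 3});
        System.out.println(c.readStateMachineData("chunk-1").length); // 3
        c.onChunkFileCommitted("chunk-1");
        System.out.println(c.isCached("chunk-1")); // false
    }
}
```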



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org


