[jira] [Updated] (HDFS-14915) Move Superuser Check Before Taking Lock For Encryption API

2019-10-20 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-14915:

Attachment: HDFS-14915-01.patch

> Move Superuser Check Before Taking Lock For Encryption API
> --
>
> Key: HDFS-14915
> URL: https://issues.apache.org/jira/browse/HDFS-14915
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-14915-01.patch
>
>
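The reordering proposed in the issue title can be sketched as below. This is a hypothetical illustration with invented class and method names, not the actual FSNamesystem code: performing the superuser check before acquiring the namesystem lock lets unauthorized callers of the encryption APIs fail fast without ever contending on the shared lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch (not the actual HDFS code); only the ordering of the
// check relative to the lock is the point of HDFS-14915.
class EncryptionZoneOps {
    private final ReentrantReadWriteLock nsLock = new ReentrantReadWriteLock();

    // Proposed ordering: the superuser check happens before the lock is
    // taken, so a non-superuser caller never touches the namesystem lock.
    String listEncryptionZones(boolean callerIsSuperUser) {
        if (!callerIsSuperUser) {
            throw new SecurityException("superuser privilege is required");
        }
        nsLock.readLock().lock();
        try {
            return "zones"; // stands in for reading encryption zone state
        } finally {
            nsLock.readLock().unlock();
        }
    }
}
```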




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14915) Move Superuser Check Before Taking Lock For Encryption API

2019-10-20 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-14915:

Status: Patch Available  (was: Open)

> Move Superuser Check Before Taking Lock For Encryption API
> --
>
> Key: HDFS-14915
> URL: https://issues.apache.org/jira/browse/HDFS-14915
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-14915-01.patch
>
>







[jira] [Created] (HDFS-14915) Move Superuser Check Before Taking Lock For Encryption API

2019-10-20 Thread Ayush Saxena (Jira)
Ayush Saxena created HDFS-14915:
---

 Summary: Move Superuser Check Before Taking Lock For Encryption API
 Key: HDFS-14915
 URL: https://issues.apache.org/jira/browse/HDFS-14915
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ayush Saxena
Assignee: Ayush Saxena









[jira] [Commented] (HDDS-2332) BlockOutputStream#waitOnFlushFutures blocks on putBlock combined future

2019-10-20 Thread Lokesh Jain (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955757#comment-16955757
 ] 

Lokesh Jain commented on HDDS-2332:
---

[~cxorm] It is difficult to reproduce the issue; I saw it in one of the runs. 
It is happening because of RATIS-718. Once that is fixed, it should not appear 
in the runs. But we might need to support request timeouts in Ozone as well.
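The request-timeout support mentioned above could be sketched as a bounded wait on the combined putBlock future; this illustrates the general technique only, not the actual BlockOutputStream code:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: instead of combined.get(), which can park the thread
// forever if a putBlock reply is lost, wait with a deadline and surface a
// TimeoutException that the caller can turn into a retry or abort.
class FlushWaiter {
    static <T> T waitOnFlushFuture(CompletableFuture<T> combined, long timeoutMs)
            throws InterruptedException, ExecutionException, TimeoutException {
        return combined.get(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```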

> BlockOutputStream#waitOnFlushFutures blocks on putBlock combined future
> ---
>
> Key: HDDS-2332
> URL: https://issues.apache.org/jira/browse/HDDS-2332
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Lokesh Jain
>Priority: Major
>
> BlockOutputStream blocks on waitOnFlushFutures call. Two jstacks show that 
> the thread is blocked on the same condition.
> {code:java}
> 2019-10-18 06:30:38
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode):
> "main" #1 prio=5 os_prio=0 tid=0x7fbea001a800 nid=0x2a56 waiting on 
> condition [0x7fbea96d6000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xe4739888> (a 
> java.util.concurrent.CompletableFuture$Signaller)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>   at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.waitOnFlushFutures(BlockOutputStream.java:518)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:481)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:496)
>   at 
> org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:143)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:439)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:232)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:190)
>   at 
> org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:46)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
>   at java.io.DataOutputStream.write(DataOutputStream.java:107)
>   - locked <0xa6a75930> (a 
> org.apache.hadoop.fs.FSDataOutputStream)
>   at 
> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:77)
>   - locked <0xa6a75918> (a 
> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter)
>   at 
> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:64)
>   at 
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:670)
>   at 
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>   at 
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>   at 
> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>   at 
> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:203)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
> 2019-10-18 07:02:50
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode):
> "main" #1 prio=5 os_prio=0 tid=0x7fbea001a800 nid=0x2a56 waiting on 
> condition [0x7fbea96d6000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xe4739888> (a 
> java.util.concurrent.CompletableFuture$Signaller)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>  

[jira] [Commented] (HDDS-2328) Support large-scale listing

2019-10-20 Thread Lokesh Jain (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955756#comment-16955756
 ] 

Lokesh Jain commented on HDDS-2328:
---

Currently we do not implement the FileSystem#listLocatedStatus API in Ozone. 
Therefore it ends up calling listStatus for the entire directory at once, which 
can lead to OOM. I think we just need an implementation of listLocatedStatus 
and other related APIs in BasicOzoneFileSystem.
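The batched-listing idea could look roughly like this; the page-fetch API and the batch size are assumptions for illustration (S3A reportedly uses 5K per fetch), not the actual BasicOzoneFileSystem code:

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch of batched listing: instead of materializing all
// entries (risking OOM for >1M children), the iterator fetches pages of
// BATCH_SIZE keys at a time. Names and the page-fetch API are invented.
class BatchedLister {
    static final int BATCH_SIZE = 5000;
    private final List<String> backing; // stands in for the OM key table

    BatchedLister(List<String> backing) { this.backing = backing; }

    // Fetch one page starting after 'startAfter'. A real implementation
    // would seek in RocksDB; indexOf() is only for this toy backing list.
    List<String> listPage(String startAfter) {
        int from = startAfter == null ? 0 : backing.indexOf(startAfter) + 1;
        int to = Math.min(from + BATCH_SIZE, backing.size());
        return backing.subList(from, to);
    }

    Iterator<String> iterator() {
        return new Iterator<String>() {
            private List<String> page = listPage(null);
            private int pos = 0;
            public boolean hasNext() {
                if (pos < page.size()) return true;
                if (page.isEmpty()) return false;
                page = listPage(page.get(page.size() - 1)); // next page
                pos = 0;
                return !page.isEmpty();
            }
            public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                return page.get(pos++);
            }
        };
    }
}
```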

> Support large-scale listing 
> 
>
> Key: HDDS-2328
> URL: https://issues.apache.org/jira/browse/HDDS-2328
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Rajesh Balamohan
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: performance
>
> Large-scale listing of directory contents takes much longer and also has the 
> potential to run into OOM. I have > 1 million entries at the same level, and 
> listing took a very long time with {{RemoteIterator}} (it didn't complete, as 
> it was stuck in RDB::seek).
> S3A batches it with 5K listings per fetch, IIRC. It would be good to have 
> this feature in Ozone as well.






[jira] [Updated] (HDDS-2206) Separate handling for OMException and IOException in the Ozone Manager

2019-10-20 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDDS-2206:
-
Fix Version/s: 0.5.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Separate handling for OMException and IOException in the Ozone Manager
> --
>
> Key: HDDS-2206
> URL: https://issues.apache.org/jira/browse/HDDS-2206
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Manager
>Reporter: Supratim Deka
>Assignee: Supratim Deka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> As part of improving error propagation from the OM for ease of 
> troubleshooting and diagnosis, the proposal is to handle IOExceptions 
> separately from the business exceptions which are thrown as OMExceptions.
> Handling for OMExceptions will not be changed in this jira.
> Handling for IOExceptions will include logging the stacktrace on the server, 
> and propagation to the client under the control of a config parameter.
> Similar handling is also proposed for SCMException.
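A sketch of the proposed separation, with invented class and flag names (not the actual OM code): OMExceptions pass through unchanged, while raw IOExceptions are logged with their stack trace on the server and propagated to the client only when a config parameter allows it.

```java
import java.io.IOException;

// Hypothetical sketch of the handling split described above.
class OmErrorHandler {
    // Stands in for the real business exception type.
    static class OMException extends IOException {
        OMException(String msg) { super(msg); }
    }

    private final boolean propagateIoExceptions; // config parameter (invented)
    private final StringBuilder serverLog = new StringBuilder();

    OmErrorHandler(boolean propagateIoExceptions) {
        this.propagateIoExceptions = propagateIoExceptions;
    }

    // Returns the message the client would see.
    String handle(IOException e) {
        if (e instanceof OMException) {
            return e.getMessage();          // business error: unchanged
        }
        // Raw IOException: record the failure on the server...
        serverLog.append(e.toString()).append('\n');
        // ...and expose detail to the client only if configured.
        return propagateIoExceptions ? e.getMessage() : "INTERNAL_ERROR";
    }
}
```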






[jira] [Work logged] (HDDS-2206) Separate handling for OMException and IOException in the Ozone Manager

2019-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2206?focusedWorklogId=331192&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-331192
 ]

ASF GitHub Bot logged work on HDDS-2206:


Author: ASF GitHub Bot
Created on: 21/Oct/19 05:00
Start Date: 21/Oct/19 05:00
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #12: 
HDDS-2206. Separate handling for OMException and IOException in the Ozone 
Manager. Contributed by Supratim Deka
URL: https://github.com/apache/hadoop-ozone/pull/12
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 331192)
Time Spent: 2h 10m  (was: 2h)

> Separate handling for OMException and IOException in the Ozone Manager
> --
>
> Key: HDDS-2206
> URL: https://issues.apache.org/jira/browse/HDDS-2206
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Manager
>Reporter: Supratim Deka
>Assignee: Supratim Deka
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> As part of improving error propagation from the OM for ease of 
> troubleshooting and diagnosis, the proposal is to handle IOExceptions 
> separately from the business exceptions which are thrown as OMExceptions.
> Handling for OMExceptions will not be changed in this jira.
> Handling for IOExceptions will include logging the stacktrace on the server, 
> and propagation to the client under the control of a config parameter.
> Similar handling is also proposed for SCMException.






[jira] [Resolved] (HDDS-2326) Http server of Freon is not started for new Freon tests

2019-10-20 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham resolved HDDS-2326.
--
Fix Version/s: 0.5.0
   Resolution: Fixed

> Http server of Freon is not started for new Freon tests
> ---
>
> Key: HDDS-2326
> URL: https://issues.apache.org/jira/browse/HDDS-2326
> Project: Hadoop Distributed Data Store
>  Issue Type: New Feature
>  Components: freon
>Reporter: Marton Elek
>Assignee: Marton Elek
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HDDS-2022 introduced new Freon tests, but the Freon HTTP server is not 
> started for the new tests.
> Freon includes an HTTP server which can be turned on with the '--server' 
> flag. It helps to monitor and profile Freon, as the HTTP server serves the 
> Prometheus and profiler servlets by default.
> The server should be started if it is requested.
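A minimal sketch of the fix being described, with invented class names (this is not the actual Freon code): the embedded HTTP server is started only when the server flag was requested, for the new-style tests as well.

```java
// Hypothetical sketch; names are illustrative, not the real Freon classes.
class FreonTestRunner {
    private final boolean httpServerRequested; // parsed from the server flag
    private boolean httpServerStarted;

    FreonTestRunner(boolean httpServerRequested) {
        this.httpServerRequested = httpServerRequested;
    }

    void run() {
        // The fix: start the monitoring/profiling HTTP server when asked,
        // before the actual Freon load generation begins.
        if (httpServerRequested) {
            startHttpServer();
        }
        // ... run the selected Freon generator here ...
    }

    private void startHttpServer() {
        httpServerStarted = true; // stands in for starting the server with
                                  // the Prometheus and profiler servlets
    }

    boolean isHttpServerStarted() {
        return httpServerStarted;
    }
}
```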






[jira] [Work logged] (HDDS-2326) Http server of Freon is not started for new Freon tests

2019-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2326?focusedWorklogId=331190&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-331190
 ]

ASF GitHub Bot logged work on HDDS-2326:


Author: ASF GitHub Bot
Created on: 21/Oct/19 04:46
Start Date: 21/Oct/19 04:46
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #52: 
HDDS-2326. Http server of Freon is not started for new Freon tests
URL: https://github.com/apache/hadoop-ozone/pull/52
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 331190)
Time Spent: 20m  (was: 10m)

> Http server of Freon is not started for new Freon tests
> ---
>
> Key: HDDS-2326
> URL: https://issues.apache.org/jira/browse/HDDS-2326
> Project: Hadoop Distributed Data Store
>  Issue Type: New Feature
>  Components: freon
>Reporter: Marton Elek
>Assignee: Marton Elek
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HDDS-2022 introduced new Freon tests, but the Freon HTTP server is not 
> started for the new tests.
> Freon includes an HTTP server which can be turned on with the '--server' 
> flag. It helps to monitor and profile Freon, as the HTTP server serves the 
> Prometheus and profiler servlets by default.
> The server should be started if it is requested.






[jira] [Updated] (HDDS-2335) Params not included in AuditMessage

2019-10-20 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDDS-2335:

Fix Version/s: 0.5.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thanks [~adoroszlai] for the contribution, and [~bharat] and 
[~dineshchitlangia] for the reviews. I have committed this.

> Params not included in AuditMessage
> ---
>
> Key: HDDS-2335
> URL: https://issues.apache.org/jira/browse/HDDS-2335
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HDDS-2323 introduced the following Findbugs violation:
> {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/trunk/trunk-nightly-20191020-r5wzl/findbugs/summary.txt}
> M P UrF: Unread field: 
> org.apache.hadoop.ozone.audit.AuditMessage$Builder.params  At 
> AuditMessage.java:[line 106]
> {noformat}
> Which reveals that {{params}} is now not logged in audit messages:
> {noformat}
> 2019-10-20 08:41:35,248 | INFO  | OMAudit | user=hadoop | ip=192.168.128.2 | 
> op=CREATE_VOLUME | ret=SUCCESS |
> 2019-10-20 08:41:35,312 | INFO  | OMAudit | user=hadoop | ip=192.168.128.2 | 
> op=CREATE_BUCKET | ret=SUCCESS |
> 2019-10-20 08:41:35,407 | INFO  | OMAudit | user=hadoop | ip=192.168.128.2 | 
> op=ALLOCATE_KEY | ret=SUCCESS |
> 2019-10-20 08:41:37,355 | INFO  | OMAudit | user=hadoop | ip=192.168.128.2 | 
> op=COMMIT_KEY | ret=SUCCESS |
> {noformat}
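The unread-field bug can be illustrated with a minimal builder sketch (hypothetical and heavily simplified; the builder method names follow Ozone's AuditMessage.Builder, but the formatting logic is invented). The fix is simply to consume the params field when building the message:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the bug class Findbugs flagged: a builder field
// that is assigned but never read, so 'params' silently drops out of the
// audit line. The fix is to consume the field in build().
class AuditMessage {
    private final String line;
    private AuditMessage(String line) { this.line = line; }
    public String toString() { return line; }

    static class Builder {
        private String user, ip, op, ret;
        private Map<String, String> params = new LinkedHashMap<>();

        Builder setUser(String u) { user = u; return this; }
        Builder atIp(String i) { ip = i; return this; }
        Builder forOperation(String o) { op = o; return this; }
        Builder withParams(Map<String, String> p) { params = p; return this; }
        Builder withResult(String r) { ret = r; return this; }

        AuditMessage build() {
            // Including 'params' here is what the fix restores.
            return new AuditMessage("user=" + user + " | ip=" + ip
                + " | op=" + op + " " + params + " | ret=" + ret);
        }
    }
}
```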






[jira] [Work logged] (HDDS-2335) Params not included in AuditMessage

2019-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2335?focusedWorklogId=331187&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-331187
 ]

ASF GitHub Bot logged work on HDDS-2335:


Author: ASF GitHub Bot
Created on: 21/Oct/19 04:19
Start Date: 21/Oct/19 04:19
Worklog Time Spent: 10m 
  Work Description: mukul1987 commented on pull request #62: HDDS-2335. 
Params not included in AuditMessage
URL: https://github.com/apache/hadoop-ozone/pull/62
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 331187)
Time Spent: 20m  (was: 10m)

> Params not included in AuditMessage
> ---
>
> Key: HDDS-2335
> URL: https://issues.apache.org/jira/browse/HDDS-2335
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HDDS-2323 introduced the following Findbugs violation:
> {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/trunk/trunk-nightly-20191020-r5wzl/findbugs/summary.txt}
> M P UrF: Unread field: 
> org.apache.hadoop.ozone.audit.AuditMessage$Builder.params  At 
> AuditMessage.java:[line 106]
> {noformat}
> Which reveals that {{params}} is now not logged in audit messages:
> {noformat}
> 2019-10-20 08:41:35,248 | INFO  | OMAudit | user=hadoop | ip=192.168.128.2 | 
> op=CREATE_VOLUME | ret=SUCCESS |
> 2019-10-20 08:41:35,312 | INFO  | OMAudit | user=hadoop | ip=192.168.128.2 | 
> op=CREATE_BUCKET | ret=SUCCESS |
> 2019-10-20 08:41:35,407 | INFO  | OMAudit | user=hadoop | ip=192.168.128.2 | 
> op=ALLOCATE_KEY | ret=SUCCESS |
> 2019-10-20 08:41:37,355 | INFO  | OMAudit | user=hadoop | ip=192.168.128.2 | 
> op=COMMIT_KEY | ret=SUCCESS |
> {noformat}






[jira] [Work logged] (HDDS-2337) Fix checkstyle errors

2019-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2337?focusedWorklogId=331179&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-331179
 ]

ASF GitHub Bot logged work on HDDS-2337:


Author: ASF GitHub Bot
Created on: 21/Oct/19 03:57
Start Date: 21/Oct/19 03:57
Worklog Time Spent: 10m 
  Work Description: mukul1987 commented on pull request #64: HDDS-2337. Fix 
checkstyle errors
URL: https://github.com/apache/hadoop-ozone/pull/64
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 331179)
Time Spent: 20m  (was: 10m)

> Fix checkstyle errors
> -
>
> Key: HDDS-2337
> URL: https://issues.apache.org/jira/browse/HDDS-2337
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Checkstyle errors introduced in HDDS-2281:
> {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/pr/pr-hdds-2281-wfpgn/checkstyle/summary.txt}
> hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java
>  465: Line is longer than 80 characters (found 81).
> hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ContainerTestHelper.java
>  244: Line is longer than 80 characters (found 84).
> hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerStateMachineFailures.java
>  30: Unused import - 
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException.
>  506: ; is preceded with whitespace.
>  517: ; is preceded with whitespace.
> {noformat}






[jira] [Updated] (HDDS-2337) Fix checkstyle errors

2019-10-20 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDDS-2337:

Fix Version/s: 0.5.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thanks for the contribution [~adoroszlai]. I have committed this.

> Fix checkstyle errors
> -
>
> Key: HDDS-2337
> URL: https://issues.apache.org/jira/browse/HDDS-2337
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Checkstyle errors introduced in HDDS-2281:
> {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/pr/pr-hdds-2281-wfpgn/checkstyle/summary.txt}
> hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java
>  465: Line is longer than 80 characters (found 81).
> hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ContainerTestHelper.java
>  244: Line is longer than 80 characters (found 84).
> hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerStateMachineFailures.java
>  30: Unused import - 
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException.
>  506: ; is preceded with whitespace.
>  517: ; is preceded with whitespace.
> {noformat}






[jira] [Commented] (HDFS-14308) DFSStripedInputStream curStripeBuf is not freed by unbuffer()

2019-10-20 Thread Zhao Yi Ming (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955706#comment-16955706
 ] 

Zhao Yi Ming commented on HDFS-14308:
-

I found an easy way to reproduce this issue through HBase bulkload, and I also 
found the root cause. We are now testing; once everything goes well, I will 
update this Jira.

> DFSStripedInputStream curStripeBuf is not freed by unbuffer()
> -
>
> Key: HDFS-14308
> URL: https://issues.apache.org/jira/browse/HDFS-14308
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Affects Versions: 3.0.0
>Reporter: Joe McDonnell
>Assignee: Zhao Yi Ming
>Priority: Major
> Attachments: ec_heap_dump.png
>
>
> Some users of HDFS cache opened HDFS file handles to avoid repeated 
> roundtrips to the NameNode. For example, Impala caches up to 20,000 HDFS file 
> handles by default. Recent tests on erasure coded files show that the open 
> file handles can consume a large amount of memory when not in use.
> For example, here is output from Impala's JMX endpoint when 608 file handles 
> are cached
> {noformat}
> {
> "name": "java.nio:type=BufferPool,name=direct",
> "modelerType": "sun.management.ManagementFactoryHelper$1",
> "Name": "direct",
> "TotalCapacity": 1921048960,
> "MemoryUsed": 1921048961,
> "Count": 633,
> "ObjectName": "java.nio:type=BufferPool,name=direct"
> },{noformat}
> This shows direct buffer memory usage of 3MB per DFSStripedInputStream. 
> Attached is output from Eclipse MAT showing that the direct buffers come from 
> DFSStripedInputStream objects. Both Impala and HBase call unbuffer() when a 
> file handle is being cached and potentially unused for significant chunks of 
> time, yet this shows that the memory remains in use.
> To support caching file handles on erasure coded files, DFSStripedInputStream 
> should avoid holding buffers after the unbuffer() call. See HDFS-7694. 
> "unbuffer()" is intended to move an input stream to a lower memory state to 
> support these caching use cases. In particular, the curStripeBuf seems to be 
> allocated from the BUFFER_POOL on a resetCurStripeBuffer(true) call. It is 
> not freed until close().
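A minimal sketch of the proposed low-memory state, assuming a shared buffer pool along the lines of the real BUFFER_POOL (the class shape and method names here are invented): unbuffer() returns the stripe buffer to the pool instead of holding it until close().

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Hypothetical sketch of the proposed fix: unbuffer() releases the cached
// stripe buffer back to the shared pool rather than holding the ~3MB of
// direct memory until close().
class StripedStream {
    private static final ArrayDeque<ByteBuffer> BUFFER_POOL = new ArrayDeque<>();
    private ByteBuffer curStripeBuf;

    // Called on the read path: grab a pooled buffer or allocate a new one.
    void ensureStripeBuffer() {
        if (curStripeBuf == null) {
            ByteBuffer pooled = BUFFER_POOL.poll();
            curStripeBuf = pooled != null ? pooled
                : ByteBuffer.allocateDirect(3 << 20); // ~3MB per stream
        }
    }

    // unbuffer() moves the stream to a low-memory state while cached.
    void unbuffer() {
        if (curStripeBuf != null) {
            curStripeBuf.clear();
            BUFFER_POOL.push(curStripeBuf);
            curStripeBuf = null;
        }
    }

    boolean holdsBuffer() { return curStripeBuf != null; }
}
```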






[jira] [Commented] (HDDS-2336) Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions

2019-10-20 Thread Mukul Kumar Singh (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955674#comment-16955674
 ] 

Mukul Kumar Singh commented on HDDS-2336:
-

Thanks for the contribution [~adoroszlai]. I have committed this.

> Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions
> 
>
> Key: HDDS-2336
> URL: https://issues.apache.org/jira/browse/HDDS-2336
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> TestKeyValueContainer#testRocksDBCreateUsesCachedOptions, introduced in 
> HDDS-2283, is failing:
> {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/pr/pr-hdds-2283-cnrrq/unit/hadoop-hdds/container-service/org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer.txt}
> testRocksDBCreateUsesCachedOptions(org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer)
>   Time elapsed: 0.135 s  <<< FAILURE!
> java.lang.AssertionError: expected:<1> but was:<11>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer.testRocksDBCreateUsesCachedOptions(TestKeyValueContainer.java:406)
> {noformat}
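The assertion failure (expected:<1> but was:<11>) suggests the cached-options path was not reused across container creates. A minimal sketch of the caching behavior the test appears to verify (names are invented; this is not the actual container-service code):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: containers created with the same config should share
// one cached options object, so the construction count stays at 1 instead
// of growing once per container (1 vs 11 in the failing test).
class DbOptionsCache {
    private static final Map<String, Object> CACHE = new HashMap<>();
    static int constructed = 0; // counts option-object constructions

    static synchronized Object getOrCreate(String configKey) {
        return CACHE.computeIfAbsent(configKey, k -> {
            constructed++;
            return new Object(); // placeholder for a RocksDB options object
        });
    }
}
```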






[jira] [Work logged] (HDDS-2336) Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions

2019-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2336?focusedWorklogId=331133&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-331133
 ]

ASF GitHub Bot logged work on HDDS-2336:


Author: ASF GitHub Bot
Created on: 21/Oct/19 00:58
Start Date: 21/Oct/19 00:58
Worklog Time Spent: 10m 
  Work Description: mukul1987 commented on pull request #63: HDDS-2336. Fix 
TestKeyValueContainer#testRocksDBCreateUsesCachedOptions
URL: https://github.com/apache/hadoop-ozone/pull/63
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 331133)
Time Spent: 20m  (was: 10m)

> Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions
> 
>
> Key: HDDS-2336
> URL: https://issues.apache.org/jira/browse/HDDS-2336
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> TestKeyValueContainer#testRocksDBCreateUsesCachedOptions, introduced in 
> HDDS-2283, is failing:
> {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/pr/pr-hdds-2283-cnrrq/unit/hadoop-hdds/container-service/org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer.txt}
> testRocksDBCreateUsesCachedOptions(org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer)
>   Time elapsed: 0.135 s  <<< FAILURE!
> java.lang.AssertionError: expected:<1> but was:<11>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer.testRocksDBCreateUsesCachedOptions(TestKeyValueContainer.java:406)
> {noformat}






[jira] [Updated] (HDDS-2336) Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions

2019-10-20 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDDS-2336:

Fix Version/s: 0.5.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions
> 
>
> Key: HDDS-2336
> URL: https://issues.apache.org/jira/browse/HDDS-2336
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> TestKeyValueContainer#testRocksDBCreateUsesCachedOptions, introduced in 
> HDDS-2283, is failing:
> {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/pr/pr-hdds-2283-cnrrq/unit/hadoop-hdds/container-service/org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer.txt}
> testRocksDBCreateUsesCachedOptions(org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer)
>   Time elapsed: 0.135 s  <<< FAILURE!
> java.lang.AssertionError: expected:<1> but was:<11>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer.testRocksDBCreateUsesCachedOptions(TestKeyValueContainer.java:406)
> {noformat}






[jira] [Commented] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.

2019-10-20 Thread guojh (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955660#comment-16955660
 ] 

guojh commented on HDFS-14768:
--

[~surendrasingh] Please review it.

> EC : Busy DN replica should be consider in live replica check.
> --
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: 1568275810244.jpg, 1568276338275.jpg, 1568771471942.jpg, 
> HDFS-14768.000.patch, HDFS-14768.001.patch, HDFS-14768.002.patch, 
> HDFS-14768.003.patch, HDFS-14768.004.patch, HDFS-14768.005.patch, 
> HDFS-14768.006.patch, HDFS-14768.007.patch, HDFS-14768.jpg, 
> guojh_UT_after_deomission.txt, guojh_UT_before_deomission.txt, 
> zhaoyiming_UT_after_deomission.txt, zhaoyiming_UT_beofre_deomission.txt
>
>
> Policy is RS-6-3-1024K, version is hadoop 3.0.2;
> Suppose a file's block indices are [0,1,2,3,4,5,6,7,8], and we decommission
> indices [3,4] while increasing the index-6 datanode's
> pendingReplicationWithoutTargets so that it exceeds
> replicationStreamsHardLimit (we set 14). Then, after
> BlockManager#chooseSourceDatanodes, the liveBlockIndices are
> [0,1,2,3,4,5,7,8] and the block counters are Live: 7, Decommission: 2.
> In BlockManager#scheduleReconstruction, the additionalReplRequired
> is 9 - 7 = 2. After the NameNode chooses two target datanodes, it assigns an
> erasure-coding reconstruction task to the target datanode.
> When the datanode receives the task, it builds targetIndices from
> liveBlockIndices and the target length. The code is below.
> {code:java}
> targetIndices = new short[targets.length];
>
> private void initTargetIndices() {
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0;
>   hasValidTargets = false;
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {
>     if (!bitset.get(i)) {
>       if (reconstructor.getBlockLen(i) > 0) {
>         if (m < targets.length) {
>           targetIndices[m++] = (short) i;
>           hasValidTargets = true;
>         }
>       }
>     }
>   }
> }
> {code}
> targetIndices[0] = 6, and targetIndices[1] always stays 0 from its initial
> value. The StripedReader always creates readers from the first six block
> indices, i.e. [0,1,2,3,4,5].
> Using source indices [0,1,2,3,4,5] to build target indices [6,0] triggers the
> ISA-L bug: block index 6's data is corrupted (all zeros).
> I wrote a unit test that reproduces this reliably.
> {code:java}
> private int replicationStreamsHardLimit = 
> DFSConfigKeys.DFS_NAMENODE_REPLICATION_STREAMS_HARD_LIMIT_DEFAULT;
> numDNs = dataBlocks + parityBlocks + 10;
> @Test(timeout = 24)
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   final INodeFile fileNode = cluster.getNamesystem().getFSDirectory()
>   .getINode4Write(ecFile.toString()).asFile();
>   LocatedBlocks locatedBlocks =
>   StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>   .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>   (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   //
>   DatanodeDescriptor datanodeDescriptor = 
> cluster.getNameNode().getNamesystem()
>   
> .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
>   BlockInfo firstBlock = fileNode.getBlocks()[0];
>   DatanodeStorageInfo[] dStorageInfos = bm.getStorages(firstBlock);
>   // the first heartbeat will consume 3 replica tasks
>   for (int i = 0; i <= replicationStreamsHardLimit + 3; i++) {
> BlockManagerTestUtil.addBlockToBeReplicated(datanodeDescriptor, new 
> Block(i),
> new DatanodeStorageInfo[]{dStorageInfos[0]});
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List<DatanodeInfo> decommisionNodes = new ArrayList<>();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
>   assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());

[jira] [Commented] (HDDS-2328) Support large-scale listing

2019-10-20 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955650#comment-16955650
 ] 

Rajesh Balamohan commented on HDDS-2328:



Here is a small snippet of the code used for the large listing (the directory I
used had millions of entries, populated earlier).

ozone src details: https://github.com/apache/hadoop-ozone (commit 
b4a1afd60e3a3c7319a1ffa97d5ace3a95ed26f6).

{noformat}
 // Get path details
...
... 
long sTime = System.currentTimeMillis();
RemoteIterator<LocatedFileStatus> rit = fs.listLocatedStatus(path);
long count = 0 ;
while(rit.hasNext()) {
  rit.next();
  count++;
}
long eTime = System.currentTimeMillis();
...
...
{noformat}

> Support large-scale listing 
> 
>
> Key: HDDS-2328
> URL: https://issues.apache.org/jira/browse/HDDS-2328
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Rajesh Balamohan
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: performance
>
> Large-scale listing of directory contents takes a lot longer and also
> has the potential to run into OOM. I have > 1 million entries at the same
> level, and it took a lot longer with {{RemoteIterator}} (it didn't complete,
> as it was stuck in RDB::seek).
> S3A batches listings at 5K entries per fetch, IIRC.  It would be good to have
> this feature in Ozone as well.
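The S3A-style batching suggested above can be sketched as follows. This is a minimal, illustrative sketch only: {{PageSource}} and {{listKeysPage}} are hypothetical stand-ins for a paged listing RPC on the Ozone Manager, not the real API.

```java
import java.util.List;

public class BatchedLister {

  interface PageSource {
    // Hypothetical paged RPC: return up to `limit` keys strictly after
    // `startAfter` (null means start from the beginning).
    List<String> listKeysPage(String startAfter, int limit);
  }

  // Count all entries by fetching bounded pages and resuming from the last
  // returned key, so the client never holds millions of entries at once.
  static long countAll(PageSource src, int batchSize) {
    long count = 0;
    String cursor = null;
    while (true) {
      List<String> page = src.listKeysPage(cursor, batchSize);
      if (page.isEmpty()) {
        break;
      }
      count += page.size();
      cursor = page.get(page.size() - 1);  // resume after the last key seen
      if (page.size() < batchSize) {
        break;  // a short page means we reached the end
      }
    }
    return count;
  }
}
```

With a 5K batch size this keeps per-fetch memory bounded regardless of how many entries sit at one level.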






[jira] [Commented] (HDDS-2324) Enhance locking mechanism in OzoneMangaer

2019-10-20 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955649#comment-16955649
 ] 

Rajesh Balamohan commented on HDDS-2324:


[~arp], fairness is disabled by default.

https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/lock/OzoneManagerLock.java#L95

https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/OzoneConfigKeys.java#L457

> Enhance locking mechanism in OzoneMangaer
> -
>
> Key: HDDS-2324
> URL: https://issues.apache.org/jira/browse/HDDS-2324
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Rajesh Balamohan
>Assignee: Nanda kumar
>Priority: Major
>  Labels: performance
> Attachments: om_lock_100_percent_read_benchmark.svg, 
> om_lock_reader_and_writer_workload.svg
>
>
> OM has a reentrant RW lock. With 100% read or 100% write benchmarks, it works
> reasonably well. There is already a ticket to optimize the write codepath
> (as it incurs reading from the DB for key checks).
> However, when a small write workload (e.g. 3-5 threads) is added to the
> running read benchmark, throughput suffers significantly, because the
> reader threads get blocked often.  I have observed around 10x lower
> throughput (i.e. the 100% read benchmark was running at 12,000 TPS, and
> with a couple of writer threads added it drops to 1,200-1,800 TPS).
> 1. Instead of a single write lock, one option is to stripe the write lock
> based on the number of cores available in the system and acquire the
> relevant lock by hashing the key.
> 2. Another option is to explore StampedLock from JDK 8, which scales well
> with multiple readers and writers. But it is not a reentrant lock, so we
> need to explore whether it can be an option or not.
>  
>  
>  
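Option 1 above (striping the write lock by key hash) can be sketched minimally as below. The class and method names ({{StripedKeyLock}}, {{stripeFor}}) are illustrative and not the actual {{OzoneManagerLock}} API; the point is only that writers on different keys land on different stripes and stop blocking each other.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class StripedKeyLock {
  private final ReentrantReadWriteLock[] stripes;

  public StripedKeyLock(int nStripes) {
    stripes = new ReentrantReadWriteLock[nStripes];
    for (int i = 0; i < nStripes; i++) {
      stripes[i] = new ReentrantReadWriteLock();
    }
  }

  // Map a key to a stripe index; spread the hash bits to avoid clustering.
  int stripeFor(String key) {
    int h = key.hashCode();
    h ^= (h >>> 16);
    return Math.floorMod(h, stripes.length);
  }

  public void acquireReadLock(String key) {
    stripes[stripeFor(key)].readLock().lock();
  }

  public void releaseReadLock(String key) {
    stripes[stripeFor(key)].readLock().unlock();
  }

  public void acquireWriteLock(String key) {
    stripes[stripeFor(key)].writeLock().lock();
  }

  public void releaseWriteLock(String key) {
    stripes[stripeFor(key)].writeLock().unlock();
  }
}
```

Each stripe stays reentrant (unlike StampedLock in option 2), at the cost of coarser fairness across stripes.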






[jira] [Commented] (HDDS-2331) Client OOME due to buffer retention

2019-10-20 Thread Attila Doroszlai (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955548#comment-16955548
 ] 

Attila Doroszlai commented on HDDS-2331:


Thanks for checking, [~szetszwo].  I probably should have said the bug is 
triggered by HDDS-2169, not caused by it.

I agree, 16MB buffer is overkill for smaller keys.  I tried to change it to 
match actual data length, but it's not trivial (causes other errors).

> Client OOME due to buffer retention
> ---
>
> Key: HDDS-2331
> URL: https://issues.apache.org/jira/browse/HDDS-2331
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Priority: Critical
> Attachments: profiler.png
>
>
> Freon random key generator exhausts the default heap after just a few hundred
> 1MB keys.  A heap dump on OOME reveals 150+ instances of
> {{ContainerCommandRequestMessage}}, each with a 16MB {{byte[]}}.
> Steps to reproduce:
> # Start Ozone cluster with 1 datanode
> # Start Freon (5K keys of size 1MB)
> Result: OOME after a few hundred keys
> {noformat}
> $ cd hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/ozone
> $ docker-compose up -d
> $ docker-compose exec scm bash
> $ export HADOOP_OPTS='-XX:+HeapDumpOnOutOfMemoryError'
> $ ozone freon rk --numOfThreads 1 --numOfVolumes 1 --numOfBuckets 1 
> --replicationType RATIS --factor ONE --keySize 1048576 --numOfKeys 5120 
> --bufferSize 65536
> ...
> java.lang.OutOfMemoryError: Java heap space
> Dumping heap to java_pid289.hprof ...
> Heap dump file created [1456141975 bytes in 7.760 secs]
> {noformat}
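The direction discussed in the comments, sizing the buffer to the actual payload instead of always allocating a full chunk, can be sketched as below. This is an assumption-laden illustration, not the real client code: {{MAX_CHUNK}} and {{allocateFor}} are invented names, standing in for wherever the 16MB allocation happens.

```java
public class BufferSizing {
  // Assumed default chunk size matching the 16MB arrays seen in the heap dump.
  static final int MAX_CHUNK = 16 * 1024 * 1024;

  // Allocate only what the payload needs, capped at the chunk size, instead
  // of a full 16MB array per request; 1MB keys then retain 1MB buffers.
  static byte[] allocateFor(long dataLen) {
    return new byte[(int) Math.min(dataLen, MAX_CHUNK)];
  }
}
```

As the comment notes, the real fix is not this trivial (other code paths assume full-size buffers), so this only captures the intent.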






[jira] [Updated] (HDDS-2334) Dummy chunk manager fails with length mismatch error

2019-10-20 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-2334:
---
Status: Patch Available  (was: In Progress)

> Dummy chunk manager fails with length mismatch error
> 
>
> Key: HDDS-2334
> URL: https://issues.apache.org/jira/browse/HDDS-2334
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HDDS-1094 added a config option ({{hdds.container.chunk.persistdata=false}}) 
> to drop chunks instead of writing them to disk.  Currently this option 
> triggers the following error with any key size:
> {noformat}
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  data array does not match the length specified. DataLen: 16777216 Byte 
> Array: 16777478
>   at 
> org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerDummyImpl.writeChunk(ChunkManagerDummyImpl.java:87)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleWriteChunk(KeyValueHandler.java:695)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:176)
>   at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:277)
>   at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:150)
>   at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:413)
>   at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:423)
>   at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$handleWriteChunk$1(ContainerStateMachine.java:458)
>   at 
> java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> {noformat}






[jira] [Work logged] (HDDS-2334) Dummy chunk manager fails with length mismatch error

2019-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2334?focusedWorklogId=331081=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-331081
 ]

ASF GitHub Bot logged work on HDDS-2334:


Author: ASF GitHub Bot
Created on: 20/Oct/19 16:40
Start Date: 20/Oct/19 16:40
Worklog Time Spent: 10m 
  Work Description: adoroszlai commented on pull request #65: HDDS-2334. 
Dummy chunk manager fails with length mismatch error
URL: https://github.com/apache/hadoop-ozone/pull/65
 
 
   ## What changes were proposed in this pull request?
   
   Data size validation logic was recently 
[changed](https://github.com/apache/hadoop-ozone/commit/e70ea7b66ca3326c3b00ddc3e4af7144d48ea5f5#diff-92341865368a6b82a1430bcb40bd4264R83)
 for real `ChunkManager`, but not for the dummy implementation.  This change 
extracts the validation logic and reuses it for the dummy one, too.  This 
restores the ability to skip writing data to disk (for performance testing).
   
   https://issues.apache.org/jira/browse/HDDS-2334
   
   ## How was this patch tested?
   
   Changed existing unit test to use a buffer with additional "header" at the 
beginning.  Added test cases for dummy implementation.
   
   Tested on compose cluster with the following additional configs:
   
   ```
   OZONE-SITE.XML_hdds.container.chunk.persistdata=false
   OZONE-SITE.XML_ozone.client.verify.checksum=false
   ```
   
   ```
   $ ozone sh volume create vol1
   $ ozone sh bucket create vol1/buck1;
   $ ozone sh key put vol1/buck1/key1 /etc/passwd
   $ ozone sh key get vol1/buck1/key1 asdf
   $ ls -l /etc/passwd
   -rw-r--r-- 1 root root 671 Jun 17 15:33 /etc/passwd
   $ wc asdf
 0   0 671 asdf
   ```
   
   Also tested regular, "persistent" chunk manager:
   
   ```
   $ docker-compose exec scm ozone freon rk --numOfThreads 1 --numOfVolumes 1 
--numOfBuckets 1 --replicationType RATIS --factor ONE --validateWrites 
--keySize 1024 --numOfKeys 10 --bufferSize 1024
   ...
   Status: Success
   Git Base Revision: e97acb3bd8f3befd27418996fa5d4b50bf2e17bf
   Number of Volumes created: 1
   Number of Buckets created: 1
   Number of Keys added: 10
   Ratis replication factor: ONE
   Ratis replication type: RATIS
   Average Time spent in volume creation: 00:00:00,182
   Average Time spent in bucket creation: 00:00:00,030
   Average Time spent in key creation: 00:00:00,290
   Average Time spent in key write: 00:00:02,379
   Total bytes written: 10240
   Total number of writes validated: 10
   Writes validated: 100.0 %
   Successful validation: 10
   Unsuccessful validation: 0
   Total Execution time: 00:00:09,389
   ```
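The extracted size check described in this pull request could look roughly like the following. This is a sketch under assumptions: {{validateChunkSize}} is a hypothetical helper name, and the tolerance for extra header bytes before the chunk data is inferred from the error message in the issue (DataLen 16777216 vs. array length 16777478), not from the actual patch.

```java
public class ChunkSizeCheck {

  // Shared by the real and dummy chunk managers: the incoming buffer may
  // carry header bytes before the chunk payload, so require that at least
  // `declaredLen` bytes are available from `offset`, rather than demanding
  // that the raw array length equal the declared length exactly.
  static void validateChunkSize(byte[] data, int offset, long declaredLen) {
    long available = (long) data.length - offset;
    if (available < declaredLen) {
      throw new IllegalStateException(
          "data array does not match the length specified. DataLen: "
              + declaredLen + " Byte Array: " + data.length);
    }
  }
}
```

Reusing one helper keeps the dummy ({{persistdata=false}}) path from drifting out of sync with the real write path again.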
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 331081)
Remaining Estimate: 0h
Time Spent: 10m

> Dummy chunk manager fails with length mismatch error
> 
>
> Key: HDDS-2334
> URL: https://issues.apache.org/jira/browse/HDDS-2334
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HDDS-1094 added a config option ({{hdds.container.chunk.persistdata=false}}) 
> to drop chunks instead of writing them to disk.  Currently this option 
> triggers the following error with any key size:
> {noformat}
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  data array does not match the length specified. DataLen: 16777216 Byte 
> Array: 16777478
>   at 
> org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerDummyImpl.writeChunk(ChunkManagerDummyImpl.java:87)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleWriteChunk(KeyValueHandler.java:695)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:176)
>   at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:277)
>   at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:150)
>   at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:413)
>   at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:423)
>   at 
> 

[jira] [Updated] (HDDS-2334) Dummy chunk manager fails with length mismatch error

2019-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2334:
-
Labels: pull-request-available  (was: )

> Dummy chunk manager fails with length mismatch error
> 
>
> Key: HDDS-2334
> URL: https://issues.apache.org/jira/browse/HDDS-2334
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>
> HDDS-1094 added a config option ({{hdds.container.chunk.persistdata=false}}) 
> to drop chunks instead of writing them to disk.  Currently this option 
> triggers the following error with any key size:
> {noformat}
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  data array does not match the length specified. DataLen: 16777216 Byte 
> Array: 16777478
>   at 
> org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerDummyImpl.writeChunk(ChunkManagerDummyImpl.java:87)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleWriteChunk(KeyValueHandler.java:695)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:176)
>   at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:277)
>   at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:150)
>   at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:413)
>   at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:423)
>   at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$handleWriteChunk$1(ContainerStateMachine.java:458)
>   at 
> java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> {noformat}






[jira] [Commented] (HDFS-14882) Consider DataNode load when #getBlockLocation

2019-10-20 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955517#comment-16955517
 ] 

Hadoop QA commented on HDFS-14882:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 26s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 
0 new + 68 unchanged - 1 fixed = 68 total (was 69) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 16s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 87m 44s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
38s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}145m  5s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
|   | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14882 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12983544/HDFS-14882.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ceaec273d07f 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 447f46d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28134/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28134/testReport/ |
| Max. process+thread count | 4076 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| 

[jira] [Commented] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.

2019-10-20 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955509#comment-16955509
 ] 

Hadoop QA commented on HDFS-14768:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
58s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 46s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
27s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 
0 new + 170 unchanged - 1 fixed = 170 total (was 171) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 29s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}143m 42s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m 
49s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}218m 23s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestQuotaByStorageType |
|   | hadoop.hdfs.server.namenode.TestAddStripedBlocks |
|   | hadoop.hdfs.TestLeaseRecovery2 |
|   | hadoop.hdfs.server.namenode.TestUpgradeDomainBlockPlacementPolicy |
|   | hadoop.hdfs.server.namenode.TestListCorruptFileBlocks |
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
|   | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
|   | hadoop.hdfs.server.namenode.TestCacheDirectives |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14768 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12983541/HDFS-14768.007.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 9dfa0cd5375e 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 447f46d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| unit | 

[jira] [Comment Edited] (HDFS-14882) Consider DataNode load when #getBlockLocation

2019-10-20 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955493#comment-16955493
 ] 

Ayush Saxena edited comment on HDFS-14882 at 10/20/19 1:27 PM:
---

Thanx [~hexiaoqiao]
IMO we shouldn't reuse the same configuration: someone turning on the old
configuration would now also turn this feature on, which didn't happen earlier.
In general we keep any new feature turned off by default, and I see the default
for this config is true. I don't think we should force people into using this
by default, since the sorting also has some performance impact, so I would
prefer that they turn it on explicitly.
Though the two things are quite similar, they are not the same, so I think we
should have a separate config.

Moreover, for the test, you may add a case with decommissioned or stale
datanodes and verify that they stay at the end irrespective of distance.


was (Author: ayushtkn):
Thanx [~hexiaoqiao]
IMO we shouldn't reuse the same configuration: someone who already has the old 
configuration turned on would now get this feature turned on too, which didn't 
happen earlier.
In general we keep any new feature turned off by default, and I see the default 
for this config is true. I don't think we should force people into using this 
by default, since the sorting too has some performance impact, so I would 
prefer that they turn it on explicitly.
Though the two settings are quite similar, they are not for the same thing, so 
I think we should have a separate config.

> Consider DataNode load when #getBlockLocation
> -
>
> Key: HDFS-14882
> URL: https://issues.apache.org/jira/browse/HDFS-14882
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-14882.001.patch, HDFS-14882.002.patch, 
> HDFS-14882.003.patch
>
>
> Currently we consider the load of a datanode in #chooseTarget for writers, 
> but not for readers. Thus the process slots of a datanode can be occupied by 
> #BlockSender for readers, the disk/network becomes busy, and we then hit 
> slow-node exceptions. IIRC the same case has been reported multiple times. 
> Based on that, I propose to consider load for readers the same way 
> #chooseTarget does for writers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14882) Consider DataNode load when #getBlockLocation

2019-10-20 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955493#comment-16955493
 ] 

Ayush Saxena commented on HDFS-14882:
-

Thanx [~hexiaoqiao]
IMO we shouldn't reuse the same configuration: someone who already has the old 
configuration turned on would now get this feature turned on too, which didn't 
happen earlier.
In general we keep any new feature turned off by default, and I see the default 
for this config is true. I don't think we should force people into using this 
by default, since the sorting too has some performance impact, so I would 
prefer that they turn it on explicitly.
Though the two settings are quite similar, they are not for the same thing, so 
I think we should have a separate config.

> Consider DataNode load when #getBlockLocation
> -
>
> Key: HDFS-14882
> URL: https://issues.apache.org/jira/browse/HDFS-14882
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-14882.001.patch, HDFS-14882.002.patch, 
> HDFS-14882.003.patch
>
>
> Currently we consider the load of a datanode in #chooseTarget for writers, 
> but not for readers. Thus the process slots of a datanode can be occupied by 
> #BlockSender for readers, the disk/network becomes busy, and we then hit 
> slow-node exceptions. IIRC the same case has been reported multiple times. 
> Based on that, I propose to consider load for readers the same way 
> #chooseTarget does for writers.






[jira] [Commented] (HDFS-14882) Consider DataNode load when #getBlockLocation

2019-10-20 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955487#comment-16955487
 ] 

Xiaoqiao He commented on HDFS-14882:


Thanks [~ayushtkn] for your reviews. I believe I have addressed all the latest 
comments.
{quote}You even need to add the new config in Hdfs-defaults.xml
{quote}
I tried to reuse the config already used by 
BlockPlacementPolicyDefault#isGoodDatanode. IMO they both follow the same 
semantics, so I think we do not need to add another one.
A brief introduction to the changes:
Generally, this patch re-orders nodes that have the same topology distance to 
the client, ordering them by load.
First, calculate the distance from each node to the client.
Then, use #start and #end to delimit each run of nodes with the same distance.
Third, re-sort each run by load. (Note: skip this when the run length is less 
than 2, since it is unnecessary to sort a single node.)

> Consider DataNode load when #getBlockLocation
> -
>
> Key: HDFS-14882
> URL: https://issues.apache.org/jira/browse/HDFS-14882
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-14882.001.patch, HDFS-14882.002.patch, 
> HDFS-14882.003.patch
>
>
> Currently we consider the load of a datanode in #chooseTarget for writers, 
> but not for readers. Thus the process slots of a datanode can be occupied by 
> #BlockSender for readers, the disk/network becomes busy, and we then hit 
> slow-node exceptions. IIRC the same case has been reported multiple times. 
> Based on that, I propose to consider load for readers the same way 
> #chooseTarget does for writers.






[jira] [Assigned] (HDDS-2319) CLI command to perform on-demand data scan of a specific container

2019-10-20 Thread YiSheng Lien (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YiSheng Lien reassigned HDDS-2319:
--

Assignee: YiSheng Lien

> CLI command to perform on-demand data scan of a specific container
> --
>
> Key: HDDS-2319
> URL: https://issues.apache.org/jira/browse/HDDS-2319
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone CLI
>Reporter: Attila Doroszlai
>Assignee: YiSheng Lien
>Priority: Major
>
> On-demand data scan for a specific container might be a useful debug tool.  
> Thanks [~aengineer] for the idea.






[jira] [Commented] (HDDS-2332) BlockOutputStream#waitOnFlushFutures blocks on putBlock combined future

2019-10-20 Thread YiSheng Lien (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955482#comment-16955482
 ] 

YiSheng Lien commented on HDDS-2332:


Hi [~ljain], could you share the conditions to reproduce this issue?

> BlockOutputStream#waitOnFlushFutures blocks on putBlock combined future
> ---
>
> Key: HDDS-2332
> URL: https://issues.apache.org/jira/browse/HDDS-2332
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Lokesh Jain
>Priority: Major
>
> BlockOutputStream blocks on waitOnFlushFutures call. Two jstacks show that 
> the thread is blocked on the same condition.
> {code:java}
> 2019-10-18 06:30:38
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode):
> "main" #1 prio=5 os_prio=0 tid=0x7fbea001a800 nid=0x2a56 waiting on 
> condition [0x7fbea96d6000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xe4739888> (a 
> java.util.concurrent.CompletableFuture$Signaller)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>   at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.waitOnFlushFutures(BlockOutputStream.java:518)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:481)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:496)
>   at 
> org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:143)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:439)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:232)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:190)
>   at 
> org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:46)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
>   at java.io.DataOutputStream.write(DataOutputStream.java:107)
>   - locked <0xa6a75930> (a 
> org.apache.hadoop.fs.FSDataOutputStream)
>   at 
> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:77)
>   - locked <0xa6a75918> (a 
> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter)
>   at 
> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:64)
>   at 
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:670)
>   at 
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>   at 
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>   at 
> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>   at 
> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:203)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
> 2019-10-18 07:02:50
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode):
> "main" #1 prio=5 os_prio=0 tid=0x7fbea001a800 nid=0x2a56 waiting on 
> condition [0x7fbea96d6000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xe4739888> (a 
> java.util.concurrent.CompletableFuture$Signaller)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>   at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at 
> 

[jira] [Updated] (HDFS-14882) Consider DataNode load when #getBlockLocation

2019-10-20 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-14882:
---
Attachment: HDFS-14882.003.patch

> Consider DataNode load when #getBlockLocation
> -
>
> Key: HDFS-14882
> URL: https://issues.apache.org/jira/browse/HDFS-14882
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-14882.001.patch, HDFS-14882.002.patch, 
> HDFS-14882.003.patch
>
>
> Currently we consider the load of a datanode in #chooseTarget for writers, 
> but not for readers. Thus the process slots of a datanode can be occupied by 
> #BlockSender for readers, the disk/network becomes busy, and we then hit 
> slow-node exceptions. IIRC the same case has been reported multiple times. 
> Based on that, I propose to consider load for readers the same way 
> #chooseTarget does for writers.






[jira] [Updated] (HDDS-2337) Fix checkstyle errors

2019-10-20 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-2337:
---
Status: Patch Available  (was: Open)

> Fix checkstyle errors
> -
>
> Key: HDDS-2337
> URL: https://issues.apache.org/jira/browse/HDDS-2337
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Checkstyle errors introduced in HDDS-2281:
> {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/pr/pr-hdds-2281-wfpgn/checkstyle/summary.txt}
> hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java
>  465: Line is longer than 80 characters (found 81).
> hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ContainerTestHelper.java
>  244: Line is longer than 80 characters (found 84).
> hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerStateMachineFailures.java
>  30: Unused import - 
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException.
>  506: ; is preceded with whitespace.
>  517: ; is preceded with whitespace.
> {noformat}






[jira] [Updated] (HDDS-2336) Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions

2019-10-20 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-2336:
---
Status: Patch Available  (was: In Progress)

> Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions
> 
>
> Key: HDDS-2336
> URL: https://issues.apache.org/jira/browse/HDDS-2336
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> TestKeyValueContainer#testRocksDBCreateUsesCachedOptions, introduced in 
> HDDS-2283, is failing:
> {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/pr/pr-hdds-2283-cnrrq/unit/hadoop-hdds/container-service/org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer.txt}
> testRocksDBCreateUsesCachedOptions(org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer)
>   Time elapsed: 0.135 s  <<< FAILURE!
> java.lang.AssertionError: expected:<1> but was:<11>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer.testRocksDBCreateUsesCachedOptions(TestKeyValueContainer.java:406)
> {noformat}






[jira] [Updated] (HDDS-2337) Fix checkstyle errors

2019-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2337:
-
Labels: pull-request-available  (was: )

> Fix checkstyle errors
> -
>
> Key: HDDS-2337
> URL: https://issues.apache.org/jira/browse/HDDS-2337
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>
> Checkstyle errors introduced in HDDS-2281:
> {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/pr/pr-hdds-2281-wfpgn/checkstyle/summary.txt}
> hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java
>  465: Line is longer than 80 characters (found 81).
> hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ContainerTestHelper.java
>  244: Line is longer than 80 characters (found 84).
> hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerStateMachineFailures.java
>  30: Unused import - 
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException.
>  506: ; is preceded with whitespace.
>  517: ; is preceded with whitespace.
> {noformat}






[jira] [Work logged] (HDDS-2337) Fix checkstyle errors

2019-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2337?focusedWorklogId=331052&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-331052
 ]

ASF GitHub Bot logged work on HDDS-2337:


Author: ASF GitHub Bot
Created on: 20/Oct/19 11:48
Start Date: 20/Oct/19 11:48
Worklog Time Spent: 10m 
  Work Description: adoroszlai commented on pull request #64: HDDS-2337. 
Fix checkstyle errors
URL: https://github.com/apache/hadoop-ozone/pull/64
 
 
   ## What changes were proposed in this pull request?
   
   Fix current [checkstyle 
errors](https://github.com/elek/ozone-ci-q4/blob/master/pr/pr-hdds-2281-wfpgn/checkstyle/summary.txt).
   
   Also:
   
* fix some log-message placeholder vs. parameter count mismatches.
* remove `NoSuchAlgorithmException` from javadoc where it is not declared to be 
thrown
   
   https://issues.apache.org/jira/browse/HDDS-2337
   
   ## How was this patch tested?
   
   Ran checkstyle.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 331052)
Remaining Estimate: 0h
Time Spent: 10m

> Fix checkstyle errors
> -
>
> Key: HDDS-2337
> URL: https://issues.apache.org/jira/browse/HDDS-2337
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Checkstyle errors introduced in HDDS-2281:
> {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/pr/pr-hdds-2281-wfpgn/checkstyle/summary.txt}
> hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java
>  465: Line is longer than 80 characters (found 81).
> hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ContainerTestHelper.java
>  244: Line is longer than 80 characters (found 84).
> hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerStateMachineFailures.java
>  30: Unused import - 
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException.
>  506: ; is preceded with whitespace.
>  517: ; is preceded with whitespace.
> {noformat}






[jira] [Created] (HDDS-2337) Fix checkstyle errors

2019-10-20 Thread Attila Doroszlai (Jira)
Attila Doroszlai created HDDS-2337:
--

 Summary: Fix checkstyle errors
 Key: HDDS-2337
 URL: https://issues.apache.org/jira/browse/HDDS-2337
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Attila Doroszlai
Assignee: Attila Doroszlai


Checkstyle errors introduced in HDDS-2281:

{noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/pr/pr-hdds-2281-wfpgn/checkstyle/summary.txt}
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java
 465: Line is longer than 80 characters (found 81).
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ContainerTestHelper.java
 244: Line is longer than 80 characters (found 84).
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerStateMachineFailures.java
 30: Unused import - 
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException.
 506: ; is preceded with whitespace.
 517: ; is preceded with whitespace.
{noformat}






[jira] [Updated] (HDDS-2336) Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions

2019-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2336:
-
Labels: pull-request-available  (was: )

> Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions
> 
>
> Key: HDDS-2336
> URL: https://issues.apache.org/jira/browse/HDDS-2336
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>
> TestKeyValueContainer#testRocksDBCreateUsesCachedOptions, introduced in 
> HDDS-2283, is failing:
> {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/pr/pr-hdds-2283-cnrrq/unit/hadoop-hdds/container-service/org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer.txt}
> testRocksDBCreateUsesCachedOptions(org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer)
>   Time elapsed: 0.135 s  <<< FAILURE!
> java.lang.AssertionError: expected:<1> but was:<11>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer.testRocksDBCreateUsesCachedOptions(TestKeyValueContainer.java:406)
> {noformat}






[jira] [Work logged] (HDDS-2336) Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions

2019-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2336?focusedWorklogId=331051&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-331051
 ]

ASF GitHub Bot logged work on HDDS-2336:


Author: ASF GitHub Bot
Created on: 20/Oct/19 11:37
Start Date: 20/Oct/19 11:37
Worklog Time Spent: 10m 
  Work Description: adoroszlai commented on pull request #63: HDDS-2336. 
Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions
URL: https://github.com/apache/hadoop-ozone/pull/63
 
 
   ## What changes were proposed in this pull request?
   
   `testRocksDBCreateUsesCachedOptions` is 
[failing](https://github.com/elek/ozone-ci-q4/blob/master/pr/pr-hdds-2283-cnrrq/unit/hadoop-hdds/container-service/org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer.txt)
 because each test method in `TestKeyValueContainer` introduces a new entry in 
`MetadataStoreBuilder.CACHED_OPTS`, since `Configuration` does not implement 
`equals`.  Thus `testRocksDBCreateUsesCachedOptions` passes by itself, but 
fails when the whole test class is run.
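   A minimal illustration of the failure mode described above (the class and method names below are invented for the sketch, not the actual Ozone code): a `HashMap` keyed by instances of a class that does not override `equals()`/`hashCode()` falls back to identity comparison, so every new key instance adds a cache entry even when the instances are logically equal.

```java
import java.util.*;

// Sketch of the cache bug described above (names are illustrative, not the
// actual Ozone classes): a HashMap keyed by objects whose class does not
// override equals()/hashCode() gains a new entry per key instance.
public class CachedOptsDemo {

  // Stand-in for a Configuration-like class with no equals()/hashCode().
  public static final class Conf {
    final String value;
    public Conf(String value) { this.value = value; }
  }

  public static int cachedEntries(int lookups) {
    Map<Conf, String> cache = new HashMap<>();
    for (int i = 0; i < lookups; i++) {
      // Each caller builds its own Conf, so identity-based hashing never
      // finds the previously cached entry for the same logical settings.
      cache.computeIfAbsent(new Conf("same-settings"), c -> "options");
    }
    return cache.size();
  }

  public static void main(String[] args) {
    // One logical configuration, but the cache holds one entry per instance.
    System.out.println(cachedEntries(11)); // prints: 11
  }
}
```

   This mirrors the `expected:<1> but was:<11>` assertion failure: eleven test methods, eleven cache entries for what is logically one configuration.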
   
   https://issues.apache.org/jira/browse/HDDS-2336
   
   ## How was this patch tested?
   
   Unit test.  No other code changed.
 



Issue Time Tracking
---

Worklog Id: (was: 331051)
Remaining Estimate: 0h
Time Spent: 10m

> Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions
> 
>
> Key: HDDS-2336
> URL: https://issues.apache.org/jira/browse/HDDS-2336
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> TestKeyValueContainer#testRocksDBCreateUsesCachedOptions, introduced in 
> HDDS-2283, is failing:
> {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/pr/pr-hdds-2283-cnrrq/unit/hadoop-hdds/container-service/org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer.txt}
> testRocksDBCreateUsesCachedOptions(org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer)
>   Time elapsed: 0.135 s  <<< FAILURE!
> java.lang.AssertionError: expected:<1> but was:<11>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer.testRocksDBCreateUsesCachedOptions(TestKeyValueContainer.java:406)
> {noformat}






[jira] [Work started] (HDDS-2336) Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions

2019-10-20 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDDS-2336 started by Attila Doroszlai.
--
> Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions
> 
>
> Key: HDDS-2336
> URL: https://issues.apache.org/jira/browse/HDDS-2336
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>
> TestKeyValueContainer#testRocksDBCreateUsesCachedOptions, introduced in 
> HDDS-2283, is failing:
> {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/pr/pr-hdds-2283-cnrrq/unit/hadoop-hdds/container-service/org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer.txt}
> testRocksDBCreateUsesCachedOptions(org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer)
>   Time elapsed: 0.135 s  <<< FAILURE!
> java.lang.AssertionError: expected:<1> but was:<11>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer.testRocksDBCreateUsesCachedOptions(TestKeyValueContainer.java:406)
> {noformat}






[jira] [Created] (HDDS-2336) Fix TestKeyValueContainer#testRocksDBCreateUsesCachedOptions

2019-10-20 Thread Attila Doroszlai (Jira)
Attila Doroszlai created HDDS-2336:
--

 Summary: Fix 
TestKeyValueContainer#testRocksDBCreateUsesCachedOptions
 Key: HDDS-2336
 URL: https://issues.apache.org/jira/browse/HDDS-2336
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Affects Versions: 0.5.0
Reporter: Attila Doroszlai
Assignee: Attila Doroszlai


TestKeyValueContainer#testRocksDBCreateUsesCachedOptions, introduced in 
HDDS-2283, is failing:

{noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/pr/pr-hdds-2283-cnrrq/unit/hadoop-hdds/container-service/org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer.txt}
testRocksDBCreateUsesCachedOptions(org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer)
  Time elapsed: 0.135 s  <<< FAILURE!
java.lang.AssertionError: expected:<1> but was:<11>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer.testRocksDBCreateUsesCachedOptions(TestKeyValueContainer.java:406)
{noformat}






[jira] [Resolved] (HDDS-2280) HddsUtils#CheckForException should not return null in case the ratis exception cause is not set

2019-10-20 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh resolved HDDS-2280.
-
Fix Version/s: 0.5.0
   Resolution: Fixed

Thanks for the contribution [~shashikant] and [~bharat] for the review. I have 
committed this.

> HddsUtils#CheckForException should not return null in case the ratis 
> exception cause is not set
> ---
>
> Key: HDDS-2280
> URL: https://issues.apache.org/jira/browse/HDDS-2280
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> HddsUtils#CheckForException checks that the cause is set to one of the 
> defined/expected exceptions. In case ratis throws any runtime exception, 
> HddsUtils#CheckForException can return null and lead to a 
> NullPointerException during write.
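A hedged sketch of the general shape such a fix could take (illustrative names, not the actual HddsUtils code): walk the exception's cause chain looking for an expected exception type, and fall back to the original throwable rather than null when no expected cause is found.

```java
import java.io.IOException;

// Illustrative sketch, not the actual HddsUtils code: search the cause chain
// for an expected exception type, and never return null to callers.
public class CauseCheckDemo {

  /** Returns the first cause matching an expected type, else t itself. */
  public static Throwable checkForException(Throwable t, Class<?>... expected) {
    for (Throwable cur = t; cur != null; cur = cur.getCause()) {
      for (Class<?> cls : expected) {
        if (cls.isInstance(cur)) {
          return cur;
        }
      }
    }
    // Returning t (never null) keeps callers free of NullPointerExceptions
    // when a runtime exception arrives with no recognized cause set.
    return t;
  }

  public static void main(String[] args) {
    // A runtime exception with no recognized cause no longer maps to null.
    Throwable r = checkForException(
        new RuntimeException("ratis runtime error"), IOException.class);
    System.out.println(r.getMessage()); // prints: ratis runtime error
  }
}
```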






[jira] [Work logged] (HDDS-2280) HddsUtils#CheckForException should not return null in case the ratis exception cause is not set

2019-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2280?focusedWorklogId=331050&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-331050
 ]

ASF GitHub Bot logged work on HDDS-2280:


Author: ASF GitHub Bot
Created on: 20/Oct/19 11:20
Start Date: 20/Oct/19 11:20
Worklog Time Spent: 10m 
  Work Description: mukul1987 commented on pull request #57: HDDS-2280. 
HddsUtils#CheckForException should not return null in case the ratis exception 
cause is not set 
URL: https://github.com/apache/hadoop-ozone/pull/57
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 331050)
Time Spent: 40m  (was: 0.5h)

> HddsUtils#CheckForException should not return null in case the ratis 
> exception cause is not set
> ---
>
> Key: HDDS-2280
> URL: https://issues.apache.org/jira/browse/HDDS-2280
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> HddsUtils#CheckForException checks that the cause is properly set to one of 
> the defined/expected exceptions. In case ratis throws any runtime 
> exception, HddsUtils#CheckForException can return null and lead to a 
> NullPointerException during write.






[jira] [Resolved] (HDDS-2281) ContainerStateMachine#handleWriteChunk should ignore close container exception

2019-10-20 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh resolved HDDS-2281.
-
Resolution: Fixed

Thanks for the contribution [~shashikant]. I have committed this.

> ContainerStateMachine#handleWriteChunk should ignore close container 
> exception 
> ---
>
> Key: HDDS-2281
> URL: https://issues.apache.org/jira/browse/HDDS-2281
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, ContainerStateMachine#applyTransaction ignores the close container 
> exception. Similarly, the ContainerStateMachine#handleWriteChunk call should 
> also ignore the close container exception.






[jira] [Work logged] (HDDS-2281) ContainerStateMachine#handleWriteChunk should ignore close container exception

2019-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2281?focusedWorklogId=331049=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-331049
 ]

ASF GitHub Bot logged work on HDDS-2281:


Author: ASF GitHub Bot
Created on: 20/Oct/19 11:16
Start Date: 20/Oct/19 11:16
Worklog Time Spent: 10m 
  Work Description: mukul1987 commented on pull request #54: HDDS-2281. 
ContainerStateMachine#handleWriteChunk should ignore close container exception
URL: https://github.com/apache/hadoop-ozone/pull/54
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 331049)
Time Spent: 1h 20m  (was: 1h 10m)

> ContainerStateMachine#handleWriteChunk should ignore close container 
> exception 
> ---
>
> Key: HDDS-2281
> URL: https://issues.apache.org/jira/browse/HDDS-2281
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, ContainerStateMachine#applyTransaction ignores the close container 
> exception. Similarly, the ContainerStateMachine#handleWriteChunk call should 
> also ignore the close container exception.






[jira] [Commented] (HDFS-13736) BlockPlacementPolicyDefault can not choose favored nodes when 'dfs.namenode.block-placement-policy.default.prefer-local-node' set to false

2019-10-20 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955449#comment-16955449
 ] 

Ayush Saxena commented on HDFS-13736:
-

Thanx [~xiaodong.hu] for the patch.
I think the introduced test failed; you need to check it once.
bq.   If just add a parameter to chooseLocalStorage to denote it, I think  lots 
of places should be modified. 

I tried the approach; I don't think there are that many places to tweak, and in 
the end it came out to quite a few fewer lines than the present patch. I think 
adding a parameter would be the cleaner approach. If you are using an IDE, you 
can use the refactor option to add a new param to the method; it will 
automatically update all the call sites with the default value passed.
Let me know if you are facing any trouble; happy to help.

> BlockPlacementPolicyDefault can not choose favored nodes when 
> 'dfs.namenode.block-placement-policy.default.prefer-local-node' set to false
> --
>
> Key: HDFS-13736
> URL: https://issues.apache.org/jira/browse/HDFS-13736
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: hu xiaodong
>Assignee: hu xiaodong
>Priority: Major
> Attachments: HDFS-13736.001.patch, HDFS-13736.002.patch
>
>
> BlockPlacementPolicyDefault can not choose favored nodes when 
> 'dfs.namenode.block-placement-policy.default.prefer-local-node' set to false. 
>  






[jira] [Commented] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.

2019-10-20 Thread guojh (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955446#comment-16955446
 ] 

guojh commented on HDFS-14768:
--

rebase

> EC : Busy DN replica should be consider in live replica check.
> --
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: 1568275810244.jpg, 1568276338275.jpg, 1568771471942.jpg, 
> HDFS-14768.000.patch, HDFS-14768.001.patch, HDFS-14768.002.patch, 
> HDFS-14768.003.patch, HDFS-14768.004.patch, HDFS-14768.005.patch, 
> HDFS-14768.006.patch, HDFS-14768.007.patch, HDFS-14768.jpg, 
> guojh_UT_after_deomission.txt, guojh_UT_before_deomission.txt, 
> zhaoyiming_UT_after_deomission.txt, zhaoyiming_UT_beofre_deomission.txt
>
>
> Policy is RS-6-3-1024K, version is hadoop 3.0.2;
> We suppose a file's block indices are [0,1,2,3,4,5,6,7,8]. We decommission 
> indices [3,4] and increase the index-6 datanode's 
> pendingReplicationWithoutTargets so that it becomes larger than 
> replicationStreamsHardLimit (we set 14). Then, after 
> BlockManager#chooseSourceDatanodes, the liveBlockIndices are 
> [0,1,2,3,4,5,7,8] and the block counters are Live: 7, Decommission: 2. 
> In BlockManager#scheduleReconstruction, additionalReplRequired 
> is 9 - 7 = 2. After the Namenode chooses two target Datanodes, it assigns an 
> erasure-coding task to the target datanodes.
> When a datanode gets the task, it builds targetIndices from liveBlockIndices 
> and the target length. The code is below.
> {code:java}
> targetIndices = new short[targets.length];
>
> private void initTargetIndices() {
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0;
>   hasValidTargets = false;
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {
>     if (!bitset.get(i)) {
>       if (reconstructor.getBlockLen(i) > 0) {
>         if (m < targets.length) {
>           targetIndices[m++] = (short) i;
>           hasValidTargets = true;
>         }
>       }
>     }
>   }
> }
> {code}
> targetIndices[0]=6, and targetIndices[1] is always 0 from its initial value.
> The StripedReader always creates readers from the first 6 index blocks, i.e. 
> [0,1,2,3,4,5].
> Using indices [0,1,2,3,4,5] to build target indices [6,0] will trigger the 
> isa-l bug: block index 6's data is corrupted (all data is zero).
> I wrote a unit test that reproduces this reliably.
> {code:java}
> // code placeholder
> private int replicationStreamsHardLimit = 
> DFSConfigKeys.DFS_NAMENODE_REPLICATION_STREAMS_HARD_LIMIT_DEFAULT;
> numDNs = dataBlocks + parityBlocks + 10;
> @Test(timeout = 24)
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   final INodeFile fileNode = cluster.getNamesystem().getFSDirectory()
>   .getINode4Write(ecFile.toString()).asFile();
>   LocatedBlocks locatedBlocks =
>   StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>   .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>   (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   //
>   DatanodeDescriptor datanodeDescriptor = 
> cluster.getNameNode().getNamesystem()
>   
> .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
>   BlockInfo firstBlock = fileNode.getBlocks()[0];
>   DatanodeStorageInfo[] dStorageInfos = bm.getStorages(firstBlock);
>   // the first heartbeat will consume 3 replica tasks
>   for (int i = 0; i <= replicationStreamsHardLimit + 3; i++) {
> BlockManagerTestUtil.addBlockToBeReplicated(datanodeDescriptor, new 
> Block(i),
> new DatanodeStorageInfo[]{dStorageInfos[0]});
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List<DatanodeInfo> decommisionNodes = new ArrayList<>();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
>   assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
>   
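The zero-initialized tail described above can be avoided by trimming the target-index array to the number of genuinely missing blocks. A standalone sketch follows, with a plain BitSet and length array standing in for reconstructor.getLiveBitSet() and getBlockLen(); the method below is illustrative, not the actual StripedBlockReconstructor code.

```java
import java.util.Arrays;
import java.util.BitSet;

public class TargetIndicesDemo {

  // Collect indices of internal blocks that are neither live nor zero-length,
  // capped at the number of reconstruction targets; the returned array is
  // trimmed so no slot keeps a misleading default value of 0.
  static short[] initTargetIndices(BitSet liveBits, long[] blockLens,
                                   int totalBlkNum, int numTargets) {
    short[] tmp = new short[numTargets];
    int m = 0;
    for (int i = 0; i < totalBlkNum && m < numTargets; i++) {
      if (!liveBits.get(i) && blockLens[i] > 0) {
        tmp[m++] = (short) i;
      }
    }
    short[] result = new short[m]; // trim: drop unfilled zero entries
    System.arraycopy(tmp, 0, result, 0, m);
    return result;
  }

  public static void main(String[] args) {
    BitSet live = new BitSet();
    // Blocks 0..5, 7, 8 live; index 6 busy/missing, as in the report above.
    for (int i : new int[]{0, 1, 2, 3, 4, 5, 7, 8}) live.set(i);
    long[] lens = new long[9];
    Arrays.fill(lens, 1024L);
    short[] targets = initTargetIndices(live, lens, 9, 2);
    // Only index 6 is missing, so the array is trimmed to length 1.
    System.out.println(Arrays.toString(targets)); // [6]
  }
}
```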

[jira] [Updated] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.

2019-10-20 Thread guojh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

guojh updated HDFS-14768:
-
Attachment: HDFS-14768.007.patch

> EC : Busy DN replica should be consider in live replica check.
> --
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: 1568275810244.jpg, 1568276338275.jpg, 1568771471942.jpg, 
> HDFS-14768.000.patch, HDFS-14768.001.patch, HDFS-14768.002.patch, 
> HDFS-14768.003.patch, HDFS-14768.004.patch, HDFS-14768.005.patch, 
> HDFS-14768.006.patch, HDFS-14768.007.patch, HDFS-14768.jpg, 
> guojh_UT_after_deomission.txt, guojh_UT_before_deomission.txt, 
> zhaoyiming_UT_after_deomission.txt, zhaoyiming_UT_beofre_deomission.txt
>
>
> Policy is RS-6-3-1024K, version is hadoop 3.0.2;
> We suppose a file's block indices are [0,1,2,3,4,5,6,7,8]. We decommission 
> indices [3,4] and increase the index-6 datanode's 
> pendingReplicationWithoutTargets so that it becomes larger than 
> replicationStreamsHardLimit (we set 14). Then, after 
> BlockManager#chooseSourceDatanodes, the liveBlockIndices are 
> [0,1,2,3,4,5,7,8] and the block counters are Live: 7, Decommission: 2. 
> In BlockManager#scheduleReconstruction, additionalReplRequired 
> is 9 - 7 = 2. After the Namenode chooses two target Datanodes, it assigns an 
> erasure-coding task to the target datanodes.
> When a datanode gets the task, it builds targetIndices from liveBlockIndices 
> and the target length. The code is below.
> {code:java}
> targetIndices = new short[targets.length];
>
> private void initTargetIndices() {
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0;
>   hasValidTargets = false;
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {
>     if (!bitset.get(i)) {
>       if (reconstructor.getBlockLen(i) > 0) {
>         if (m < targets.length) {
>           targetIndices[m++] = (short) i;
>           hasValidTargets = true;
>         }
>       }
>     }
>   }
> }
> {code}
> targetIndices[0]=6, and targetIndices[1] is always 0 from its initial value.
> The StripedReader always creates readers from the first 6 index blocks, i.e. 
> [0,1,2,3,4,5].
> Using indices [0,1,2,3,4,5] to build target indices [6,0] will trigger the 
> isa-l bug: block index 6's data is corrupted (all data is zero).
> I wrote a unit test that reproduces this reliably.
> {code:java}
> // code placeholder
> private int replicationStreamsHardLimit = 
> DFSConfigKeys.DFS_NAMENODE_REPLICATION_STREAMS_HARD_LIMIT_DEFAULT;
> numDNs = dataBlocks + parityBlocks + 10;
> @Test(timeout = 24)
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   final INodeFile fileNode = cluster.getNamesystem().getFSDirectory()
>   .getINode4Write(ecFile.toString()).asFile();
>   LocatedBlocks locatedBlocks =
>   StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>   .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>   (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   //
>   DatanodeDescriptor datanodeDescriptor = 
> cluster.getNameNode().getNamesystem()
>   
> .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
>   BlockInfo firstBlock = fileNode.getBlocks()[0];
>   DatanodeStorageInfo[] dStorageInfos = bm.getStorages(firstBlock);
>   // the first heartbeat will consume 3 replica tasks
>   for (int i = 0; i <= replicationStreamsHardLimit + 3; i++) {
> BlockManagerTestUtil.addBlockToBeReplicated(datanodeDescriptor, new 
> Block(i),
> new DatanodeStorageInfo[]{dStorageInfos[0]});
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List<DatanodeInfo> decommisionNodes = new ArrayList<>();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
>   assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
>   

[jira] [Commented] (HDFS-14882) Consider DataNode load when #getBlockLocation

2019-10-20 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955444#comment-16955444
 ] 

Hadoop QA commented on HDFS-14882:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
47s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
 8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 40s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 39s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 2 new + 68 unchanged - 1 fixed = 70 total (was 69) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 23s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}106m  1s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
34s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}167m 59s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestSaveNamespace |
|   | hadoop.hdfs.tools.TestDFSZKFailoverController |
|   | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14882 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12983540/HDFS-14882.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 7a17ddad9fba 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 447f46d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28132/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28132/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 

[jira] [Commented] (HDFS-14860) Clean Up StoragePolicySatisfyManager.java

2019-10-20 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955439#comment-16955439
 ] 

Ayush Saxena commented on HDFS-14860:
-

Thanx [~belugabehr] for the patch. Overall looks good. Minor doubt with this 
change:

{code:java}
  private void clearPathIds() {
    final Collection<Long> paths = new ArrayList<>();
    pathsToBeTraveresed.drainTo(paths);
    for (Long trackId : paths) {
      try {
        namesystem.removeXattr(trackId,
            HdfsServerConstants.XATTR_SATISFY_STORAGE_POLICY);
      } catch (IOException e) {
        LOG.debug("Failed to remove sps xatttr!", e);
      }
    }
  }
{code}

Can't we do this instead, without creating a new ArrayList:

{code:java}
  private void clearPathIds() {
    while (!pathsToBeTraveresed.isEmpty()) {
      try {
        namesystem.removeXattr(pathsToBeTraveresed.remove(),
            HdfsServerConstants.XATTR_SATISFY_STORAGE_POLICY);
      } catch (IOException e) {
        LOG.debug("Failed to remove sps xatttr!", e);
      }
    }
  }
{code}
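Both variants empty the queue; the practical difference is how the per-element take behaves under concurrency. A standalone sketch follows, with a LinkedBlockingQueue standing in for pathsToBeTraveresed: remove() would throw NoSuchElementException if another thread emptied the queue between the isEmpty() check and the take, while poll() just returns null.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.concurrent.LinkedBlockingQueue;

public class DrainDemo {
  public static void main(String[] args) {
    LinkedBlockingQueue<Long> queue = new LinkedBlockingQueue<>();
    for (long id = 1; id <= 3; id++) queue.add(id);

    // Option A: bulk-drain into a snapshot, then iterate (the patch's approach).
    Collection<Long> paths = new ArrayList<>();
    queue.drainTo(paths);
    System.out.println(paths);        // [1, 2, 3]
    System.out.println(queue.size()); // 0

    // Option B: poll until empty. Unlike remove(), poll() returns null
    // instead of throwing when the queue is emptied concurrently, so the
    // loop condition doubles as the emptiness check.
    for (long id = 4; id <= 5; id++) queue.add(id);
    Long trackId;
    while ((trackId = queue.poll()) != null) {
      System.out.println("processing " + trackId);
    }
  }
}
```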


> Clean Up StoragePolicySatisfyManager.java
> -
>
> Key: HDFS-14860
> URL: https://issues.apache.org/jira/browse/HDFS-14860
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14860.1.patch, HDFS-14860.2.patch, 
> HDFS-14860.3.patch
>
>
> * Remove superfluous debug log guards
> * Use {{java.util.concurrent}} package for internal structure instead of 
> external synchronization.






[jira] [Commented] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.

2019-10-20 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955438#comment-16955438
 ] 

Surendra Singh Lilhore commented on HDFS-14768:
---

[~gjhkael], a rebase is required again. HDFS-14847 has been committed to trunk.

> EC : Busy DN replica should be consider in live replica check.
> --
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: 1568275810244.jpg, 1568276338275.jpg, 1568771471942.jpg, 
> HDFS-14768.000.patch, HDFS-14768.001.patch, HDFS-14768.002.patch, 
> HDFS-14768.003.patch, HDFS-14768.004.patch, HDFS-14768.005.patch, 
> HDFS-14768.006.patch, HDFS-14768.jpg, guojh_UT_after_deomission.txt, 
> guojh_UT_before_deomission.txt, zhaoyiming_UT_after_deomission.txt, 
> zhaoyiming_UT_beofre_deomission.txt
>
>
> Policy is RS-6-3-1024K, version is hadoop 3.0.2;
> We suppose a file's block indices are [0,1,2,3,4,5,6,7,8]. We decommission 
> indices [3,4] and increase the index-6 datanode's 
> pendingReplicationWithoutTargets so that it becomes larger than 
> replicationStreamsHardLimit (we set 14). Then, after 
> BlockManager#chooseSourceDatanodes, the liveBlockIndices are 
> [0,1,2,3,4,5,7,8] and the block counters are Live: 7, Decommission: 2. 
> In BlockManager#scheduleReconstruction, additionalReplRequired 
> is 9 - 7 = 2. After the Namenode chooses two target Datanodes, it assigns an 
> erasure-coding task to the target datanodes.
> When a datanode gets the task, it builds targetIndices from liveBlockIndices 
> and the target length. The code is below.
> {code:java}
> targetIndices = new short[targets.length];
>
> private void initTargetIndices() {
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0;
>   hasValidTargets = false;
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {
>     if (!bitset.get(i)) {
>       if (reconstructor.getBlockLen(i) > 0) {
>         if (m < targets.length) {
>           targetIndices[m++] = (short) i;
>           hasValidTargets = true;
>         }
>       }
>     }
>   }
> }
> {code}
> targetIndices[0]=6, and targetIndices[1] is always 0 from its initial value.
> The StripedReader always creates readers from the first 6 index blocks, i.e. 
> [0,1,2,3,4,5].
> Using indices [0,1,2,3,4,5] to build target indices [6,0] will trigger the 
> isa-l bug: block index 6's data is corrupted (all data is zero).
> I wrote a unit test that reproduces this reliably.
> {code:java}
> // code placeholder
> private int replicationStreamsHardLimit = 
> DFSConfigKeys.DFS_NAMENODE_REPLICATION_STREAMS_HARD_LIMIT_DEFAULT;
> numDNs = dataBlocks + parityBlocks + 10;
> @Test(timeout = 24)
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   final INodeFile fileNode = cluster.getNamesystem().getFSDirectory()
>   .getINode4Write(ecFile.toString()).asFile();
>   LocatedBlocks locatedBlocks =
>   StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>   .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>   (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   //
>   DatanodeDescriptor datanodeDescriptor = 
> cluster.getNameNode().getNamesystem()
>   
> .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
>   BlockInfo firstBlock = fileNode.getBlocks()[0];
>   DatanodeStorageInfo[] dStorageInfos = bm.getStorages(firstBlock);
>   // the first heartbeat will consume 3 replica tasks
>   for (int i = 0; i <= replicationStreamsHardLimit + 3; i++) {
> BlockManagerTestUtil.addBlockToBeReplicated(datanodeDescriptor, new 
> Block(i),
> new DatanodeStorageInfo[]{dStorageInfos[0]});
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List<DatanodeInfo> decommisionNodes = new ArrayList<>();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
>   

[jira] [Commented] (HDFS-14882) Consider DataNode load when #getBlockLocation

2019-10-20 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955433#comment-16955433
 ] 

Ayush Saxena commented on HDFS-14882:
-

Thanx [~hexiaoqiao] for the patch. The overall idea looks good. One doubt in 
the following logic, if you can help me understand:

{code:java}
for (int start = 0, end = 0; start < activeLen && end < activeLen;) {
  if (distances[start] == distances[end]) {
    end = end + 1;
    if (end < activeLen) continue;
  }
  Arrays.sort(datanodes, start, end,
      Comparator.comparingInt(DatanodeInfo::getXceiverCount));
  start = end;
  end = end + 1;
}
{code}

* In the first iteration, start=0 and end=0; {{if (distances[start] == 
distances[end])}} will always be true, so why don't I start with end as 1?
* Now in the second iteration, start=0 and end=1; if distances[0] != 
distances[1], the condition is false, so why do I execute {{Arrays.sort(datanodes, 
start, end, Comparator.comparingInt(DatanodeInfo::getXceiverCount));}}?
* Third iteration: start=1 and end=2; if the distances aren't equal, we again 
do {{Arrays.sort(datanodes, start, end, 
Comparator.comparingInt(DatanodeInfo::getXceiverCount));}}.

Well, I need to recheck this logic.
You also need to add the new config in {{hdfs-default.xml}}.
Apart from that, almost LGTM.
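The grouping idea in question can be sketched in isolation. The loop below moves the boundary handling into the group-close test, so each run of equal distances is sorted exactly once; plain arrays stand in for DatanodeInfo, and this is an illustrative rewrite, not the patch's actual code.

```java
import java.util.Arrays;
import java.util.Comparator;

public class SortByDistanceThenLoadDemo {

  // Sort nodes sharing the same network distance by load (xceiver count),
  // without reordering across distance groups. distances[] is assumed to be
  // already ascending, so equal-distance nodes form contiguous runs.
  static void sortWithinDistanceGroups(int[] distances, Integer[] loads) {
    int start = 0;
    for (int end = 1; end <= loads.length; end++) {
      // Close the current group at the array end or when the distance changes.
      if (end == loads.length || distances[end] != distances[start]) {
        Arrays.sort(loads, start, end, Comparator.naturalOrder());
        start = end; // next group begins here
      }
    }
  }

  public static void main(String[] args) {
    int[] distances = {0, 2, 2, 2, 4};
    Integer[] loads = {7, 9, 3, 5, 1};
    sortWithinDistanceGroups(distances, loads);
    // The middle run (distance 2) is reordered by load; size-1 groups are untouched.
    System.out.println(Arrays.toString(loads)); // [7, 3, 5, 9, 1]
  }
}
```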


> Consider DataNode load when #getBlockLocation
> -
>
> Key: HDFS-14882
> URL: https://issues.apache.org/jira/browse/HDFS-14882
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-14882.001.patch, HDFS-14882.002.patch
>
>
> Currently, we consider the load of a datanode in #chooseTarget for writers, 
> but not for readers. Thus, a datanode's processing slots can be occupied by 
> #BlockSender for readers, the disk/network can become a busy workload, and we 
> then hit slow-node exceptions. IIRC the same case has been reported several 
> times. Based on this, I propose to consider load for readers the same way 
> #chooseTarget does for writers.






[jira] [Updated] (HDDS-2335) Params not included in AuditMessage

2019-10-20 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-2335:
---
Status: Patch Available  (was: In Progress)

> Params not included in AuditMessage
> ---
>
> Key: HDDS-2335
> URL: https://issues.apache.org/jira/browse/HDDS-2335
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HDDS-2323 introduced the following Findbugs violation:
> {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/trunk/trunk-nightly-20191020-r5wzl/findbugs/summary.txt}
> M P UrF: Unread field: 
> org.apache.hadoop.ozone.audit.AuditMessage$Builder.params  At 
> AuditMessage.java:[line 106]
> {noformat}
> Which reveals that {{params}} is now not logged in audit messages:
> {noformat}
> 2019-10-20 08:41:35,248 | INFO  | OMAudit | user=hadoop | ip=192.168.128.2 | 
> op=CREATE_VOLUME | ret=SUCCESS |
> 2019-10-20 08:41:35,312 | INFO  | OMAudit | user=hadoop | ip=192.168.128.2 | 
> op=CREATE_BUCKET | ret=SUCCESS |
> 2019-10-20 08:41:35,407 | INFO  | OMAudit | user=hadoop | ip=192.168.128.2 | 
> op=ALLOCATE_KEY | ret=SUCCESS |
> 2019-10-20 08:41:37,355 | INFO  | OMAudit | user=hadoop | ip=192.168.128.2 | 
> op=COMMIT_KEY | ret=SUCCESS |
> {noformat}
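The fix shape is for build() to fold the collected params into the formatted message, so the field is actually read and the audit line keeps its parameters. A minimal, hypothetical builder sketch (field and method names are illustrative, not the actual AuditMessage API):

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

public class AuditMessageDemo {
  // If build() never reads params, Findbugs flags the unread field (UrF)
  // and the audit line silently loses its parameters.
  private final Map<String, String> params = new TreeMap<>();
  private String user, ip, op, ret;

  AuditMessageDemo setUser(String u) { this.user = u; return this; }
  AuditMessageDemo setIp(String i) { this.ip = i; return this; }
  AuditMessageDemo setOp(String o) { this.op = o; return this; }
  AuditMessageDemo setResult(String r) { this.ret = r; return this; }
  AuditMessageDemo addParam(String k, String v) { params.put(k, v); return this; }

  String build() {
    StringJoiner joiner = new StringJoiner(" | ");
    joiner.add("user=" + user).add("ip=" + ip).add("op=" + op);
    params.forEach((k, v) -> joiner.add(k + "=" + v)); // include the params
    joiner.add("ret=" + ret);
    return joiner.toString();
  }

  public static void main(String[] args) {
    String msg = new AuditMessageDemo().setUser("hadoop")
        .setIp("192.168.128.2").setOp("CREATE_VOLUME")
        .addParam("volume", "vol1").setResult("SUCCESS").build();
    System.out.println(msg);
    // user=hadoop | ip=192.168.128.2 | op=CREATE_VOLUME | volume=vol1 | ret=SUCCESS
  }
}
```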






[jira] [Updated] (HDDS-2335) Params not included in AuditMessage

2019-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2335:
-
Labels: pull-request-available  (was: )

> Params not included in AuditMessage
> ---
>
> Key: HDDS-2335
> URL: https://issues.apache.org/jira/browse/HDDS-2335
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>
> HDDS-2323 introduced the following Findbugs violation:
> {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/trunk/trunk-nightly-20191020-r5wzl/findbugs/summary.txt}
> M P UrF: Unread field: 
> org.apache.hadoop.ozone.audit.AuditMessage$Builder.params  At 
> AuditMessage.java:[line 106]
> {noformat}
> This reveals that {{params}} is no longer logged in audit messages:
> {noformat}
> 2019-10-20 08:41:35,248 | INFO  | OMAudit | user=hadoop | ip=192.168.128.2 | 
> op=CREATE_VOLUME | ret=SUCCESS |
> 2019-10-20 08:41:35,312 | INFO  | OMAudit | user=hadoop | ip=192.168.128.2 | 
> op=CREATE_BUCKET | ret=SUCCESS |
> 2019-10-20 08:41:35,407 | INFO  | OMAudit | user=hadoop | ip=192.168.128.2 | 
> op=ALLOCATE_KEY | ret=SUCCESS |
> 2019-10-20 08:41:37,355 | INFO  | OMAudit | user=hadoop | ip=192.168.128.2 | 
> op=COMMIT_KEY | ret=SUCCESS |
> {noformat}






[jira] [Work logged] (HDDS-2335) Params not included in AuditMessage

2019-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2335?focusedWorklogId=331040&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-331040
 ]

ASF GitHub Bot logged work on HDDS-2335:


Author: ASF GitHub Bot
Created on: 20/Oct/19 09:05
Start Date: 20/Oct/19 09:05
Worklog Time Spent: 10m 
  Work Description: adoroszlai commented on pull request #62: HDDS-2335. 
Params not included in AuditMessage
URL: https://github.com/apache/hadoop-ozone/pull/62
 
 
   ## What changes were proposed in this pull request?
   
   Include operation parameters in audit messages like before 
[HDDS-2323](https://github.com/apache/hadoop-ozone/commit/b9618834c9902fc8fd9ae12872092cfb1e5c1be3).
   
   https://issues.apache.org/jira/browse/HDDS-2335
   
   ## How was this patch tested?
   
   Added unit test.
   
   Verified Findbugs violation is fixed.
   
   Tested using Freon:
   
   ```
   2019-10-20 08:54:19,437 | INFO  | OMAudit | user=hadoop | ip=192.168.144.3 | 
op=CREATE_VOLUME {admin=hadoop, owner=hadoop, volume=vol-0-52867, 
creationTime=1571561659397, quotaInBytes=1152921504606846976, objectID=1, 
updateID=1} | ret=SUCCESS |
   2019-10-20 08:54:19,497 | INFO  | OMAudit | user=hadoop | ip=192.168.144.3 | 
op=CREATE_BUCKET {volume=vol-0-52867, bucket=bucket-0-43473, gdprEnabled=null, 
acls=[user:hadoop:a[ACCESS], group:users:a[ACCESS]], isVersionEnabled=false, 
storageType=DISK, creationTime=1571561659483} | ret=SUCCESS |
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 331040)
Remaining Estimate: 0h
Time Spent: 10m

> Params not included in AuditMessage
> ---
>
> Key: HDDS-2335
> URL: https://issues.apache.org/jira/browse/HDDS-2335
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HDDS-2323 introduced the following Findbugs violation:
> {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/trunk/trunk-nightly-20191020-r5wzl/findbugs/summary.txt}
> M P UrF: Unread field: 
> org.apache.hadoop.ozone.audit.AuditMessage$Builder.params  At 
> AuditMessage.java:[line 106]
> {noformat}
> This reveals that {{params}} is no longer logged in audit messages:
> {noformat}
> 2019-10-20 08:41:35,248 | INFO  | OMAudit | user=hadoop | ip=192.168.128.2 | 
> op=CREATE_VOLUME | ret=SUCCESS |
> 2019-10-20 08:41:35,312 | INFO  | OMAudit | user=hadoop | ip=192.168.128.2 | 
> op=CREATE_BUCKET | ret=SUCCESS |
> 2019-10-20 08:41:35,407 | INFO  | OMAudit | user=hadoop | ip=192.168.128.2 | 
> op=ALLOCATE_KEY | ret=SUCCESS |
> 2019-10-20 08:41:37,355 | INFO  | OMAudit | user=hadoop | ip=192.168.128.2 | 
> op=COMMIT_KEY | ret=SUCCESS |
> {noformat}






[jira] [Commented] (HDFS-14283) DFSInputStream to prefer cached replica

2019-10-20 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955428#comment-16955428
 ] 

Ayush Saxena commented on HDFS-14283:
-

Thanks [~leosun08] for the patch. I had a quick look at the idea but couldn't 
check the whole code. Some concerns:
 * I think the feature to prefer cached replicas should be optional and governed 
by a client-side config, so each user can decide whether to enable it.
 * Secondly, the changes have moved to the server side too, for the sorting. I 
think this would have a performance impact even for clients that don't want to 
prefer the cached locations. The intent with which this Jira started was to 
keep the logic at the client side, so I think we should refrain from 
server-side changes.
 * Also make sure that clients not interested in using cached replicas are not 
affected in any way; all the processing for this feature should happen only 
when it is turned on, and it should be off by default.

 

> DFSInputStream to prefer cached replica
> ---
>
> Key: HDFS-14283
> URL: https://issues.apache.org/jira/browse/HDFS-14283
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.6.0
> Environment: HDFS Caching
>Reporter: Wei-Chiu Chuang
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14283.001.patch, HDFS-14283.002.patch, 
> HDFS-14283.003.patch
>
>
> HDFS Caching offers performance benefits. However, currently NameNode does 
> not treat cached replica with higher priority, so HDFS caching is only useful 
> when cache replication = 3, that is to say, all replicas are cached in 
> memory, so that a client doesn't randomly pick an uncached replica.
> HDFS-6846 proposed to let NameNode give higher priority to cached replica. 
> Changing a logic in NameNode is always tricky so that didn't get much 
> traction. Here I propose a different approach: let client (DFSInputStream) 
> prefer cached replica.
> A {{LocatedBlock}} object already contains cached replica location so a 
> client has the needed information. I think we can change 
> {{DFSInputStream#getBestNodeDNAddrPair()}} for this purpose.
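The proposed client-side preference could be sketched roughly as follows. This is a minimal illustration with plain string node names standing in for the real {{DatanodeInfo}}/{{LocatedBlock}} types; {{preferCached}} is a hypothetical helper, not actual DFSInputStream code:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Minimal sketch of the proposed client-side preference: try cached
// replicas first, then fall back to the normal (sorted) location list.
public class CachedFirstSelector {

    // Returns locations with cached replicas moved to the front,
    // preserving the NameNode's original order within each group.
    static String[] preferCached(String[] locations, String[] cachedLocations) {
        Set<String> ordered = new LinkedHashSet<>(Arrays.asList(cachedLocations));
        for (String loc : locations) {
            ordered.add(loc);  // appends only the not-yet-seen uncached nodes
        }
        return ordered.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] locations = {"dn1", "dn2", "dn3"};
        String[] cached = {"dn3"};
        System.out.println(Arrays.toString(preferCached(locations, cached)));
        // [dn3, dn1, dn2]
    }
}
```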






[jira] [Commented] (HDFS-14283) DFSInputStream to prefer cached replica

2019-10-20 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955426#comment-16955426
 ] 

Xiaoqiao He commented on HDFS-14283:


Thanks [~leosun08] for your work.
{quote}But i have a problem that current block.getLocations() which gets a list 
of DataNodes in priority order does not consider choosed DN LOAD, bandwidth 
etc. I think it is necessary to add this logic later.{quote}
HDFS-14882 is in progress now; it would be great if you are interested in 
reviewing it.
For this ticket, I am concerned about which should be given priority: distance 
or cache state. Or should we leave the option to the user?
Consider the following case: a block has 3 replicas (named ra, rb, rc), and the 
cache replica count is set to 2, covering rb and rc. Another client is 
topologically closer to the host where ra is located (one corner case is a 
client on the same host as ra) than to the hosts of rb/rc. Which host should 
that client request first? I believe either ra or rb/rc is reasonable. 
[^HDFS-14283.003.patch] seems to choose the cache-priority policy, right? I 
just suggest that it may be better to leave the choice to the user.

> DFSInputStream to prefer cached replica
> ---
>
> Key: HDFS-14283
> URL: https://issues.apache.org/jira/browse/HDFS-14283
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.6.0
> Environment: HDFS Caching
>Reporter: Wei-Chiu Chuang
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14283.001.patch, HDFS-14283.002.patch, 
> HDFS-14283.003.patch
>
>
> HDFS Caching offers performance benefits. However, currently NameNode does 
> not treat cached replica with higher priority, so HDFS caching is only useful 
> when cache replication = 3, that is to say, all replicas are cached in 
> memory, so that a client doesn't randomly pick an uncached replica.
> HDFS-6846 proposed to let NameNode give higher priority to cached replica. 
> Changing a logic in NameNode is always tricky so that didn't get much 
> traction. Here I propose a different approach: let client (DFSInputStream) 
> prefer cached replica.
> A {{LocatedBlock}} object already contains cached replica location so a 
> client has the needed information. I think we can change 
> {{DFSInputStream#getBestNodeDNAddrPair()}} for this purpose.






[jira] [Work started] (HDDS-2334) Dummy chunk manager fails with length mismatch error

2019-10-20 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDDS-2334 started by Attila Doroszlai.
--
> Dummy chunk manager fails with length mismatch error
> 
>
> Key: HDDS-2334
> URL: https://issues.apache.org/jira/browse/HDDS-2334
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>
> HDDS-1094 added a config option ({{hdds.container.chunk.persistdata=false}}) 
> to drop chunks instead of writing them to disk.  Currently this option 
> triggers the following error with any key size:
> {noformat}
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  data array does not match the length specified. DataLen: 16777216 Byte 
> Array: 16777478
>   at 
> org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerDummyImpl.writeChunk(ChunkManagerDummyImpl.java:87)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleWriteChunk(KeyValueHandler.java:695)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:176)
>   at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:277)
>   at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:150)
>   at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:413)
>   at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:423)
>   at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$handleWriteChunk$1(ContainerStateMachine.java:458)
>   at 
> java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> {noformat}






[jira] [Work started] (HDDS-2335) Params not included in AuditMessage

2019-10-20 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDDS-2335 started by Attila Doroszlai.
--
> Params not included in AuditMessage
> ---
>
> Key: HDDS-2335
> URL: https://issues.apache.org/jira/browse/HDDS-2335
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>
> HDDS-2323 introduced the following Findbugs violation:
> {noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/trunk/trunk-nightly-20191020-r5wzl/findbugs/summary.txt}
> M P UrF: Unread field: 
> org.apache.hadoop.ozone.audit.AuditMessage$Builder.params  At 
> AuditMessage.java:[line 106]
> {noformat}
> This reveals that {{params}} is no longer logged in audit messages:
> {noformat}
> 2019-10-20 08:41:35,248 | INFO  | OMAudit | user=hadoop | ip=192.168.128.2 | 
> op=CREATE_VOLUME | ret=SUCCESS |
> 2019-10-20 08:41:35,312 | INFO  | OMAudit | user=hadoop | ip=192.168.128.2 | 
> op=CREATE_BUCKET | ret=SUCCESS |
> 2019-10-20 08:41:35,407 | INFO  | OMAudit | user=hadoop | ip=192.168.128.2 | 
> op=ALLOCATE_KEY | ret=SUCCESS |
> 2019-10-20 08:41:37,355 | INFO  | OMAudit | user=hadoop | ip=192.168.128.2 | 
> op=COMMIT_KEY | ret=SUCCESS |
> {noformat}






[jira] [Created] (HDDS-2335) Params not included in AuditMessage

2019-10-20 Thread Attila Doroszlai (Jira)
Attila Doroszlai created HDDS-2335:
--

 Summary: Params not included in AuditMessage
 Key: HDDS-2335
 URL: https://issues.apache.org/jira/browse/HDDS-2335
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Attila Doroszlai
Assignee: Attila Doroszlai


HDDS-2323 introduced the following Findbugs violation:

{noformat:title=https://github.com/elek/ozone-ci-q4/blob/master/trunk/trunk-nightly-20191020-r5wzl/findbugs/summary.txt}
M P UrF: Unread field: 
org.apache.hadoop.ozone.audit.AuditMessage$Builder.params  At 
AuditMessage.java:[line 106]
{noformat}

This reveals that {{params}} is no longer logged in audit messages:

{noformat}
2019-10-20 08:41:35,248 | INFO  | OMAudit | user=hadoop | ip=192.168.128.2 | 
op=CREATE_VOLUME | ret=SUCCESS |
2019-10-20 08:41:35,312 | INFO  | OMAudit | user=hadoop | ip=192.168.128.2 | 
op=CREATE_BUCKET | ret=SUCCESS |
2019-10-20 08:41:35,407 | INFO  | OMAudit | user=hadoop | ip=192.168.128.2 | 
op=ALLOCATE_KEY | ret=SUCCESS |
2019-10-20 08:41:37,355 | INFO  | OMAudit | user=hadoop | ip=192.168.128.2 | 
op=COMMIT_KEY | ret=SUCCESS |
{noformat}
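The bug pattern can be illustrated with a minimal sketch (a hypothetical {{Builder}}, not the real AuditMessage class): the builder stores {{params}} but never reads it when formatting the message, which is exactly what the unread-field warning flags; the fix is to append the params during build:

```java
// Minimal reproduction of the bug class: a builder field that is set
// but never read when the message is formatted (hence Findbugs'
// "UrF: Unread field"), alongside the fixed variant.
public class AuditMessageSketch {

    static class Builder {
        private String op;
        private String params;

        Builder setOp(String op) { this.op = op; return this; }
        Builder withParams(String params) { this.params = params; return this; }

        // Buggy version: 'params' is assigned but never used here.
        String buildBuggy() {
            return "op=" + op + " | ret=SUCCESS";
        }

        // Fixed version: params are included between op and ret.
        String buildFixed() {
            return "op=" + op + " " + params + " | ret=SUCCESS";
        }
    }

    public static void main(String[] args) {
        Builder b = new Builder()
            .setOp("CREATE_VOLUME")
            .withParams("{admin=hadoop, owner=hadoop}");
        System.out.println(b.buildBuggy());  // params silently dropped
        System.out.println(b.buildFixed());  // params restored
    }
}
```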






[jira] [Created] (HDDS-2334) Dummy chunk manager fails with length mismatch error

2019-10-20 Thread Attila Doroszlai (Jira)
Attila Doroszlai created HDDS-2334:
--

 Summary: Dummy chunk manager fails with length mismatch error
 Key: HDDS-2334
 URL: https://issues.apache.org/jira/browse/HDDS-2334
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Reporter: Attila Doroszlai
Assignee: Attila Doroszlai


HDDS-1094 added a config option ({{hdds.container.chunk.persistdata=false}}) to 
drop chunks instead of writing them to disk.  Currently this option triggers 
the following error with any key size:

{noformat}
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: 
data array does not match the length specified. DataLen: 16777216 Byte Array: 
16777478
at 
org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerDummyImpl.writeChunk(ChunkManagerDummyImpl.java:87)
at 
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleWriteChunk(KeyValueHandler.java:695)
at 
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:176)
at 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:277)
at 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:150)
at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:413)
at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:423)
at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$handleWriteChunk$1(ContainerStateMachine.java:458)
at 
java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)

{noformat}
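A plausible shape of the failing check, sketched with made-up method names rather than the actual {{ChunkManagerDummyImpl}} code: a strict equality check on the buffer length fails whenever the incoming buffer carries extra bytes beyond the declared chunk length, which matches the DataLen/Byte Array mismatch above:

```java
// Sketch of the kind of length check that produces the error above:
// the declared chunk length is compared for exact equality against the
// full buffer size, which may legitimately carry extra trailing bytes.
// A tolerant check only requires the buffer to hold at least dataLen bytes.
public class ChunkLengthCheck {

    static void strictCheck(long dataLen, byte[] data) {
        if (data.length != dataLen) {
            throw new IllegalStateException(
                "data array does not match the length specified. DataLen: "
                + dataLen + " Byte Array: " + data.length);
        }
    }

    static void tolerantCheck(long dataLen, byte[] data) {
        if (data.length < dataLen) {
            throw new IllegalStateException("buffer shorter than declared length");
        }
    }

    public static void main(String[] args) {
        byte[] buffer = new byte[16777478];  // chunk data plus trailing bytes
        long declared = 16777216;
        tolerantCheck(declared, buffer);      // passes
        try {
            strictCheck(declared, buffer);    // reproduces the reported failure
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```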






[jira] [Commented] (HDFS-14882) Consider DataNode load when #getBlockLocation

2019-10-20 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955423#comment-16955423
 ] 

Xiaoqiao He commented on HDFS-14882:


[^HDFS-14882.002.patch] fixes checkstyle and tries to trigger Jenkins again.
Hi watchers, would anyone like to help with another review?

> Consider DataNode load when #getBlockLocation
> -
>
> Key: HDFS-14882
> URL: https://issues.apache.org/jira/browse/HDFS-14882
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-14882.001.patch, HDFS-14882.002.patch
>
>
> Currently, we consider DataNode load in #chooseTarget for writers, but not 
> for readers. Thus, a DataNode's processing slots can be occupied by 
> #BlockSender for readers, its disk/network can become heavily loaded, and 
> slow-node exceptions follow. IIRC the same case has been reported several 
> times. Based on that, I propose to consider load for readers the same way 
> #chooseTarget does for writers.






[jira] [Updated] (HDFS-14882) Consider DataNode load when #getBlockLocation

2019-10-20 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-14882:
---
Attachment: HDFS-14882.002.patch

> Consider DataNode load when #getBlockLocation
> -
>
> Key: HDFS-14882
> URL: https://issues.apache.org/jira/browse/HDFS-14882
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-14882.001.patch, HDFS-14882.002.patch
>
>
> Currently, we consider DataNode load in #chooseTarget for writers, but not 
> for readers. Thus, a DataNode's processing slots can be occupied by 
> #BlockSender for readers, its disk/network can become heavily loaded, and 
> slow-node exceptions follow. IIRC the same case has been reported several 
> times. Based on that, I propose to consider load for readers the same way 
> #chooseTarget does for writers.
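The proposal could be sketched as a simple load-aware sort of candidate locations. The {{Node}} type and {{xceiverCount}} field below are stand-ins used purely for illustration, not the actual NameNode code:

```java
import java.util.Arrays;
import java.util.Comparator;

// Sketch: when returning block locations to a reader, sort candidate
// datanodes by a current-load signal (e.g. active transceiver count),
// the same kind of signal #chooseTarget consults for writers.
public class LoadAwareSort {

    static final class Node {
        final String name;
        final int xceiverCount;  // stand-in load metric
        Node(String name, int xceiverCount) {
            this.name = name;
            this.xceiverCount = xceiverCount;
        }
    }

    // Least-loaded nodes first; the input array is left untouched.
    static Node[] sortByLoad(Node[] nodes) {
        Node[] sorted = nodes.clone();
        Arrays.sort(sorted, Comparator.comparingInt((Node n) -> n.xceiverCount));
        return sorted;
    }

    public static void main(String[] args) {
        Node[] nodes = {new Node("dn1", 40), new Node("dn2", 5), new Node("dn3", 17)};
        for (Node n : sortByLoad(nodes)) {
            System.out.print(n.name + " ");  // dn2 dn3 dn1
        }
    }
}
```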






[jira] [Resolved] (HDDS-2283) Container creation on datanodes take time because of Rocksdb option creation.

2019-10-20 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh resolved HDDS-2283.
-
Resolution: Fixed

> Container creation on datanodes take time because of Rocksdb option creation.
> -
>
> Key: HDDS-2283
> URL: https://issues.apache.org/jira/browse/HDDS-2283
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-2283.00.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Container creation on datanodes takes around 300ms due to RocksDB creation. 
> RocksDB creation takes considerable time and needs to be optimized.
> Creating one RocksDB instance per disk should be enough, and each container 
> can be a table inside that RocksDB.
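The proposed layout can be sketched with plain maps standing in for RocksDB instances and per-container tables (a hypothetical illustration of the structure, not the actual datanode code):

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in sketch of the proposed layout: one store per disk, with each
// container as a named table inside it, instead of paying the cost of
// creating a brand-new RocksDB instance for every container.
public class PerDiskStoreSketch {

    // disk path -> (container id -> table contents)
    static final Map<String, Map<Long, Map<String, String>>> stores = new HashMap<>();

    static Map<String, String> tableFor(String disk, long containerId) {
        return stores
            .computeIfAbsent(disk, d -> new HashMap<>())          // open store once per disk
            .computeIfAbsent(containerId, c -> new HashMap<>());  // cheap per-container table
    }

    public static void main(String[] args) {
        tableFor("/data1", 1L).put("blockA", "chunkInfo");
        tableFor("/data1", 2L).put("blockB", "chunkInfo");
        // Two containers share one per-disk store.
        System.out.println(stores.get("/data1").size());  // 2
    }
}
```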






[jira] [Assigned] (HDDS-2311) Fix logic of RetryPolicy in OzoneClientSideTranslatorPB

2019-10-20 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh reassigned HDDS-2311:
---

Assignee: Hanisha Koneru

> Fix logic of RetryPolicy in OzoneClientSideTranslatorPB
> ---
>
> Key: HDDS-2311
> URL: https://issues.apache.org/jira/browse/HDDS-2311
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Hanisha Koneru
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> OzoneManagerProtocolClientSideTranslatorPB.java
> L251: if (cause instanceof NotLeaderException) {
>  NotLeaderException notLeaderException = (NotLeaderException) cause;
>  omFailoverProxyProvider.performFailoverIfRequired(
>  notLeaderException.getSuggestedLeaderNodeId());
>  return getRetryAction(RetryAction.RETRY, retries, failovers);
>  }
>  
> The suggested leader returned from the server is not used during failover, 
> because the cause is a RemoteException. So with the current code, the client 
> never uses the suggested leader for failover, and by default it exhausts the 
> maximum retries against each OM.
>  
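The described failure could be sketched as follows, with stand-in exception types rather than the actual Hadoop classes: the {{instanceof NotLeaderException}} check never matches because the cause arrives wrapped in a {{RemoteException}}, so unwrapping first is the missing step:

```java
// Sketch of the failure mode: the real cause (NotLeaderException) is
// wrapped inside a RemoteException, so a direct instanceof check on the
// wrapper never matches. Unwrapping first restores the intended behavior.
public class RetryCauseSketch {

    static class RemoteException extends RuntimeException {
        private final RuntimeException wrapped;
        RemoteException(RuntimeException wrapped) { this.wrapped = wrapped; }
        RuntimeException unwrap() { return wrapped; }
    }

    static class NotLeaderException extends RuntimeException {
        final String suggestedLeader;
        NotLeaderException(String suggestedLeader) { this.suggestedLeader = suggestedLeader; }
    }

    // Returns the suggested leader if one can be extracted, else null.
    static String suggestedLeaderOf(RuntimeException cause) {
        if (cause instanceof RemoteException) {
            cause = ((RemoteException) cause).unwrap();  // the missing step
        }
        if (cause instanceof NotLeaderException) {
            return ((NotLeaderException) cause).suggestedLeader;
        }
        return null;
    }

    public static void main(String[] args) {
        RuntimeException wrapped = new RemoteException(new NotLeaderException("om2"));
        System.out.println(suggestedLeaderOf(wrapped));  // om2
    }
}
```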






[jira] [Commented] (HDDS-2283) Container creation on datanodes take time because of Rocksdb option creation.

2019-10-20 Thread Mukul Kumar Singh (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955410#comment-16955410
 ] 

Mukul Kumar Singh commented on HDDS-2283:
-

Thanks [~swagle] for the contribution, and [~avijayan] and [~aengineer] for the 
review. I have committed this.

> Container creation on datanodes take time because of Rocksdb option creation.
> -
>
> Key: HDDS-2283
> URL: https://issues.apache.org/jira/browse/HDDS-2283
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-2283.00.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Container creation on datanodes takes around 300ms due to RocksDB creation. 
> RocksDB creation takes considerable time and needs to be optimized.
> Creating one RocksDB instance per disk should be enough, and each container 
> can be a table inside that RocksDB.






[jira] [Updated] (HDDS-2283) Container creation on datanodes take time because of Rocksdb option creation.

2019-10-20 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDDS-2283:

Summary: Container creation on datanodes take time because of Rocksdb 
option creation.  (was: Container Creation on datanodes take around 300ms due 
to rocksdb creation)

> Container creation on datanodes take time because of Rocksdb option creation.
> -
>
> Key: HDDS-2283
> URL: https://issues.apache.org/jira/browse/HDDS-2283
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-2283.00.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Container creation on datanodes takes around 300ms due to RocksDB creation. 
> RocksDB creation takes considerable time and needs to be optimized.
> Creating one RocksDB instance per disk should be enough, and each container 
> can be a table inside that RocksDB.






[jira] [Work logged] (HDDS-2283) Container creation on datanodes take time because of Rocksdb option creation.

2019-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2283?focusedWorklogId=331035&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-331035
 ]

ASF GitHub Bot logged work on HDDS-2283:


Author: ASF GitHub Bot
Created on: 20/Oct/19 06:42
Start Date: 20/Oct/19 06:42
Worklog Time Spent: 10m 
  Work Description: mukul1987 commented on pull request #41: HDDS-2283. 
Container Creation on datanodes take around 300ms due to rocksdb creation.
URL: https://github.com/apache/hadoop-ozone/pull/41
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 331035)
Time Spent: 20m  (was: 10m)

> Container creation on datanodes take time because of Rocksdb option creation.
> -
>
> Key: HDDS-2283
> URL: https://issues.apache.org/jira/browse/HDDS-2283
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-2283.00.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Container creation on datanodes takes around 300ms due to RocksDB creation. 
> RocksDB creation takes considerable time and needs to be optimized.
> Creating one RocksDB instance per disk should be enough, and each container 
> can be a table inside that RocksDB.






[jira] [Resolved] (HDDS-2286) Add a log info in ozone client and scm to print the exclusion list during allocate block

2019-10-20 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh resolved HDDS-2286.
-
Resolution: Fixed

Thanks [~swagle] for the contribution and [~adoroszlai] for the review. I have 
committed this.

> Add a log info in ozone client and scm to print the exclusion list during 
> allocate block
> 
>
> Key: HDDS-2286
> URL: https://issues.apache.org/jira/browse/HDDS-2286
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>







[jira] [Work logged] (HDDS-2286) Add a log info in ozone client and scm to print the exclusion list during allocate block

2019-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2286?focusedWorklogId=331032&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-331032
 ]

ASF GitHub Bot logged work on HDDS-2286:


Author: ASF GitHub Bot
Created on: 20/Oct/19 06:26
Start Date: 20/Oct/19 06:26
Worklog Time Spent: 10m 
  Work Description: mukul1987 commented on pull request #46: HDDS-2286. Add 
a log info in ozone client and scm to print the exclus…
URL: https://github.com/apache/hadoop-ozone/pull/46
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 331032)
Time Spent: 20m  (was: 10m)

> Add a log info in ozone client and scm to print the exclusion list during 
> allocate block
> 
>
> Key: HDDS-2286
> URL: https://issues.apache.org/jira/browse/HDDS-2286
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>



