[jira] [Updated] (HDDS-2313) Duplicate release of lock in OMKeyCommitRequest

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2313:
-
Labels: pull-request-available  (was: )

> Duplicate release of lock in OMKeyCommitRequest
> ---
>
> Key: HDDS-2313
> URL: https://issues.apache.org/jira/browse/HDDS-2313
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Blocker
>  Labels: pull-request-available
>
> {noformat}
> om_1| 2019-10-16 05:33:57,413 [IPC Server handler 19 on 9862] ERROR - Trying to release the lock on /bypdd/mybucket4, which was never acquired.
> om_1| 2019-10-16 05:33:57,414 WARN ipc.Server: IPC Server handler 19 on 9862, call Call#4 Retry#8 org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest from 172.29.0.4:37018
> om_1| java.lang.IllegalMonitorStateException: Releasing lock on resource /bypdd/mybucket4 without acquiring lock
> om_1| at org.apache.hadoop.ozone.lock.LockManager.getLockForReleasing(LockManager.java:220)
> om_1| at org.apache.hadoop.ozone.lock.LockManager.release(LockManager.java:168)
> om_1| at org.apache.hadoop.ozone.lock.LockManager.writeUnlock(LockManager.java:148)
> om_1| at org.apache.hadoop.ozone.om.lock.OzoneManagerLock.unlock(OzoneManagerLock.java:364)
> om_1| at org.apache.hadoop.ozone.om.lock.OzoneManagerLock.releaseWriteLock(OzoneManagerLock.java:329)
> om_1| at org.apache.hadoop.ozone.om.request.key.OMKeyCommitRequest.validateAndUpdateCache(OMKeyCommitRequest.java:177)
> {noformat}






[jira] [Work logged] (HDDS-2313) Duplicate release of lock in OMKeyCommitRequest

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2313?focusedWorklogId=328949&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328949
 ]

ASF GitHub Bot logged work on HDDS-2313:


Author: ASF GitHub Bot
Created on: 16/Oct/19 05:52
Start Date: 16/Oct/19 05:52
Worklog Time Spent: 10m 
  Work Description: adoroszlai commented on pull request #35: HDDS-2313. 
Duplicate release of lock in OMKeyCommitRequest
URL: https://github.com/apache/hadoop-ozone/pull/35
 
 
   ## What changes were proposed in this pull request?
   
   Fix duplicate release of lock (apparently a merge issue; the original change 
   (#24) was fine), which causes acceptance test failures:
   
   ```
   ozone-basic :: Smoketest ozone cluster startup
   ==============================================================================
   Check webui static resources                                          | PASS |
   ------------------------------------------------------------------------------
   Start freon testing                                                   | FAIL |
   255 != 0
   ------------------------------------------------------------------------------
   ozone-basic :: Smoketest ozone cluster startup                        | FAIL |
   2 critical tests, 1 passed, 1 failed
   2 tests total, 1 passed, 1 failed
   ```
   
   https://issues.apache.org/jira/browse/HDDS-2313
   
   ## How was this patch tested?
   
   Ran `ozone` acceptance test.
   
   ```
   $ cd hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/ozone
   $ ./test.sh
   ...
   ozone-basic :: Smoketest ozone cluster startup
   ==============================================================================
   Check webui static resources                                          | PASS |
   ------------------------------------------------------------------------------
   Start freon testing                                                   | PASS |
   ------------------------------------------------------------------------------
   ozone-basic :: Smoketest ozone cluster startup                        | PASS |
   2 critical tests, 2 passed, 0 failed
   ```
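   For context: the `IllegalMonitorStateException` in the log above comes from
   releasing the same write lock twice. A minimal, self-contained illustration
   of a release-once guard (the `acquiredLock` flag is illustrative, not
   necessarily the exact code in this PR):
   
   ```java
   import java.util.concurrent.locks.ReentrantLock;
   
   public class GuardedUnlock {
     public static void main(String[] args) {
       ReentrantLock lock = new ReentrantLock();
       boolean acquiredLock = false;
       try {
         lock.lock();
         acquiredLock = true;
         // ... validate and update cache ...
       } finally {
         // Release exactly once: calling unlock() on a lock the thread no
         // longer holds throws IllegalMonitorStateException, as in the OM log.
         if (acquiredLock) {
           lock.unlock();
           acquiredLock = false;
         }
       }
     }
   }
   ```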
 



Issue Time Tracking
---

Worklog Id: (was: 328949)
Remaining Estimate: 0h
Time Spent: 10m

> Duplicate release of lock in OMKeyCommitRequest
> ---
>
> Key: HDDS-2313
> URL: https://issues.apache.org/jira/browse/HDDS-2313
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {noformat}
> om_1| 2019-10-16 05:33:57,413 [IPC Server handler 19 on 9862] ERROR - Trying to release the lock on /bypdd/mybucket4, which was never acquired.
> om_1| 2019-10-16 05:33:57,414 WARN ipc.Server: IPC Server handler 19 on 9862, call Call#4 Retry#8 org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest from 172.29.0.4:37018
> om_1| java.lang.IllegalMonitorStateException: Releasing lock on resource /bypdd/mybucket4 without acquiring lock
> om_1| at org.apache.hadoop.ozone.lock.LockManager.getLockForReleasing(LockManager.java:220)
> om_1| at org.apache.hadoop.ozone.lock.LockManager.release(LockManager.java:168)
> om_1| at org.apache.hadoop.ozone.lock.LockManager.writeUnlock(LockManager.java:148)
> om_1| at org.apache.hadoop.ozone.om.lock.OzoneManagerLock.unlock(OzoneManagerLock.java:364)
> om_1| at org.apache.hadoop.ozone.om.lock.OzoneManagerLock.releaseWriteLock(OzoneManagerLock.java:329)
> om_1| at org.apache.hadoop.ozone.om.request.key.OMKeyCommitRequest.validateAndUpdateCache(OMKeyCommitRequest.java:177)
> {noformat}






[jira] [Assigned] (HDDS-2286) Add a log info in ozone client and scm to print the exclusion list during allocate block

2019-10-15 Thread Siddharth Wagle (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Wagle reassigned HDDS-2286:
-

Assignee: Siddharth Wagle

> Add a log info in ozone client and scm to print the exclusion list during 
> allocate block
> 
>
> Key: HDDS-2286
> URL: https://issues.apache.org/jira/browse/HDDS-2286
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Siddharth Wagle
>Priority: Major
>
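A minimal sketch of the requested log line (class and variable names here are hypothetical; the real change would log Ozone's exclude list in the client and SCM allocateBlock paths):
{code:java}
import java.util.Arrays;
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ExcludeListLogging {
  private static final Logger LOG =
      LoggerFactory.getLogger(ExcludeListLogging.class);

  public static void main(String[] args) {
    // Stand-in for Ozone's exclude list of datanodes/pipelines/containers.
    List<String> excludeList = Arrays.asList("datanode-1", "datanode-3");
    // Emit the exclusion list on each allocate-block call so it is visible
    // in both the client and SCM logs.
    LOG.info("allocateBlock: exclude list = {}", excludeList);
  }
}
{code}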







[jira] [Work started] (HDDS-2313) Duplicate release of lock in OMKeyCommitRequest

2019-10-15 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDDS-2313 started by Attila Doroszlai.
--
> Duplicate release of lock in OMKeyCommitRequest
> ---
>
> Key: HDDS-2313
> URL: https://issues.apache.org/jira/browse/HDDS-2313
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Blocker
>
> {noformat}
> om_1| 2019-10-16 05:33:57,413 [IPC Server handler 19 on 9862] ERROR - Trying to release the lock on /bypdd/mybucket4, which was never acquired.
> om_1| 2019-10-16 05:33:57,414 WARN ipc.Server: IPC Server handler 19 on 9862, call Call#4 Retry#8 org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest from 172.29.0.4:37018
> om_1| java.lang.IllegalMonitorStateException: Releasing lock on resource /bypdd/mybucket4 without acquiring lock
> om_1| at org.apache.hadoop.ozone.lock.LockManager.getLockForReleasing(LockManager.java:220)
> om_1| at org.apache.hadoop.ozone.lock.LockManager.release(LockManager.java:168)
> om_1| at org.apache.hadoop.ozone.lock.LockManager.writeUnlock(LockManager.java:148)
> om_1| at org.apache.hadoop.ozone.om.lock.OzoneManagerLock.unlock(OzoneManagerLock.java:364)
> om_1| at org.apache.hadoop.ozone.om.lock.OzoneManagerLock.releaseWriteLock(OzoneManagerLock.java:329)
> om_1| at org.apache.hadoop.ozone.om.request.key.OMKeyCommitRequest.validateAndUpdateCache(OMKeyCommitRequest.java:177)
> {noformat}






[jira] [Created] (HDDS-2313) Duplicate release of lock in OMKeyCommitRequest

2019-10-15 Thread Attila Doroszlai (Jira)
Attila Doroszlai created HDDS-2313:
--

 Summary: Duplicate release of lock in OMKeyCommitRequest
 Key: HDDS-2313
 URL: https://issues.apache.org/jira/browse/HDDS-2313
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Manager
Affects Versions: 0.5.0
Reporter: Attila Doroszlai
Assignee: Attila Doroszlai


{noformat}
om_1| 2019-10-16 05:33:57,413 [IPC Server handler 19 on 9862] ERROR - Trying to release the lock on /bypdd/mybucket4, which was never acquired.
om_1| 2019-10-16 05:33:57,414 WARN ipc.Server: IPC Server handler 19 on 9862, call Call#4 Retry#8 org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest from 172.29.0.4:37018
om_1| java.lang.IllegalMonitorStateException: Releasing lock on resource /bypdd/mybucket4 without acquiring lock
om_1|   at org.apache.hadoop.ozone.lock.LockManager.getLockForReleasing(LockManager.java:220)
om_1|   at org.apache.hadoop.ozone.lock.LockManager.release(LockManager.java:168)
om_1|   at org.apache.hadoop.ozone.lock.LockManager.writeUnlock(LockManager.java:148)
om_1|   at org.apache.hadoop.ozone.om.lock.OzoneManagerLock.unlock(OzoneManagerLock.java:364)
om_1|   at org.apache.hadoop.ozone.om.lock.OzoneManagerLock.releaseWriteLock(OzoneManagerLock.java:329)
om_1|   at org.apache.hadoop.ozone.om.request.key.OMKeyCommitRequest.validateAndUpdateCache(OMKeyCommitRequest.java:177)
{noformat}






[jira] [Commented] (HDFS-14909) DFSNetworkTopology#chooseRandomWithStorageType() should not decrease storage count for excluded node which is already part of excluded scope

2019-10-15 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952521#comment-16952521
 ] 

Hadoop QA commented on HDFS-14909:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 37s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 20s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 18s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 55s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 36s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 16s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 89m 40s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 39s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}147m  2s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestAddOverReplicatedStripedBlocks |
|   | hadoop.hdfs.tools.TestDFSZKFailoverController |
|   | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.3 Server=19.03.3 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14909 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12983118/HDFS-14909.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux c6d1fa03e209 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c39e9fc |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/28094/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/28094/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
|  Test Results | 

[jira] [Comment Edited] (HDFS-14909) DFSNetworkTopology#chooseRandomWithStorageType() should not decrease storage count for excluded node which is already part of excluded scope

2019-10-15 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952007#comment-16952007
 ] 

Surendra Singh Lilhore edited comment on HDFS-14909 at 10/16/19 4:57 AM:
-

The code below decreases the count for {{excludedScope}}:
{code:java}
if (excludeRoot != null && root.isAncestor(excludeRoot)) {
  if (excludeRoot instanceof DFSTopologyNodeImpl) {
availableCount -= ((DFSTopologyNodeImpl)excludeRoot)
.getSubtreeStorageCount(type);
  } else {
availableCount -= ((DatanodeDescriptor)excludeRoot)
.hasStorageType(type) ? 1 : 0;
  }
} {code}
This code then decreases the count again for {{excludedNodes}}, but if an 
excluded node is already part of {{excludedScope}}, there is no need to 
decrease the count:
{code:java}
if (excludedNodes != null) {
  for (Node excludedNode : excludedNodes) {
if (excludedNode instanceof DatanodeDescriptor) {
  availableCount -= ((DatanodeDescriptor) excludedNode)
  .hasStorageType(type) ? 1 : 0;
} else if (excludedNode instanceof DFSTopologyNodeImpl) {
  availableCount -= ((DFSTopologyNodeImpl) excludedNode)
  .getSubtreeStorageCount(type);
} else if (excludedNode instanceof DatanodeInfo) {
 ...
  }
}{code}
Because of this, {{availableCount}} ends up negative, which is not expected:
{code:java}
if (availableCount <= 0) {
  // should never be <0 in general, adding <0 check for safety purpose
  return null;
}{code}
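A hedged sketch of the guard this implies (the {{isUnderScope}} helper is hypothetical; the actual patch may express the check differently):
{code:java}
if (excludedNodes != null) {
  for (Node excludedNode : excludedNodes) {
    // Skip nodes already covered by excludedScope: their storages were
    // subtracted once via getSubtreeStorageCount(), so subtracting them
    // again here is what drives availableCount negative.
    if (excludeRoot != null && isUnderScope(excludedNode, excludeRoot)) {
      continue;
    }
    // ... existing instanceof checks that decrement availableCount ...
  }
}
{code}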


was (Author: surendrasingh):
Below code it decreasing count for {{excludedScope}}.
{code:java}
if (excludeRoot != null && root.isAncestor(excludeRoot)) {
  if (excludeRoot instanceof DFSTopologyNodeImpl) {
availableCount -= ((DFSTopologyNodeImpl)excludeRoot)
.getSubtreeStorageCount(type);
  } else {
availableCount -= ((DatanodeDescriptor)excludeRoot)
.hasStorageType(type) ? 1 : 0;
  }
} {code}
Again this code decreasing count for {{excludedNodes}}, but is excluded node is 
part of {{excludedScope}} then no need to decrease the count.
{code:java}
if (excludedNodes != null) {
  for (Node excludedNode : excludedNodes) {
if (excludedNode instanceof DatanodeDescriptor) {
  availableCount -= ((DatanodeDescriptor) excludedNode)
  .hasStorageType(type) ? 1 : 0;
} else if (excludedNode instanceof DFSTopologyNodeImpl) {
  availableCount -= ((DFSTopologyNodeImpl) excludedNode)
  .getSubtreeStorageCount(type);
} else if (excludedNode instanceof DatanodeInfo) {
 ...
  }
}{code}
Because of this {{availableCount}} is in negative value which is not expected
{code:java}
if (availableCount <= 0) {
  // should never be <0 in general, adding <0 check for safety purpose
  return null;
}{code}

> DFSNetworkTopology#chooseRandomWithStorageType() should not decrease storage 
> count for excluded node which is already part of excluded scope 
> -
>
> Key: HDFS-14909
> URL: https://issues.apache.org/jira/browse/HDFS-14909
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Attachments: HDFS-14909.001.patch, HDFS-14909.002.patch
>
>







[jira] [Updated] (HDDS-2312) Fix typo in ozone command

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2312:
-
Labels: pull-request-available  (was: )

> Fix typo in ozone command
> -
>
> Key: HDDS-2312
> URL: https://issues.apache.org/jira/browse/HDDS-2312
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone CLI
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Trivial
>  Labels: pull-request-available
>
> {noformat:title=ozone}
> Usage: ozone [OPTIONS] SUBCOMMAND [SUBCOMMAND OPTIONS]
> ...
> insight   tool to get runtime opeartion information
> ...
> {noformat}
> Should be "operation".






[jira] [Work logged] (HDDS-2312) Fix typo in ozone command

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2312?focusedWorklogId=328931&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328931
 ]

ASF GitHub Bot logged work on HDDS-2312:


Author: ASF GitHub Bot
Created on: 16/Oct/19 04:51
Start Date: 16/Oct/19 04:51
Worklog Time Spent: 10m 
  Work Description: adoroszlai commented on pull request #34: HDDS-2312. 
Fix typo in ozone command
URL: https://github.com/apache/hadoop-ozone/pull/34
 
 
   ## What changes were proposed in this pull request?
   
   Trivial typo fix.
   
   https://issues.apache.org/jira/browse/HDDS-2312
   
   ## How was this patch tested?
   
   ```
   $ docker-compose exec scm ozone
   ...
   insight   tool to get runtime operation information
   ...
   ```
 



Issue Time Tracking
---

Worklog Id: (was: 328931)
Remaining Estimate: 0h
Time Spent: 10m

> Fix typo in ozone command
> -
>
> Key: HDDS-2312
> URL: https://issues.apache.org/jira/browse/HDDS-2312
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone CLI
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {noformat:title=ozone}
> Usage: ozone [OPTIONS] SUBCOMMAND [SUBCOMMAND OPTIONS]
> ...
> insight   tool to get runtime opeartion information
> ...
> {noformat}
> Should be "operation".






[jira] [Created] (HDDS-2312) Fix typo in ozone command

2019-10-15 Thread Attila Doroszlai (Jira)
Attila Doroszlai created HDDS-2312:
--

 Summary: Fix typo in ozone command
 Key: HDDS-2312
 URL: https://issues.apache.org/jira/browse/HDDS-2312
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone CLI
Affects Versions: 0.5.0
Reporter: Attila Doroszlai
Assignee: Attila Doroszlai


{noformat:title=ozone}
Usage: ozone [OPTIONS] SUBCOMMAND [SUBCOMMAND OPTIONS]
...
insight   tool to get runtime opeartion information
...
{noformat}

Should be "operation".






[jira] [Work started] (HDDS-2312) Fix typo in ozone command

2019-10-15 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDDS-2312 started by Attila Doroszlai.
--
> Fix typo in ozone command
> -
>
> Key: HDDS-2312
> URL: https://issues.apache.org/jira/browse/HDDS-2312
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone CLI
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Trivial
>
> {noformat:title=ozone}
> Usage: ozone [OPTIONS] SUBCOMMAND [SUBCOMMAND OPTIONS]
> ...
> insight   tool to get runtime opeartion information
> ...
> {noformat}
> Should be "operation".






[jira] [Updated] (HDDS-2283) Container Creation on datanodes take around 300ms due to rocksdb creation

2019-10-15 Thread Siddharth Wagle (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Wagle updated HDDS-2283:
--
Attachment: HDDS-2283.00.patch

> Container Creation on datanodes take around 300ms due to rocksdb creation
> -
>
> Key: HDDS-2283
> URL: https://issues.apache.org/jira/browse/HDDS-2283
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
> Attachments: HDDS-2283.00.patch
>
>
> Container creation on datanodes takes around 300ms due to rocksdb creation. 
> Rocksdb creation takes considerable time, and this needs to be optimized.






[jira] [Updated] (HDDS-2283) Container Creation on datanodes take around 300ms due to rocksdb creation

2019-10-15 Thread Siddharth Wagle (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Wagle updated HDDS-2283:
--
Attachment: (was: HDDS-2283.00.patch)

> Container Creation on datanodes take around 300ms due to rocksdb creation
> -
>
> Key: HDDS-2283
> URL: https://issues.apache.org/jira/browse/HDDS-2283
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
> Attachments: HDDS-2283.00.patch
>
>
> Container creation on datanodes takes around 300ms due to rocksdb creation. 
> Rocksdb creation takes considerable time, and this needs to be optimized.






[jira] [Commented] (HDFS-14894) Add balancer parameter to balance top used nodes

2019-10-15 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952499#comment-16952499
 ] 

Hadoop QA commented on HDFS-14894:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m  0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m  1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 18s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 28s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  9s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 58s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 9 new + 232 unchanged - 0 fixed = 241 total (was 232) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 47s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}102m  7s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 38s{color} | {color:red} The patch generated 11 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}179m 25s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDecommission |
|   | hadoop.hdfs.TestBlockStoragePolicy |
|   | hadoop.hdfs.TestReadStripedFileWithDNFailure |
|   | hadoop.hdfs.TestFileCorruption |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithRandomECPolicy |
|   | hadoop.hdfs.TestLeaseRecovery2 |
|   | hadoop.hdfs.TestParallelShortCircuitReadUnCached |
|   | hadoop.hdfs.TestDatanodeLayoutUpgrade |
|   | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.3 Server=19.03.3 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14894 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12983109/HDFS-14894.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux c34bc810ef6d 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c39e9fc |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 

[jira] [Commented] (HDFS-14909) DFSNetworkTopology#chooseRandomWithStorageType() should not decrease storage count for excluded node which is already part of excluded scope

2019-10-15 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952469#comment-16952469
 ] 

Surendra Singh Lilhore commented on HDFS-14909:
---

Thanks [~elgoiri] for the review.

Attached the v2 patch with the corrected condition.

> DFSNetworkTopology#chooseRandomWithStorageType() should not decrease storage 
> count for excluded node which is already part of excluded scope 
> -
>
> Key: HDFS-14909
> URL: https://issues.apache.org/jira/browse/HDFS-14909
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Attachments: HDFS-14909.001.patch, HDFS-14909.002.patch
>
>







[jira] [Updated] (HDFS-14909) DFSNetworkTopology#chooseRandomWithStorageType() should not decrease storage count for excluded node which is already part of excluded scope

2019-10-15 Thread Surendra Singh Lilhore (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surendra Singh Lilhore updated HDFS-14909:
--
Attachment: HDFS-14909.002.patch

> DFSNetworkTopology#chooseRandomWithStorageType() should not decrease storage 
> count for excluded node which is already part of excluded scope 
> -
>
> Key: HDFS-14909
> URL: https://issues.apache.org/jira/browse/HDFS-14909
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Attachments: HDFS-14909.001.patch, HDFS-14909.002.patch
>
>







[jira] [Comment Edited] (HDDS-2309) Optimise OzoneManagerDoubleBuffer::flushTransactions to flush in batches

2019-10-15 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952414#comment-16952414
 ] 

Bharat Viswanadham edited comment on HDDS-2309 at 10/16/19 1:25 AM:


Thank you [~rajesh.balamohan] for reporting this issue.

A few questions:
 # Does the workload use only one client, or are there clients running in 
parallel? (If there is only one client, then in non-HA OM we don't return the 
response to the client until we flush to disk, so a single client sending 
requests to OM triggers a flush for every single request. In HA OM we will not 
see this, because we return the response after adding to the cache and don't 
wait for the buffer flush.)

{quote}This forces {{cleanupCache}} to be invoked which ends up choking in 
single thread executor. Attaching the profiler information which gives more 
details.
{quote}
I think for non-HA we can skip scheduling the cache cleanup for a few 
flush-transaction iterations, and we can also mark the future complete before 
calling the cache cleanup, so the client will not see the time taken to 
submit the cleanup.
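A small, self-contained sketch of that ordering (names are illustrative, not the actual OzoneManagerDoubleBuffer API):
{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FlushThenCleanup {
  public static void main(String[] args) throws Exception {
    ExecutorService cleanupExecutor = Executors.newSingleThreadExecutor();
    CompletableFuture<String> clientFuture = new CompletableFuture<>();

    // After a batch is flushed: complete the client-facing future first...
    clientFuture.complete("flushed");
    // ...then schedule the cache cleanup off the client's critical path.
    cleanupExecutor.submit(() -> System.out.println("cleanupCache()"));

    System.out.println(clientFuture.get());
    cleanupExecutor.shutdown();
  }
}
{code}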


was (Author: bharatviswa):
Thank You [~rajesh.balamohan] for reporting this issue.

Few questions:
 # Is the workload only one client is used, or there are clients running in 
parallel. (Because if it is only client, in non-HA OM, until we flush to disk, 
we don't return the response to the client. So, if only one client is sending 
requests to OM, then it will be flushed for every one request. (Whereas in HA 
OM, we will not see this, as we return the response to the client after adding 
to cache, we don't wait for buffer flush). 

{quote}This forces {{cleanupCache}} to be invoked which ends up choking in 
single thread executor. Attaching the profiler information which gives more 
details.
{quote}
I think for non-HA we can skip scheduling cleanup cache immediately, as when 
run with singleThreadExecutor it will call cleanup cache immediately and also 
one more thing we can do is marking the future complete and then call cleanup 
cache. So, the client will not see the time taken for submitting to cleanup 
cache.

> Optimise OzoneManagerDoubleBuffer::flushTransactions to flush in batches
> 
>
> Key: HDDS-2309
> URL: https://issues.apache.org/jira/browse/HDDS-2309
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Rajesh Balamohan
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: Screenshot 2019-10-15 at 4.19.13 PM.png
>
>
> When running a write heavy benchmark, 
> {{org/apache/hadoop/ozone/om/ratis/OzoneManagerDoubleBuffer.flushTransactions}}
>  was invoked for pretty much every write.
> This forces {{cleanupCache}} to be invoked which ends up choking in single 
> thread executor. Attaching the profiler information which gives more details.
> Ideally, {{flushTransactions}} should batch up the work to reduce load on 
> rocksDB.
>  
> [https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerDoubleBuffer.java#L130]
>  
> [https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerDoubleBuffer.java#L322]
>  
>  






[jira] [Comment Edited] (HDDS-2309) Optimise OzoneManagerDoubleBuffer::flushTransactions to flush in batches

2019-10-15 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952414#comment-16952414
 ] 

Bharat Viswanadham edited comment on HDDS-2309 at 10/16/19 1:22 AM:


Thank You [~rajesh.balamohan] for reporting this issue.

Few questions:
 # Is the workload only one client is used, or there are clients running in 
parallel. (Because if it is only client, in non-HA OM, until we flush to disk, 
we don't return the response to the client. So, if only one client is sending 
requests to OM, then it will be flushed for every one request. (Whereas in HA 
OM, we will not see this, as we return the response to the client after adding 
to cache, we don't wait for buffer flush). 

{quote}This forces {{cleanupCache}} to be invoked which ends up choking in 
single thread executor. Attaching the profiler information which gives more 
details.
{quote}
I think for non-HA we can skip scheduling cleanup cache immediately, as when 
run with singleThreadExecutor it will call cleanup cache immediately and also 
one more thing we can do is marking the future complete and then call cleanup 
cache. So, the client will not see the time taken for submitting to cleanup 
cache.


was (Author: bharatviswa):
Thank You [~rajesh.balamohan] for reporting this issue.

Few questions:
 # Is the workload only one client is used, or there are clients running in 
parallel. (Because if it is only client, in non-HA OM, until we flush to disk, 
we don't return the response to the client. So, if only one client is sending 
requests to OM, then it will be flushed for every one request. (Whereas in HA 
OM, we will not see this, as we return the response to the client after adding 
to cache, we don't wait for buffer flush). 

{quote}This forces {{cleanupCache}} to be invoked which ends up choking in 
single thread executor. Attaching the profiler information which gives more 
details.
{quote}
This forces {{cleanupCache}} to be invoked which ends up choking in single 
thread executor. Attaching the profiler information which gives more details.

> Optimise OzoneManagerDoubleBuffer::flushTransactions to flush in batches
> 
>
> Key: HDDS-2309
> URL: https://issues.apache.org/jira/browse/HDDS-2309
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Rajesh Balamohan
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: Screenshot 2019-10-15 at 4.19.13 PM.png
>
>
> When running a write heavy benchmark, 
> {{org/apache/hadoop/ozone/om/ratis/OzoneManagerDoubleBuffer.flushTransactions}}
>  was invoked for pretty much every write.
> This forces {{cleanupCache}} to be invoked which ends up choking in single 
> thread executor. Attaching the profiler information which gives more details.
> Ideally, {{flushTransactions}} should batch up the work to reduce load on 
> rocksDB.
>  
> [https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerDoubleBuffer.java#L130]
>  
> [https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerDoubleBuffer.java#L322]
>  
>  






[jira] [Comment Edited] (HDDS-2309) Optimise OzoneManagerDoubleBuffer::flushTransactions to flush in batches

2019-10-15 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952414#comment-16952414
 ] 

Bharat Viswanadham edited comment on HDDS-2309 at 10/16/19 1:20 AM:


Thank You [~rajesh.balamohan] for reporting this issue.

Few questions:
 # Is the workload only one client is used, or there are clients running in 
parallel. (Because if it is only client, in non-HA OM, until we flush to disk, 
we don't return the response to the client. So, if only one client is sending 
requests to OM, then it will be flushed for every one request. (Whereas in HA 
OM, we will not see this, as we return the response to the client after adding 
to cache, we don't wait for buffer flush). 

{quote}This forces {{cleanupCache}} to be invoked which ends up choking in 
single thread executor. Attaching the profiler information which gives more 
details.
{quote}
This forces {{cleanupCache}} to be invoked which ends up choking in single 
thread executor. Attaching the profiler information which gives more details.


was (Author: bharatviswa):
Thank You [~rajesh.balamohan] for reporting this issue.

Few questions:
 # Is the workload only one client is used, or there are clients running in 
parallel. (Because if it is only client, in non-HA OM, until we flush to disk, 
we don't return the response to the client. So, if only one client is sending 
requests to OM, then it will be flushed for every one request. (Whereas in HA 
OM, we will not see this, as we return the response to the client after adding 
to cache, we don't wait for buffer flush). 

> Optimise OzoneManagerDoubleBuffer::flushTransactions to flush in batches
> 
>
> Key: HDDS-2309
> URL: https://issues.apache.org/jira/browse/HDDS-2309
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Rajesh Balamohan
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: Screenshot 2019-10-15 at 4.19.13 PM.png
>
>
> When running a write heavy benchmark, 
> {{org/apache/hadoop/ozone/om/ratis/OzoneManagerDoubleBuffer.flushTransactions}}
>  was invoked for pretty much every write.
> This forces {{cleanupCache}} to be invoked which ends up choking in single 
> thread executor. Attaching the profiler information which gives more details.
> Ideally, {{flushTransactions}} should batch up the work to reduce load on 
> rocksDB.
>  
> [https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerDoubleBuffer.java#L130]
>  
> [https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerDoubleBuffer.java#L322]
>  
>  






[jira] [Commented] (HDDS-2309) Optimise OzoneManagerDoubleBuffer::flushTransactions to flush in batches

2019-10-15 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952414#comment-16952414
 ] 

Bharat Viswanadham commented on HDDS-2309:
--

Thank You [~rajesh.balamohan] for reporting this issue.

Few questions:
 # Is the workload only one client is used, or there are clients running in 
parallel. (Because if it is only client, in non-HA OM, until we flush to disk, 
we don't return the response to the client. So, if only one client is sending 
requests to OM, then it will be flushed for every one request. (Whereas in HA 
OM, we will not see this, as we return the response to the client after adding 
to cache, we don't wait for buffer flush). 

> Optimise OzoneManagerDoubleBuffer::flushTransactions to flush in batches
> 
>
> Key: HDDS-2309
> URL: https://issues.apache.org/jira/browse/HDDS-2309
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Rajesh Balamohan
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: Screenshot 2019-10-15 at 4.19.13 PM.png
>
>
> When running a write heavy benchmark, 
> {{org/apache/hadoop/ozone/om/ratis/OzoneManagerDoubleBuffer.flushTransactions}}
>  was invoked for pretty much every write.
> This forces {{cleanupCache}} to be invoked which ends up choking in single 
> thread executor. Attaching the profiler information which gives more details.
> Ideally, {{flushTransactions}} should batch up the work to reduce load on 
> rocksDB.
>  
> [https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerDoubleBuffer.java#L130]
>  
> [https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerDoubleBuffer.java#L322]
>  
>  






[jira] [Updated] (HDFS-14894) Add balancer parameter to balance top used nodes

2019-10-15 Thread Leon Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leon Gao updated HDFS-14894:

Attachment: HDFS-14894.001.patch
Status: Patch Available  (was: In Progress)

> Add balancer parameter to balance top used nodes
> 
>
> Key: HDFS-14894
> URL: https://issues.apache.org/jira/browse/HDFS-14894
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer  mover
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Major
> Attachments: HDFS-14894.001.patch
>
>
> We sometimes see a few of our datanodes reach very high usage (for various 
> reasons), and we need to reduce their usage in an urgent situation.
> We currently see two ways to achieve this:
> - Calculate and reset the balancing threshold.
> - Pick nodes manually according to usage stats, put them in a file, and use 
> the `-resource` flag.
> However, both are unintuitive or require too much manual work in an urgent, 
> close-to-outage situation. A small feature to automatically pick the top 
> used hosts would be a straightforward option, for example 
> `-sourceThreshold 95` to only target datanodes with >95% usage (see the 
> sketch below).
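A minimal sketch of what such a flag could do (types and names here are hypothetical stand-ins, not the Balancer's actual API):
{code:java}
import java.util.ArrayList;
import java.util.List;

public class SourceThreshold {
  static class Node {
    final String host;
    final double usedPercent;
    Node(String host, double usedPercent) {
      this.host = host;
      this.usedPercent = usedPercent;
    }
  }

  // -sourceThreshold 95 would keep only datanodes with more than 95% usage.
  static List<Node> pickSources(List<Node> nodes, double threshold) {
    List<Node> sources = new ArrayList<>();
    for (Node n : nodes) {
      if (n.usedPercent > threshold) {
        sources.add(n);
      }
    }
    return sources;
  }
}
{code}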






[jira] [Commented] (HDFS-14762) "Path(Path/String parent, String child)" will fail when "child" contains ":"

2019-10-15 Thread Shixiong Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952404#comment-16952404
 ] 

Shixiong Zhu commented on HDFS-14762:
-

[~ayushtkn] Some file systems allow ":" in names. For example, on the local 
file system, the following Scala code will fail because it cannot create a 
path for the checksum file ".a:b.crc" on Linux, even though "/tmp/a:b" itself 
will be created.

 
{code:java}
import org.apache.hadoop.fs._
import org.apache.hadoop.conf._
val conf = new Configuration
val path = new Path("file:///tmp/a:b")
val fs = path.getFileSystem(conf)
fs.create(path).close()
{code}
 

The same issue happens in S3AFileSystem.

IMO, since both FileSystem and Path are generic classes, if ":" should not 
appear in a valid file name, that should be checked in the abstract FileSystem 
class rather than in HDFS. If FileSystem doesn't check it and it's not 
enforced in all FileSystems, then it should be a valid character in a file 
name, and this should be fixed in Path.
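For reference, the same failure reproduces from Java with the two-argument {{Path}} constructor named in the summary:
{code:java}
import org.apache.hadoop.fs.Path;

public class ColonChild {
  public static void main(String[] args) {
    // Parsing the full URI works: the ":" after the scheme is legal.
    Path parent = new Path("file:///tmp");
    // This throws java.lang.IllegalArgumentException:
    //   java.net.URISyntaxException: Relative path in absolute URI: .a:b.crc
    // because Path treats the text before the first ":" as a URI scheme,
    // leaving a relative path inside an "absolute" URI.
    Path checksum = new Path(parent, ".a:b.crc");
    System.out.println(checksum);
  }
}
{code}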

> "Path(Path/String parent, String child)" will fail when "child" contains ":"
> 
>
> Key: HDFS-14762
> URL: https://issues.apache.org/jira/browse/HDFS-14762
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shixiong Zhu
>Priority: Major
> Attachments: HDFS-14762.001.patch, HDFS-14762.002.patch, 
> HDFS-14762.003.patch, HDFS-14762.004.patch
>
>
> When the "child" parameter contains ":", "Path(Path/String parent, String 
> child)" will throw the following exception:
> {code}
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
> path in absolute URI: ...
> {code}
> Not sure if this is a legit bug. But the following places will hit this error 
> when seeing a Path with a file name containing ":":
> https://github.com/apache/hadoop/blob/f9029c4070e8eb046b403f5cb6d0a132c5d58448/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java#L101
> https://github.com/apache/hadoop/blob/f9029c4070e8eb046b403f5cb6d0a132c5d58448/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Globber.java#L270






[jira] [Updated] (HDDS-1985) Fix listVolumes API

2019-10-15 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDDS-1985:
-
Target Version/s: 0.5.0

> Fix listVolumes API
> ---
>
> Key: HDDS-1985
> URL: https://issues.apache.org/jira/browse/HDDS-1985
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>
> This Jira is to fix the listVolumes API in the HA code path.
> In HA, we have an in-memory cache: we put the result into the cache and 
> return the response; later the double buffer thread picks it up and flushes 
> it to disk. So when doing listVolumes, it should use both the in-memory 
> cache and the RocksDB volume table to list volumes for a user.
>  
> No fix is required for this, as the information is retrieved from the MPU Key 
> table and not through RocksDB table iteration. (When we use get(), it checks 
> the cache first and then the table; see the sketch below.)
>  
> Used this Jira to add an integration test to verify the behavior.
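A tiny sketch of that read-through-cache lookup (generic stand-ins, not Ozone's actual Table/TableCache types):
{code:java}
import java.util.HashMap;
import java.util.Map;

public class ReadThroughTable<K, V> {
  private final Map<K, V> cache = new HashMap<>(); // OM in-memory cache
  private final Map<K, V> table = new HashMap<>(); // persisted RocksDB table

  public V get(K key) {
    // Check entries not yet flushed by the double buffer first,
    // then fall back to the on-disk table.
    V cached = cache.get(key);
    return cached != null ? cached : table.get(key);
  }
}
{code}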






[jira] [Updated] (HDDS-1985) Fix listVolumes API

2019-10-15 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDDS-1985:
-
Status: Patch Available  (was: Open)

> Fix listVolumes API
> ---
>
> Key: HDDS-1985
> URL: https://issues.apache.org/jira/browse/HDDS-1985
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>
> This Jira is to fix the listVolumes API in the HA code path.
> In HA, we have an in-memory cache: we put the result into the cache and 
> return the response; later the double buffer thread picks it up and flushes 
> it to disk. So when doing listVolumes, it should use both the in-memory 
> cache and the RocksDB volume table to list volumes for a user.
>  
> No fix is required for this, as the information is retrieved from the MPU Key 
> table and not through RocksDB table iteration. (When we use get(), it checks 
> the cache first and then the table.)
>  
> Used this Jira to add an integration test to verify the behavior.






[jira] [Updated] (HDDS-2311) Fix logic of RetryPolicy in OzoneClientSideTranslatorPB

2019-10-15 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDDS-2311:
-
Summary: Fix logic of RetryPolicy in OzoneClientSideTranslatorPB  (was: Fix 
logic in RetryPolicy in OzoneClientSideTranslatorPB)

> Fix logic of RetryPolicy in OzoneClientSideTranslatorPB
> ---
>
> Key: HDDS-2311
> URL: https://issues.apache.org/jira/browse/HDDS-2311
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Priority: Blocker
>
> OzoneManagerProtocolClientSideTranslatorPB.java
> L251: if (cause instanceof NotLeaderException) {
>  NotLeaderException notLeaderException = (NotLeaderException) cause;
>  omFailoverProxyProvider.performFailoverIfRequired(
>  notLeaderException.getSuggestedLeaderNodeId());
>  return getRetryAction(RetryAction.RETRY, retries, failovers);
>  }
>  
> The suggested leader returned from the server is not used during failover, 
> because the cause is of type RemoteException. So with the current code, it 
> does not use the suggested leader for failover at all, and by default it 
> tries the max retries against each OM.
>  






[jira] [Updated] (HDDS-2311) Fix logic in RetryPolicy in OzoneClientSideTranslatorPB

2019-10-15 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDDS-2311:
-
Parent: HDDS-505
Issue Type: Sub-task  (was: Task)

> Fix logic in RetryPolicy in OzoneClientSideTranslatorPB
> ---
>
> Key: HDDS-2311
> URL: https://issues.apache.org/jira/browse/HDDS-2311
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Priority: Blocker
>
> OzoneManagerProtocolClientSideTranslatorPB.java
> L251: if (cause instanceof NotLeaderException) {
>  NotLeaderException notLeaderException = (NotLeaderException) cause;
>  omFailoverProxyProvider.performFailoverIfRequired(
>  notLeaderException.getSuggestedLeaderNodeId());
>  return getRetryAction(RetryAction.RETRY, retries, failovers);
>  }
>  
> The suggested leader returned from the server is not used during failover, 
> because the cause is of type RemoteException. So with the current code, it 
> does not use the suggested leader for failover at all, and by default it 
> tries the max retries against each OM.
>  






[jira] [Updated] (HDDS-2311) Fix logic in RetryPolicy in OzoneClientSideTranslatorPB

2019-10-15 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDDS-2311:
-
Target Version/s: 0.5.0

> Fix logic in RetryPolicy in OzoneClientSideTranslatorPB
> ---
>
> Key: HDDS-2311
> URL: https://issues.apache.org/jira/browse/HDDS-2311
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: Bharat Viswanadham
>Priority: Blocker
>
> OzoneManagerProtocolClientSideTranslatorPB.java
> L251: if (cause instanceof NotLeaderException) {
>  NotLeaderException notLeaderException = (NotLeaderException) cause;
>  omFailoverProxyProvider.performFailoverIfRequired(
>  notLeaderException.getSuggestedLeaderNodeId());
>  return getRetryAction(RetryAction.RETRY, retries, failovers);
>  }
>  
> The suggested leader returned from the server is not used during failover, 
> because the cause is of type RemoteException. So with the current code, it 
> does not use the suggested leader for failover at all, and by default it 
> tries the max retries against each OM.
>  






[jira] [Updated] (HDDS-2311) Fix logic in RetryPolicy in OzoneClientSideTranslatorPB

2019-10-15 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDDS-2311:
-
Priority: Blocker  (was: Major)

> Fix logic in RetryPolicy in OzoneClientSideTranslatorPB
> ---
>
> Key: HDDS-2311
> URL: https://issues.apache.org/jira/browse/HDDS-2311
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: Bharat Viswanadham
>Priority: Blocker
>
> OzoneManagerProtocolClientSideTranslatorPB.java
> L251: if (cause instanceof NotLeaderException) {
>         NotLeaderException notLeaderException = (NotLeaderException) cause;
>         omFailoverProxyProvider.performFailoverIfRequired(
>             notLeaderException.getSuggestedLeaderNodeId());
>         return getRetryAction(RetryAction.RETRY, retries, failovers);
>       }
>  
> The suggested leader returned from the server is not used during failover, 
> because the cause is a RemoteException (wrapping the NotLeaderException), so 
> the instanceof check never matches. With the current code, the suggested 
> leader is not used for failover at all, and by default each OM is retried up 
> to the max retry count.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2311) Fix logic in RetryPolicy in OzoneClientSideTranslatorPB

2019-10-15 Thread Bharat Viswanadham (Jira)
Bharat Viswanadham created HDDS-2311:


 Summary: Fix logic in RetryPolicy in OzoneClientSideTranslatorPB
 Key: HDDS-2311
 URL: https://issues.apache.org/jira/browse/HDDS-2311
 Project: Hadoop Distributed Data Store
  Issue Type: Task
Reporter: Bharat Viswanadham


OzoneManagerProtocolClientSideTranslatorPB.java

L251: if (cause instanceof NotLeaderException) {
        NotLeaderException notLeaderException = (NotLeaderException) cause;
        omFailoverProxyProvider.performFailoverIfRequired(
            notLeaderException.getSuggestedLeaderNodeId());
        return getRetryAction(RetryAction.RETRY, retries, failovers);
      }

 

The suggested leader returned from the server is not used during failover, 
because the cause is a RemoteException (wrapping the NotLeaderException), so 
the instanceof check never matches. With the current code, the suggested 
leader is not used for failover at all, and by default each OM is retried up 
to the max retry count.
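
A minimal sketch of one possible fix, assuming the RemoteException wrapping 
described above (illustrative only, not the committed change): unwrap the 
RemoteException with RemoteException#unwrapRemoteException before the 
instanceof check, so the suggested leader is actually consulted.

{code}
// Hedged sketch: recover the NotLeaderException from the RemoteException
// so the server's suggested leader can drive the failover decision.
if (cause instanceof RemoteException) {
  // unwrapRemoteException returns the underlying exception when it
  // matches one of the given classes, otherwise the RemoteException.
  cause = ((RemoteException) cause)
      .unwrapRemoteException(NotLeaderException.class);
}
if (cause instanceof NotLeaderException) {
  NotLeaderException notLeaderException = (NotLeaderException) cause;
  omFailoverProxyProvider.performFailoverIfRequired(
      notLeaderException.getSuggestedLeaderNodeId());
  return getRetryAction(RetryAction.RETRY, retries, failovers);
}
{code}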

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2283) Container Creation on datanodes take around 300ms due to rocksdb creation

2019-10-15 Thread Siddharth Wagle (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952327#comment-16952327
 ] 

Siddharth Wagle commented on HDDS-2283:
---

I will create a PR with code cleanup and UT completed.

> Container Creation on datanodes take around 300ms due to rocksdb creation
> -
>
> Key: HDDS-2283
> URL: https://issues.apache.org/jira/browse/HDDS-2283
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
> Attachments: HDDS-2283.00.patch
>
>
> Container Creation on datanodes take around 300ms due to rocksdb creation. 
> Rocksdb creation is taking a considerable time and this needs to be optimized.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2283) Container Creation on datanodes take around 300ms due to rocksdb creation

2019-10-15 Thread Siddharth Wagle (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Wagle updated HDDS-2283:
--
Attachment: HDDS-2283.00.patch

> Container Creation on datanodes take around 300ms due to rocksdb creation
> -
>
> Key: HDDS-2283
> URL: https://issues.apache.org/jira/browse/HDDS-2283
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
> Attachments: HDDS-2283.00.patch
>
>
> Container Creation on datanodes take around 300ms due to rocksdb creation. 
> Rocksdb creation is taking a considerable time and this needs to be optimized.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2283) Container Creation on datanodes take around 300ms due to rocksdb creation

2019-10-15 Thread Siddharth Wagle (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952325#comment-16952325
 ] 

Siddharth Wagle commented on HDDS-2283:
---

[~msingh] Attaching a speculative patch; can you let me know your thoughts?

With the patch:
{code}
2019-10-15 14:35:40,290 INFO  volume.HddsVolume (HddsVolume.java:<init>(176)) - 
Creating Volume: 
/var/folders/7y/d3vtnjg502sgppd08pj0j0j4gp/T/junit6254259191707885578/hdds 
of  storage type : null and capacity : 500068036608
2019-10-15 14:35:40,949 INFO  utils.MetadataStoreBuilder 
(MetadataStoreBuilder.java:build(149)) - Time before create: 84
2019-10-15 14:35:40,950 INFO  utils.RocksDBStore (RocksDBStore.java:<init>(68)) 
- Time to load library: 0
2019-10-15 14:35:40,958 INFO  utils.RocksDBStore (RocksDBStore.java:<init>(75)) 
- Time to open: 7
2019-10-15 14:35:40,958 INFO  helpers.KeyValueContainerUtil 
(KeyValueContainerUtil.java:createContainerMetaData(85)) - Total time to 
create: 98
2019-10-15 14:35:41,013 WARN  util.NativeCodeLoader 
(NativeCodeLoader.java:<clinit>(60)) - Unable to load native-hadoop library for 
your platform... using builtin-java classes where applicable
2019-10-15 14:35:41,015 INFO  utils.MetadataStoreBuilder 
(MetadataStoreBuilder.java:build(149)) - Time before create: 0
2019-10-15 14:35:41,015 INFO  utils.RocksDBStore (RocksDBStore.java:<init>(68)) 
- Time to load library: 0
2019-10-15 14:35:41,021 INFO  utils.RocksDBStore (RocksDBStore.java:<init>(75)) 
- Time to open: 6
2019-10-15 14:35:41,021 INFO  helpers.KeyValueContainerUtil 
(KeyValueContainerUtil.java:createContainerMetaData(85)) - Total time to 
create: 7
{code}

> Container Creation on datanodes take around 300ms due to rocksdb creation
> -
>
> Key: HDDS-2283
> URL: https://issues.apache.org/jira/browse/HDDS-2283
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
>
> Container Creation on datanodes take around 300ms due to rocksdb creation. 
> Rocksdb creation is taking a considerable time and this needs to be optimized.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14909) DFSNetworkTopology#chooseRandomWithStorageType() should not decrease storage count for excluded node which is already part of excluded scope

2019-10-15 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952315#comment-16952315
 ] 

Hadoop QA commented on HDFS-14909:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
40s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 22s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 37s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 7 new + 0 unchanged - 0 fixed = 7 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 10s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 99m  7s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
33s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}160m 45s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.net.TestDFSNetworkTopology |
|   | hadoop.hdfs.tools.TestDFSZKFailoverController |
|   | hadoop.hdfs.server.namenode.TestPersistentStoragePolicySatisfier |
|   | hadoop.hdfs.server.balancer.TestBalancer |
|   | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.3 Server=19.03.3 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14909 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12983088/HDFS-14909.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 2102db3b5573 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 85af77c |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28092/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 

[jira] [Commented] (HDFS-14909) DFSNetworkTopology#chooseRandomWithStorageType() should not decrease storage count for excluded node which is already part of excluded scope

2019-10-15 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952295#comment-16952295
 ] 

Íñigo Goiri commented on HDFS-14909:


Isn't this always true?
{code}
216 && excludeRoot.getNetworkLocation().startsWith(
217 excludeRoot.getNetworkLocation())) {
{code}

> DFSNetworkTopology#chooseRandomWithStorageType() should not decrease storage 
> count for excluded node which is already part of excluded scope 
> -
>
> Key: HDFS-14909
> URL: https://issues.apache.org/jira/browse/HDFS-14909
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Attachments: HDFS-14909.001.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14887) RBF: In Router Web UI, Observer Namenode Information displaying as Unavailable

2019-10-15 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952292#comment-16952292
 ] 

Íñigo Goiri commented on HDFS-14887:


My bad... 
{code}
assertTrue("Cannot find ns0 in map: " + map, map.containsKey("ns0"));
{code}
Looks very bad, as the map toString is not very clean; let's do:
{code}
assertTrue("Cannot find ns0 in: " + jsonString, map.containsKey("ns0"));
{code}

> RBF: In Router Web UI, Observer Namenode Information displaying as Unavailable
> --
>
> Key: HDFS-14887
> URL: https://issues.apache.org/jira/browse/HDFS-14887
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: 14887.after.png, 14887.before.png, HDFS-14887.001.patch, 
> HDFS-14887.002.patch, HDFS-14887.003.patch, HDFS-14887.004.patch, 
> HDFS-14887.005.patch, HDFS-14887.006.patch, HDFS-14887.007.patch
>
>
> In the Router Web UI, Observer Namenode information is displayed as 
> Unavailable. We should show a proper icon for them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14284) RBF: Log Router identifier when reporting exceptions

2019-10-15 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952289#comment-16952289
 ] 

Íñigo Goiri commented on HDFS-14284:


Interesting, there are some exceptions that get unwrapped.
I'm guessing this is because of the RouterIOException.

> RBF: Log Router identifier when reporting exceptions
> 
>
> Key: HDFS-14284
> URL: https://issues.apache.org/jira/browse/HDFS-14284
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14284.001.patch, HDFS-14284.002.patch, 
> HDFS-14284.003.patch, HDFS-14284.004.patch, HDFS-14284.005.patch, 
> HDFS-14284.006.patch
>
>
> The typical setup is to use multiple Routers through 
> ConfiguredFailoverProxyProvider.
> In a regular HA Namenode setup, it is easy to know which NN was used.
> However, in RBF, any Router can be the one reporting the exception, and it 
> is hard to know which one it was.
> We should have a way to identify which Router/Namenode was the one triggering 
> the exception.
> This would also apply with Observer Namenodes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13081) Datanode#checkSecureConfig should allow SASL and privileged HTTP

2019-10-15 Thread Chen Liang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952282#comment-16952282
 ] 

Chen Liang commented on HDFS-13081:
---

Hey folks, any plan to backport this to branch-2? I will try to do the 
backport if there are no objections/concerns.

> Datanode#checkSecureConfig should allow SASL and privileged HTTP
> 
>
> Key: HDFS-13081
> URL: https://issues.apache.org/jira/browse/HDFS-13081
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, security
>Affects Versions: 3.0.0
>Reporter: Xiaoyu Yao
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 3.1.0, 3.0.3
>
> Attachments: HDFS-13081.000.patch, HDFS-13081.001.patch, 
> HDFS-13081.002.patch, HDFS-13081.003.patch, HDFS-13081.004.patch, 
> HDFS-13081.005.patch, HDFS-13081.006.patch
>
>
> Datanode#checkSecureConfig currently checks the following to determine if 
> secure datanode is enabled:
>  # The server has bound to privileged ports for RPC and HTTP via 
> SecureDataNodeStarter.
>  # The configuration enables SASL on DataTransferProtocol and HTTPS (no 
> plain HTTP) for the HTTP server.
> Authentication of the Datanode RPC server can be done either via SASL 
> handshake or a JSVC/privileged RPC port. This guarantees authentication of 
> the datanode RPC server before a client transmits a secret, such as a block 
> access token.
> Authentication of the HTTP server can also be done either via HTTPS/SSL or a 
> JSVC/privileged HTTP port. This guarantees authentication of the datanode 
> HTTP server before a client transmits a secret, such as a delegation token.
> This ticket is open to allow privileged HTTP as an alternative to HTTPS to 
> work with SASL-based RPC protection.
>  
> cc: [~cnauroth] , [~daryn], [~jnpandey] for additional feedback.
>  
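
A hedged sketch of the relaxed check this ticket proposes (illustrative Java; 
the boolean flags are assumptions, not the actual Datanode#checkSecureConfig 
code): RPC counts as authenticated via SASL or a privileged port, and HTTP via 
HTTPS or a privileged port.

{code}
// Hedged sketch: accept either authentication mechanism per channel.
boolean rpcSecured = saslEnabled || rpcPortPrivileged;      // assumed flags
boolean httpSecured = httpsEnabled || httpPortPrivileged;   // assumed flags
if (!(rpcSecured && httpSecured)) {
  throw new RuntimeException("Cannot start secure DataNode: RPC needs SASL "
      + "or a privileged port, and HTTP needs HTTPS or a privileged port.");
}
{code}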



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work started] (HDDS-2310) Add support to add ozone ranger plugin to Ozone Manager classpath

2019-10-15 Thread Vivek Ratnavel Subramanian (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDDS-2310 started by Vivek Ratnavel Subramanian.

> Add support to add ozone ranger plugin to Ozone Manager classpath
> -
>
> Key: HDDS-2310
> URL: https://issues.apache.org/jira/browse/HDDS-2310
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: Ozone Manager
>Affects Versions: 0.5.0
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>
> Currently, there is no way to add the Ozone Ranger plugin to the Ozone 
> Manager classpath.
> We should be able to set an environment variable that will be respected by 
> Ozone and added to the Ozone Manager classpath.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2310) Add support to add ozone ranger plugin to Ozone Manager classpath

2019-10-15 Thread Vivek Ratnavel Subramanian (Jira)
Vivek Ratnavel Subramanian created HDDS-2310:


 Summary: Add support to add ozone ranger plugin to Ozone Manager 
classpath
 Key: HDDS-2310
 URL: https://issues.apache.org/jira/browse/HDDS-2310
 Project: Hadoop Distributed Data Store
  Issue Type: Task
  Components: Ozone Manager
Affects Versions: 0.5.0
Reporter: Vivek Ratnavel Subramanian
Assignee: Vivek Ratnavel Subramanian


Currently, there is no way to add the Ozone Ranger plugin to the Ozone Manager 
classpath.

We should be able to set an environment variable that will be respected by 
Ozone and added to the Ozone Manager classpath.

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2254) Fix flaky unit testTestContainerStateMachine#testRatisSnapshotRetention

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2254?focusedWorklogId=328775&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328775
 ]

ASF GitHub Bot logged work on HDDS-2254:


Author: ASF GitHub Bot
Created on: 15/Oct/19 20:23
Start Date: 15/Oct/19 20:23
Worklog Time Spent: 10m 
  Work Description: avijayanhwx commented on pull request #31: HDDS-2254 : 
Fix flaky unit test TestContainerStateMachine#testRatisSn…
URL: https://github.com/apache/hadoop-ozone/pull/31
 
 
   …apshotRetention.
   
   ## What changes were proposed in this pull request?
   On repeated local runs, the unit test failed intermittently while asserting 
a null value for the CSM snapshot. This assertion is not valid when the other 
unit test in the class executes first and creates keys in the 
cluster/container. Hence, moved to a model where each unit test creates its 
own cluster (a sketch follows below).
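   
   A minimal sketch of the per-test cluster model, assuming the JUnit 4 
lifecycle and the MiniOzoneCluster builder API (method names from memory; the 
configuration is illustrative):
   
   ```java
   // Hedged sketch: each test builds and tears down its own cluster, so keys
   // created by another test cannot leak into the snapshot assertions.
   private MiniOzoneCluster cluster;
   
   @Before
   public void setUp() throws Exception {
     OzoneConfiguration conf = new OzoneConfiguration();
     cluster = MiniOzoneCluster.newBuilder(conf).build();
     cluster.waitForClusterToBeReady();
   }
   
   @After
   public void tearDown() {
     if (cluster != null) {
       cluster.shutdown();
     }
   }
   ```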
   
   https://issues.apache.org/jira/browse/HDDS-2254
   
   ## How was this patch tested?
   Ran the unit tests in the IDE and command line.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328775)
Time Spent: 1h 20m  (was: 1h 10m)

> Fix flaky unit testTestContainerStateMachine#testRatisSnapshotRetention
> ---
>
> Key: HDDS-2254
> URL: https://issues.apache.org/jira/browse/HDDS-2254
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.5.0
>Reporter: Siddharth Wagle
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Test always fails with assertion error:
> {code}
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.ozone.client.rpc.TestContainerStateMachine.testRatisSnapshotRetention(TestContainerStateMachine.java:188)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2283) Container Creation on datanodes take around 300ms due to rocksdb creation

2019-10-15 Thread Siddharth Wagle (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952262#comment-16952262
 ] 

Siddharth Wagle commented on HDDS-2283:
---

[~msingh] In my unit test, the obvious bottleneck is the time to create the 
RocksDB options object, which makes a native call:
{code}
2019-10-15 13:20:10,714 INFO  utils.MetadataStoreBuilder 
(MetadataStoreBuilder.java:build(124)) - Time before create, load options: 81
2019-10-15 13:20:10,715 INFO  utils.RocksDBStore (RocksDBStore.java:<init>(68)) 
- Time to load library: 0
2019-10-15 13:20:10,723 INFO  utils.RocksDBStore (RocksDBStore.java:<init>(75)) 
- Time to open: 8
2019-10-15 13:20:10,723 INFO  helpers.KeyValueContainerUtil 
(KeyValueContainerUtil.java:createContainerMetaData(85)) - Total time to 
create: {}95
{code}
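
A minimal sketch of the kind of options reuse these numbers suggest, assuming 
the rocksdbjni API (org.rocksdb.Options); the class name and structure are 
illustrative, not the attached patch:

{code}
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

// Hedged sketch: pay the native Options construction cost once per
// process instead of once per container creation.
public final class CachedRocksDBOptions {
  static {
    RocksDB.loadLibrary();  // load the native library exactly once
  }

  private static final Options OPTIONS =
      new Options().setCreateIfMissing(true);  // built once, then reused

  private CachedRocksDBOptions() { }

  public static RocksDB open(String dbPath) throws RocksDBException {
    return RocksDB.open(OPTIONS, dbPath);  // reuse the shared options
  }
}
{code}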

> Container Creation on datanodes take around 300ms due to rocksdb creation
> -
>
> Key: HDDS-2283
> URL: https://issues.apache.org/jira/browse/HDDS-2283
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
>
> Container Creation on datanodes take around 300ms due to rocksdb creation. 
> Rocksdb creation is taking a considerable time and this needs to be optimized.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2283) Container Creation on datanodes take around 300ms due to rocksdb creation

2019-10-15 Thread Siddharth Wagle (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Wagle reassigned HDDS-2283:
-

Assignee: Siddharth Wagle

> Container Creation on datanodes take around 300ms due to rocksdb creation
> -
>
> Key: HDDS-2283
> URL: https://issues.apache.org/jira/browse/HDDS-2283
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
>
> Container Creation on datanodes take around 300ms due to rocksdb creation. 
> Rocksdb creation is taking a considerable time and this needs to be optimized.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2254) Fix flaky unit testTestContainerStateMachine#testRatisSnapshotRetention

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2254?focusedWorklogId=328774&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328774
 ]

ASF GitHub Bot logged work on HDDS-2254:


Author: ASF GitHub Bot
Created on: 15/Oct/19 20:20
Start Date: 15/Oct/19 20:20
Worklog Time Spent: 10m 
  Work Description: avijayanhwx commented on pull request #1604: HDDS-2254. 
Fix flaky unit test TestContainerStateMachine#testRatisSnapshotRetention
URL: https://github.com/apache/hadoop/pull/1604
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328774)
Time Spent: 1h 10m  (was: 1h)

> Fix flaky unit testTestContainerStateMachine#testRatisSnapshotRetention
> ---
>
> Key: HDDS-2254
> URL: https://issues.apache.org/jira/browse/HDDS-2254
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.5.0
>Reporter: Siddharth Wagle
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Test always fails with assertion error:
> {code}
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.ozone.client.rpc.TestContainerStateMachine.testRatisSnapshotRetention(TestContainerStateMachine.java:188)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation

2019-10-15 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952229#comment-16952229
 ] 

Stephen O'Donnell commented on HDFS-14854:
--

For LowRedundancyBlocks, any changes are certainly a separate Jira. This one is 
already large enough. I am also wary of a major refactor in a critical area, 
while what is there works quite well generally.

On your further comments:
 # I will move all the locks outside of the try blocks (see the sketch after 
this list).
 # I will clean this part up.
 # I am going to keep what is there. The pattern was established this way in 
the default monitor, so both of them work in roughly the same way. The current 
approach also lends some flexibility to throttling the number of nodes which 
have their storage scanned in a pass of the check loop, in a similar way to 
the default monitor, if we later decide that is needed.
 # I will add the Java Doc.
 # In general this loop will find blocks it needs to move to the pending list, 
and if there are no nodes decommissioning this code will never be called. On 
balance I feel the extra information we get from taking the lock is worth it. 
Additionally, in the BackOffMonitor the locking is much less aggressive than 
what it replaces, so we should be good there.
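
A minimal sketch of the lock-outside-try idiom from item 1 (the lock and 
method names are illustrative):

{code}
// Hedged sketch: acquire the lock before entering the try block, so a
// failed acquisition can never trigger an unlock in finally.
namesystem.writeLock();
try {
  // ... scan and process decommissioning work under the lock ...
} finally {
  namesystem.writeUnlock();
}
{code}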

> Create improved decommission monitor implementation
> ---
>
> Key: HDFS-14854
> URL: https://issues.apache.org/jira/browse/HDFS-14854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, 
> HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, 
> HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, 
> HDFS-14854.008.patch, HDFS-14854.009.patch
>
>
> In HDFS-13157, we discovered a series of problems with the current 
> decommission monitor implementation, such as:
>  * Blocks are replicated sequentially disk by disk and node by node, and 
> hence the load is not spread well across the cluster
>  * Adding a node for decommission can cause the namenode write lock to be 
> held for a long time.
>  * Decommissioning nodes floods the replication queue, and under-replicated 
> blocks from a future node or disk failure may wait for a long time before 
> they are replicated.
>  * Blocks pending replication are checked many times under a write lock 
> before they are sufficiently replicated, wasting resources.
> In this Jira I propose to create a new implementation of the decommission 
> monitor that resolves these issues. As it will be difficult to prove one 
> implementation is better than another, the new implementation can be enabled 
> or disabled giving the option of the existing implementation or the new one.
> I will attach a pdf with some more details on the design and then a version 1 
> patch shortly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1988) Fix listParts API

2019-10-15 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDDS-1988:
-
Description: 
This Jira is to fix the listParts API in the HA code path.

In HA, we have an in-memory cache: we put the result into the in-memory cache 
and return the response; later it is picked up by the double-buffer thread and 
flushed to disk. So, when we do listParts of an MPU key, it should use both 
the in-memory cache and the RocksDB MPU table to list the parts of the MPU 
key.

 

No fix is required for this, as the information is retrieved from the MPU Key 
table; it is not retrieved through RocksDB table iteration. (When we use 
get(), it checks the cache first and then the table.)

 

Used this Jira to add an integration test to verify the behavior.

  was:
This Jira is to fix listParts API in HA code path.

In HA, we have an in-memory cache, where we put the result to in-memory cache 
and return the response, later it will be picked by double buffer thread and it 
will flush to disk. So, now when do listParts of a MPU key, it should use both 
in-memory cache and rocksdb mpu table to list parts of a mpu key.

 

No fix is required for this, as the information is retrieved from the MPU Key 
table, this information is not retrieved through RocksDB Table iteration. (As 
when we use get() this checks from cache first, and then it checks table)


> Fix listParts API
> -
>
> Key: HDDS-1988
> URL: https://issues.apache.org/jira/browse/HDDS-1988
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This Jira is to fix the listParts API in the HA code path.
> In HA, we have an in-memory cache: we put the result into the in-memory 
> cache and return the response; later it is picked up by the double-buffer 
> thread and flushed to disk. So, when we do listParts of an MPU key, it 
> should use both the in-memory cache and the RocksDB MPU table to list the 
> parts of the MPU key.
>  
> No fix is required for this, as the information is retrieved from the MPU 
> Key table; it is not retrieved through RocksDB table iteration. (When we use 
> get(), it checks the cache first and then the table.)
>  
> Used this Jira to add an integration test to verify the behavior.
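
A minimal sketch of the get() path described above, with hypothetical type and 
field names; illustrative only, not the actual Ozone table code:

{code}
// Hedged sketch (hypothetical names): a point lookup consults the
// double-buffer cache before RocksDB, so a part that is committed but
// not yet flushed to disk is still visible to listParts.
public OmMultipartKeyInfo getMultipartInfo(String multipartKey)
    throws IOException {
  CacheValue<OmMultipartKeyInfo> cached =
      tableCache.get(new CacheKey<>(multipartKey)); // in-memory cache first
  if (cached != null) {
    return cached.getCacheValue();  // may be null if the entry was deleted
  }
  return multipartInfoTable.get(multipartKey);      // fall back to RocksDB
}
{code}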



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1985) Fix listVolumes API

2019-10-15 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDDS-1985:
-
Description: 
This Jira is to fix the listVolumes API in the HA code path.

In HA, we have an in-memory cache: we put the result into the in-memory cache 
and return the response; later it is picked up by the double-buffer thread and 
flushed to disk. So, when we do listVolumes, it should use both the in-memory 
cache and the RocksDB volume table to list volumes for a user.

 

No fix is required for this, as the information is retrieved from the MPU Key 
table; it is not retrieved through RocksDB table iteration. (When we use 
get(), it checks the cache first and then the table.)

 

Used this Jira to add an integration test to verify the behavior.

  was:
This Jira is to fix lisVolumes API in HA code path.

In HA, we have an in-memory cache, where we put the result to in-memory cache 
and return the response, later it will be picked by double buffer thread and it 
will flush to disk. So, now when do listVolumes, it should use both in-memory 
cache and rocksdb volume table to list volumes for a user.


> Fix listVolumes API
> ---
>
> Key: HDDS-1985
> URL: https://issues.apache.org/jira/browse/HDDS-1985
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>
> This Jira is to fix the listVolumes API in the HA code path.
> In HA, we have an in-memory cache: we put the result into the in-memory 
> cache and return the response; later it is picked up by the double-buffer 
> thread and flushed to disk. So, when we do listVolumes, it should use both 
> the in-memory cache and the RocksDB volume table to list volumes for a user.
>  
> No fix is required for this, as the information is retrieved from the MPU 
> Key table; it is not retrieved through RocksDB table iteration. (When we use 
> get(), it checks the cache first and then the table.)
>  
> Used this Jira to add an integration test to verify the behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1988) Fix listParts API

2019-10-15 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDDS-1988:
-
Target Version/s: 0.5.0

> Fix listParts API
> -
>
> Key: HDDS-1988
> URL: https://issues.apache.org/jira/browse/HDDS-1988
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This Jira is to fix the listParts API in the HA code path.
> In HA, we have an in-memory cache: we put the result into the in-memory 
> cache and return the response; later it is picked up by the double-buffer 
> thread and flushed to disk. So, when we do listParts of an MPU key, it 
> should use both the in-memory cache and the RocksDB MPU table to list the 
> parts of the MPU key.
>  
> No fix is required for this, as the information is retrieved from the MPU 
> Key table; it is not retrieved through RocksDB table iteration. (When we use 
> get(), it checks the cache first and then the table.)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1988) Fix listParts API

2019-10-15 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDDS-1988:
-
Description: 
This Jira is to fix the listParts API in the HA code path.

In HA, we have an in-memory cache: we put the result into the in-memory cache 
and return the response; later it is picked up by the double-buffer thread and 
flushed to disk. So, when we do listParts of an MPU key, it should use both 
the in-memory cache and the RocksDB MPU table to list the parts of the MPU 
key.

 

No fix is required for this, as the information is retrieved from the MPU Key 
table; it is not retrieved through RocksDB table iteration. (When we use 
get(), it checks the cache first and then the table.)

  was:
This Jira is to fix listParts API in HA code path.

In HA, we have an in-memory cache, where we put the result to in-memory cache 
and return the response, later it will be picked by double buffer thread and it 
will flush to disk. So, now when do listParts of a MPU key, it should use both 
in-memory cache and rocksdb mpu table to list parts of a mpu key.


> Fix listParts API
> -
>
> Key: HDDS-1988
> URL: https://issues.apache.org/jira/browse/HDDS-1988
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This Jira is to fix the listParts API in the HA code path.
> In HA, we have an in-memory cache: we put the result into the in-memory 
> cache and return the response; later it is picked up by the double-buffer 
> thread and flushed to disk. So, when we do listParts of an MPU key, it 
> should use both the in-memory cache and the RocksDB MPU table to list the 
> parts of the MPU key.
>  
> No fix is required for this, as the information is retrieved from the MPU 
> Key table; it is not retrieved through RocksDB table iteration. (When we use 
> get(), it checks the cache first and then the table.)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1988) Fix listParts API

2019-10-15 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDDS-1988:
-
Status: Patch Available  (was: Open)

> Fix listParts API
> -
>
> Key: HDDS-1988
> URL: https://issues.apache.org/jira/browse/HDDS-1988
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This Jira is to fix the listParts API in the HA code path.
> In HA, we have an in-memory cache: we put the result into the in-memory 
> cache and return the response; later it is picked up by the double-buffer 
> thread and flushed to disk. So, when we do listParts of an MPU key, it 
> should use both the in-memory cache and the RocksDB MPU table to list the 
> parts of the MPU key.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1988) Fix listParts API

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1988?focusedWorklogId=328752&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328752
 ]

ASF GitHub Bot logged work on HDDS-1988:


Author: ASF GitHub Bot
Created on: 15/Oct/19 19:00
Start Date: 15/Oct/19 19:00
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #30: 
HDDS-1988. Fix listParts API.
URL: https://github.com/apache/hadoop-ozone/pull/30
 
 
   https://issues.apache.org/jira/browse/HDDS-1988
   
   We don't need any fix for the listParts API, as the information about all 
uploaded parts is stored as key-value entries in the MPU table (to retrieve 
this information, we are not iterating the RocksDB table). Added an 
integration test for this scenario to verify that it works properly.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328752)
Remaining Estimate: 0h
Time Spent: 10m

> Fix listParts API
> -
>
> Key: HDDS-1988
> URL: https://issues.apache.org/jira/browse/HDDS-1988
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This Jira is to fix the listParts API in the HA code path.
> In HA, we have an in-memory cache: we put the result into the in-memory 
> cache and return the response; later it is picked up by the double-buffer 
> thread and flushed to disk. So, when we do listParts of an MPU key, it 
> should use both the in-memory cache and the RocksDB MPU table to list the 
> parts of the MPU key.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1988) Fix listParts API

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-1988:
-
Labels: pull-request-available  (was: )

> Fix listParts API
> -
>
> Key: HDDS-1988
> URL: https://issues.apache.org/jira/browse/HDDS-1988
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
>
> This Jira is to fix the listParts API in the HA code path.
> In HA, we have an in-memory cache: we put the result into the in-memory 
> cache and return the response; later it is picked up by the double-buffer 
> thread and flushed to disk. So, when we do listParts of an MPU key, it 
> should use both the in-memory cache and the RocksDB MPU table to list the 
> parts of the MPU key.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14890) Setting permissions on name directory fails on non posix compliant filesystems

2019-10-15 Thread Siddharth Wagle (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952209#comment-16952209
 ] 

Siddharth Wagle edited comment on HDFS-14890 at 10/15/19 6:45 PM:
--

[~mohansella] Could you please file a Jira with the exception/log for 
permission setting on Windows env? I am marking this Jira as resolved since the 
NN failure is not an issue with the patch.


was (Author: swagle):
[~mohansella] Could you please file a Jira with the exception/log for 
permission setting on Windows env. I am marking this Jira as resolved since the 
NN failure is not an issue with the patch.

> Setting permissions on name directory fails on non posix compliant filesystems
> --
>
> Key: HDFS-14890
> URL: https://issues.apache.org/jira/browse/HDFS-14890
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.1
> Environment: Windows 10.
>Reporter: hirik
>Assignee: Siddharth Wagle
>Priority: Blocker
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14890.01.patch
>
>
> Hi,
> HDFS NameNode and JournalNode are not starting on a Windows machine. Found 
> the related exception below in the logs.
> Caused by: java.lang.UnsupportedOperationException
> at java.base/java.nio.file.Files.setPosixFilePermissions(Files.java:2155)
> at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:452)
> at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:591)
> at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:613)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:188)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1206)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:422)
> at 
> com.slog.dfs.hdfs.nn.NameNodeServiceImpl.delayedStart(NameNodeServiceImpl.java:147)
>  
> Code changes related to this issue: 
> [https://github.com/apache/hadoop/commit/07e3cf952eac9e47e7bd5e195b0f9fc28c468313#diff-1a56e69d50f21b059637cfcbf1d23f11]
>  
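
A minimal sketch of a defensive fallback for non-POSIX filesystems 
(illustrative only; not necessarily what HDFS-14890.01.patch does):

{code}
// Hedged sketch: skip POSIX permission bits on filesystems that do not
// support them (e.g. NTFS), instead of failing the NameNode format.
static void setStorageDirPermissions(java.io.File curDir)
    throws java.io.IOException {
  try {
    java.nio.file.Files.setPosixFilePermissions(curDir.toPath(),
        java.nio.file.attribute.PosixFilePermissions.fromString("rwx------"));
  } catch (UnsupportedOperationException e) {
    // Non-POSIX filesystem: keep default permissions rather than abort.
  }
}
{code}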



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14890) Setting permissions on name directory fails on non posix compliant filesystems

2019-10-15 Thread Siddharth Wagle (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952209#comment-16952209
 ] 

Siddharth Wagle edited comment on HDFS-14890 at 10/15/19 6:44 PM:
--

[~mohansella] Could you please file a Jira with the exception/log for 
permission setting on Windows env. I am marking this Jira as resolved since the 
NN failure is not an issue with the patch.


was (Author: swagle):
[~mohansella] Could you please file a Jira with the exception/log for 
permission setting on Windows env. I am marking this Jira as resolved since the 
NN failure if not an issue with the patch.

> Setting permissions on name directory fails on non posix compliant filesystems
> --
>
> Key: HDFS-14890
> URL: https://issues.apache.org/jira/browse/HDFS-14890
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.1
> Environment: Windows 10.
>Reporter: hirik
>Assignee: Siddharth Wagle
>Priority: Blocker
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14890.01.patch
>
>
> Hi,
> HDFS NameNode and JournalNode are not starting on a Windows machine. Found 
> the related exception below in the logs.
> Caused by: java.lang.UnsupportedOperationException
> at java.base/java.nio.file.Files.setPosixFilePermissions(Files.java:2155)
> at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:452)
> at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:591)
> at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:613)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:188)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1206)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:422)
> at 
> com.slog.dfs.hdfs.nn.NameNodeServiceImpl.delayedStart(NameNodeServiceImpl.java:147)
>  
> Code changes related to this issue: 
> [https://github.com/apache/hadoop/commit/07e3cf952eac9e47e7bd5e195b0f9fc28c468313#diff-1a56e69d50f21b059637cfcbf1d23f11]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-14890) Setting permissions on name directory fails on non posix compliant filesystems

2019-10-15 Thread Siddharth Wagle (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Wagle resolved HDFS-14890.

Resolution: Fixed

[~mohansella] Could you please file a Jira with the exception/log for 
permission setting on Windows env. I am marking this Jira as resolved since the 
NN failure if not an issue with the patch.

> Setting permissions on name directory fails on non posix compliant filesystems
> --
>
> Key: HDFS-14890
> URL: https://issues.apache.org/jira/browse/HDFS-14890
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.1
> Environment: Windows 10.
>Reporter: hirik
>Assignee: Siddharth Wagle
>Priority: Blocker
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14890.01.patch
>
>
> Hi,
> HDFS NameNode and JournalNode are not starting on a Windows machine. Found 
> the related exception below in the logs.
> Caused by: java.lang.UnsupportedOperationException
> at java.base/java.nio.file.Files.setPosixFilePermissions(Files.java:2155)
> at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:452)
> at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:591)
> at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:613)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:188)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1206)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:422)
> at 
> com.slog.dfs.hdfs.nn.NameNodeServiceImpl.delayedStart(NameNodeServiceImpl.java:147)
>  
> Code changes related to this issue: 
> [https://github.com/apache/hadoop/commit/07e3cf952eac9e47e7bd5e195b0f9fc28c468313#diff-1a56e69d50f21b059637cfcbf1d23f11]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14909) DFSNetworkTopology#chooseRandomWithStorageType() should not decrease storage count for excluded node which is already part of excluded scope

2019-10-15 Thread Surendra Singh Lilhore (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surendra Singh Lilhore updated HDFS-14909:
--
Attachment: (was: HDFS-14909.001.patch)

> DFSNetworkTopology#chooseRandomWithStorageType() should not decrease storage 
> count for excluded node which is already part of excluded scope 
> -
>
> Key: HDFS-14909
> URL: https://issues.apache.org/jira/browse/HDFS-14909
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Attachments: HDFS-14909.001.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14909) DFSNetworkTopology#chooseRandomWithStorageType() should not decrease storage count for excluded node which is already part of excluded scope

2019-10-15 Thread Surendra Singh Lilhore (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surendra Singh Lilhore updated HDFS-14909:
--
Status: Patch Available  (was: Open)

> DFSNetworkTopology#chooseRandomWithStorageType() should not decrease storage 
> count for excluded node which is already part of excluded scope 
> -
>
> Key: HDFS-14909
> URL: https://issues.apache.org/jira/browse/HDFS-14909
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Attachments: HDFS-14909.001.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14909) DFSNetworkTopology#chooseRandomWithStorageType() should not decrease storage count for excluded node which is already part of excluded scope

2019-10-15 Thread Surendra Singh Lilhore (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surendra Singh Lilhore updated HDFS-14909:
--
Attachment: HDFS-14909.001.patch

> DFSNetworkTopology#chooseRandomWithStorageType() should not decrease storage 
> count for excluded node which is already part of excluded scope 
> -
>
> Key: HDFS-14909
> URL: https://issues.apache.org/jira/browse/HDFS-14909
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Attachments: HDFS-14909.001.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14909) DFSNetworkTopology#chooseRandomWithStorageType() should not decrease storage count for excluded node which is already part of excluded scope

2019-10-15 Thread Surendra Singh Lilhore (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surendra Singh Lilhore updated HDFS-14909:
--
Attachment: HDFS-14909.001.patch

> DFSNetworkTopology#chooseRandomWithStorageType() should not decrease storage 
> count for excluded node which is already part of excluded scope 
> -
>
> Key: HDFS-14909
> URL: https://issues.apache.org/jira/browse/HDFS-14909
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Attachments: HDFS-14909.001.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-2295) Display log of freon on the standard output

2019-10-15 Thread Xiaoyu Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao resolved HDDS-2295.
--
Fix Version/s: 0.5.0
   Resolution: Fixed

Thanks [~elek] for the contribution and all for the reviews. I've merged the 
changes.

> Display log of freon on the standard output
> ---
>
> Key: HDDS-2295
> URL: https://issues.apache.org/jira/browse/HDDS-2295
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Marton Elek
>Assignee: Marton Elek
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HDDS-2042 disabled the console logging for all of the ozone command line 
> tools including freon.
> But freon is different: it has a different error-handling model. For freon 
> we need all the logs on the console:
>  1. To follow all the different errors
>  2. To get information about the used (random) prefix, which can be reused 
> during the validation phase.
>  
> I propose to restore the original behavior for Ozone.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2295) Display log of freon on the standard output

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2295?focusedWorklogId=328739&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328739
 ]

ASF GitHub Bot logged work on HDDS-2295:


Author: ASF GitHub Bot
Created on: 15/Oct/19 18:38
Start Date: 15/Oct/19 18:38
Worklog Time Spent: 10m 
  Work Description: xiaoyuyao commented on pull request #14: HDDS-2295. 
Display log of freon on the standard output
URL: https://github.com/apache/hadoop-ozone/pull/14
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328739)
Time Spent: 20m  (was: 10m)

> Display log of freon on the standard output
> ---
>
> Key: HDDS-2295
> URL: https://issues.apache.org/jira/browse/HDDS-2295
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Marton Elek
>Assignee: Marton Elek
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HDDS-2042 disabled the console logging for all of the ozone command line 
> tools including freon.
> But freon is different: it has a different error handling model. For freon we 
> need all the logs on the console.
>  1. To follow all the different errors
>  2. To get information about the used (random) prefix which can be reused 
> during the validation phase.
>  
> I propose to restore the original behavior for Ozone.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-13879) FileSystem: Add allowSnapshot, disallowSnapshot, getSnapshotDiffReport and getSnapshottableDirListing

2019-10-15 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina reassigned HDFS-13879:


Assignee: (was: hemanthboyina)

> FileSystem: Add allowSnapshot, disallowSnapshot, getSnapshotDiffReport and 
> getSnapshottableDirListing
> -
>
> Key: HDFS-13879
> URL: https://issues.apache.org/jira/browse/HDFS-13879
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.1.1
>Reporter: Siyao Meng
>Priority: Major
>
> I wonder whether we should add allowSnapshot() and disallowSnapshot() to 
> FileSystem abstract class.
> I think we should because createSnapshot(), renameSnapshot() and 
> deleteSnapshot() are already part of it.
> Any reason why we don't want to do this?
> Thanks!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2181) Ozone Manager should send correct ACL type in ACL requests to Authorizer

2019-10-15 Thread Xiaoyu Yao (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952186#comment-16952186
 ] 

Xiaoyu Yao commented on HDDS-2181:
--

Thanks [~vivekratnavel] for the contribution and all for the reviews and 
discussions. I've committed the change to trunk. 

> Ozone Manager should send correct ACL type in ACL requests to Authorizer
> 
>
> Key: HDDS-2181
> URL: https://issues.apache.org/jira/browse/HDDS-2181
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> Currently, Ozone Manager sends "WRITE" as the ACLType for key create, key delete 
> and bucket create operations. Fix the ACL type in all requests to the 
> authorizer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2181) Ozone Manager should send correct ACL type in ACL requests to Authorizer

2019-10-15 Thread Xiaoyu Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDDS-2181:
-
Fix Version/s: 0.5.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Ozone Manager should send correct ACL type in ACL requests to Authorizer
> 
>
> Key: HDDS-2181
> URL: https://issues.apache.org/jira/browse/HDDS-2181
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> Currently, Ozone Manager sends "WRITE" as the ACLType for key create, key delete 
> and bucket create operations. Fix the ACL type in all requests to the 
> authorizer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation

2019-10-15 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952183#comment-16952183
 ] 

David Mollitor commented on HDFS-14854:
---

What I was saying before, now that I've dug into it a bit more, is that we 
should look at revamping the 
{{org.apache.hadoop.hdfs.server.blockmanagement.LowRedundancyBlocks}} class as 
part of this effort.

> Create improved decommission monitor implementation
> ---
>
> Key: HDFS-14854
> URL: https://issues.apache.org/jira/browse/HDFS-14854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, 
> HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, 
> HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, 
> HDFS-14854.008.patch, HDFS-14854.009.patch
>
>
> In HDFS-13157, we discovered a series of problems with the current 
> decommission monitor implementation, such as:
>  * Blocks are replicated sequentially disk by disk and node by node, and 
> hence the load is not spread well across the cluster
>  * Adding a node for decommission can cause the namenode write lock to be 
> held for a long time.
>  * Decommissioning nodes floods the replication queue, and under-replicated 
> blocks from a future node or disk failure may wait for a long time before they 
> are replicated.
>  * Blocks pending replication are checked many times under a write lock 
> before they are sufficiently replicated, wasting resources
> In this Jira I propose to create a new implementation of the decommission 
> monitor that resolves these issues. As it will be difficult to prove one 
> implementation is better than another, the new implementation can be enabled 
> or disabled giving the option of the existing implementation or the new one.
> I will attach a pdf with some more details on the design and then a version 1 
> patch shortly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation

2019-10-15 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952179#comment-16952179
 ] 

David Mollitor commented on HDFS-14854:
---

# https://stackoverflow.com/questions/10868423/lock-lock-before-try
# Please grab the lock for {{dn.getStorageInfos()}} in its own block.  Easier 
to reason about.
# Using a 'null' value in this way is overloading the use of the {{Map}} class 
and it's not clearly articulated in the comments how this works.  I think it 
would be much cleaner to have {{processPendingNodes()}} return a list of nodes 
that need to be processed instead of populating the {{Map}} in this way.

{code:java}
List pendingNodes;
try {
  ...
  processCancelledNodes();
  pendingNodes = processPendingNodes();
} finally {
  namesystem.writeUnlock();
}
...
check(pendingNodes);
{code}

4. 

bq. For nodes to be added to pendingNodes, that is always done under the 
namenode writeLock

Please put that as a requirement in the JavaDoc for the {{startTrackingNode}} 
method.

5. I worry about the needless locking, because that lock is a very hot lock, 
used all over the place. The time per iteration is configurable: 30 seconds is 
the default, but a user may opt to lower it to 1 second, and there is no 
information for them to know that this will increase the lock retention, even 
if there is nothing to replicate.


> Create improved decommission monitor implementation
> ---
>
> Key: HDFS-14854
> URL: https://issues.apache.org/jira/browse/HDFS-14854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, 
> HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, 
> HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, 
> HDFS-14854.008.patch, HDFS-14854.009.patch
>
>
> In HDFS-13157, we discovered a series of problems with the current 
> decommission monitor implementation, such as:
>  * Blocks are replicated sequentially disk by disk and node by node, and 
> hence the load is not spread well across the cluster
>  * Adding a node for decommission can cause the namenode write lock to be 
> held for a long time.
>  * Decommissioning nodes floods the replication queue, and under-replicated 
> blocks from a future node or disk failure may wait for a long time before they 
> are replicated.
>  * Blocks pending replication are checked many times under a write lock 
> before they are sufficiently replicated, wasting resources
> In this Jira I propose to create a new implementation of the decommission 
> monitor that resolves these issues. As it will be difficult to prove one 
> implementation is better than another, the new implementation can be enabled 
> or disabled giving the option of the existing implementation or the new one.
> I will attach a pdf with some more details on the design and then a version 1 
> patch shortly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14284) RBF: Log Router identifier when reporting exceptions

2019-10-15 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952173#comment-16952173
 ] 

hemanthboyina commented on HDFS-14284:
--

Even if we use the FileSystem interface, we get the exception as:
{code:java}
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RetriableException):
 org.apache.hadoop.hdfs.server.federation.router.NoNamenodesAvailableException: 
No namenodes available under nameservice ns0 from router ** {code}
The class of the exception is RemoteException.

> RBF: Log Router identifier when reporting exceptions
> 
>
> Key: HDFS-14284
> URL: https://issues.apache.org/jira/browse/HDFS-14284
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14284.001.patch, HDFS-14284.002.patch, 
> HDFS-14284.003.patch, HDFS-14284.004.patch, HDFS-14284.005.patch, 
> HDFS-14284.006.patch
>
>
> The typical setup is to use multiple Routers through 
> ConfiguredFailoverProxyProvider.
> In a regular HA Namenode setup, it is easy to know which NN was used.
> However, in RBF, any Router can be the one reporting the exception, and it is 
> hard to know which one it was.
> We should have a way to identify which Router/Namenode was the one triggering 
> the exception.
> This would also apply to Observer Namenodes.
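
One illustrative shape for this (hypothetical helper and names, not taken from the attached patches): include the answering Router's identifier when rethrowing, so clients can tell which Router produced the error.

{code:java}
import java.io.IOException;

public class RouterExceptionDemo {
  // Hypothetical helper: prefix the failing Router's id onto the
  // exception message so clients can tell which Router answered.
  static IOException withRouterId(String routerId, IOException cause) {
    return new IOException(
        "Router " + routerId + ": " + cause.getMessage(), cause);
  }

  public static void main(String[] args) {
    IOException original =
        new IOException("No namenodes available under nameservice ns0");
    System.out.println(
        withRouterId("router-2.example.com:8888", original).getMessage());
  }
}
{code}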



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation

2019-10-15 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952169#comment-16952169
 ] 

Hadoop QA commented on HDFS-14854:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m  
0s{color} | {color:red} Docker failed to build yetus/hadoop:104ccca9169. 
{color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-14854 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12983085/HDFS-14854.009.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28091/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Create improved decommission monitor implementation
> ---
>
> Key: HDFS-14854
> URL: https://issues.apache.org/jira/browse/HDFS-14854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, 
> HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, 
> HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, 
> HDFS-14854.008.patch, HDFS-14854.009.patch
>
>
> In HDFS-13157, we discovered a series of problems with the current 
> decommission monitor implementation, such as:
>  * Blocks are replicated sequentially disk by disk and node by node, and 
> hence the load is not spread well across the cluster
>  * Adding a node for decommission can cause the namenode write lock to be 
> held for a long time.
>  * Decommissioning nodes floods the replication queue, and under-replicated 
> blocks from a future node or disk failure may wait for a long time before they 
> are replicated.
>  * Blocks pending replication are checked many times under a write lock 
> before they are sufficiently replicated, wasting resources
> In this Jira I propose to create a new implementation of the decommission 
> monitor that resolves these issues. As it will be difficult to prove one 
> implementation is better than another, the new implementation can be enabled 
> or disabled giving the option of the existing implementation or the new one.
> I will attach a pdf with some more details on the design and then a version 1 
> patch shortly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14854) Create improved decommission monitor implementation

2019-10-15 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-14854:
-
Attachment: HDFS-14854.009.patch

> Create improved decommission monitor implementation
> ---
>
> Key: HDFS-14854
> URL: https://issues.apache.org/jira/browse/HDFS-14854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, 
> HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, 
> HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, 
> HDFS-14854.008.patch, HDFS-14854.009.patch
>
>
> In HDFS-13157, we discovered a series of problems with the current 
> decommission monitor implementation, such as:
>  * Blocks are replicated sequentially disk by disk and node by node, and 
> hence the load is not spread well across the cluster
>  * Adding a node for decommission can cause the namenode write lock to be 
> held for a long time.
>  * Decommissioning nodes floods the replication queue, and under-replicated 
> blocks from a future node or disk failure may wait for a long time before they 
> are replicated.
>  * Blocks pending replication are checked many times under a write lock 
> before they are sufficiently replicated, wasting resources
> In this Jira I propose to create a new implementation of the decommission 
> monitor that resolves these issues. As it will be difficult to prove one 
> implementation is better than another, the new implementation can be enabled 
> or disabled giving the option of the existing implementation or the new one.
> I will attach a pdf with some more details on the design and then a version 1 
> patch shortly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation

2019-10-15 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952163#comment-16952163
 ] 

Stephen O'Donnell commented on HDFS-14854:
--

[~elgoiri] Thanks for the comments. I added maxConcurrentTrackedNodes to the 
shared code, but left numBlocksChecked unshared. Both classes use this variable 
in a slightly different way, so I did not want to confuse things.

 

[~belugabehr] There probably is merit to looking at the replication queue 
implementation, but that is an area that can affect many parts of the namenode 
code, so we need to tread carefully with making too many changes. I feel we 
should get this decommission code into shape and tested in real-world clusters, 
and then we can revisit the queue and see what can be done.

 

 

[~elgoiri]

> Create improved decommission monitor implementation
> ---
>
> Key: HDFS-14854
> URL: https://issues.apache.org/jira/browse/HDFS-14854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, 
> HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, 
> HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, 
> HDFS-14854.008.patch
>
>
> In HDFS-13157, we discovered a series of problems with the current 
> decommission monitor implementation, such as:
>  * Blocks are replicated sequentially disk by disk and node by node, and 
> hence the load is not spread well across the cluster
>  * Adding a node for decommission can cause the namenode write lock to be 
> held for a long time.
>  * Decommissioning nodes floods the replication queue, and under-replicated 
> blocks from a future node or disk failure may wait for a long time before they 
> are replicated.
>  * Blocks pending replication are checked many times under a write lock 
> before they are sufficiently replicated, wasting resources
> In this Jira I propose to create a new implementation of the decommission 
> monitor that resolves these issues. As it will be difficult to prove one 
> implementation is better than another, the new implementation can be enabled 
> or disabled giving the option of the existing implementation or the new one.
> I will attach a pdf with some more details on the design and then a version 1 
> patch shortly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation

2019-10-15 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952156#comment-16952156
 ] 

Stephen O'Donnell commented on HDFS-14854:
--

[~belugabehr] Thanks for the comments. I have addressed some in the 009 patch 
and a few I have not addressed. Please see the summary below:

{quote}
Best practice is to grab the lock outside of the try statement.
{quote}

Maybe there is something I don't understand, and I see the link does state the 
best practice you mentioned, but I don't see how:
{code:java}
lock
try {
  ...
} finally {
  unlock
}{code}
Is better than:
{code:java}
try {
  lock()
  ...
} finally {
  unlock()
}{code}
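
For what it's worth, the difference the linked answer describes only shows up when the acquisition itself fails; a minimal sketch with a plain {{ReentrantLock}} (illustrative only, not the namesystem lock):

{code:java}
import java.util.concurrent.locks.ReentrantLock;

public class LockPlacementDemo {
  public static void main(String[] args) throws InterruptedException {
    ReentrantLock lock = new ReentrantLock();
    try {
      // If the acquisition itself fails - e.g. lockInterruptibly() is
      // interrupted before the lock is taken - the finally below still runs...
      lock.lockInterruptibly();
      // ... critical section ...
    } finally {
      // ...and unlock() without the lock held throws
      // IllegalMonitorStateException, masking the original exception.
      // Acquiring the lock before the try avoids that masking.
      lock.unlock();
    }
  }
}
{code}

With nothing between the try and the acquisition the two forms behave the same on the happy path, so in practice the difference is only about masking that rare failure.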

{quote}
The cancelledNodes data structure is a List but it should be a Queue
{quote}

True, I have changed it to an ArrayDeque like with pendingNodes.

{quote}
Using 'null' values is very out of vogue. Better to put a new HashMap here. 
Allows for simplification of the code by assuming that values will never be 
'null'.
{quote}

I use the fact that the value is null to decide whether the datanode needs an 
initial scan or not. The flow is:

1. Take a node from pendingNodes and add it to outOfServiceNodeBlocks with a null 
value.

2. Later, in the check() method, for each null entry in outOfServiceNodeBlocks, 
scan the node and add a hashmap of the blocks that need processing.
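
A minimal sketch of that flow, with simplified stand-in types (the real code tracks datanodes and blocks, not strings):

{code:java}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

public class NullMarkerFlow {
  // A null value means "tracked, but still needs its initial scan".
  private final Map<String, List<String>> outOfServiceNodeBlocks = new HashMap<>();
  private final Queue<String> pendingNodes = new ArrayDeque<>();

  // Step 1 (under the write lock): move nodes into tracking, unscanned.
  void processPendingNodes() {
    String node;
    while ((node = pendingNodes.poll()) != null) {
      outOfServiceNodeBlocks.put(node, null);
    }
  }

  // Step 2: in check(), a null value triggers the initial scan.
  void check() {
    for (Map.Entry<String, List<String>> e : outOfServiceNodeBlocks.entrySet()) {
      if (e.getValue() == null) {
        e.setValue(scanNode(e.getKey()));
      }
      // ... process the blocks in e.getValue() ...
    }
  }

  private List<String> scanNode(String node) {
    return new ArrayList<>();  // placeholder for the real block scan
  }
}
{code}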

{quote}
The method scanDatanodeStorage uses the namesystem.readLock(); in a pretty 
verbose and complicated way.
{quote}

We need to hold the read lock when calling dn.getStorageInfos() in the outer 
loop, but we want to drop the lock after processing each storage and then take 
it again. I could refactor this to call dn.getStorageInfos() in a block, e.g.:
{code:java}
DatanodeStorageInfo[] storages;
namesystem.readLock();
try {
  storages = dn.getStorageInfos();
} finally {
  namesystem.readUnlock();
} {code}
And then simplify the locking code which is there, but I feel this isn't much 
better than what is there.


{quote}
This method is accessed by the local running Thread. However, pendingNodes does 
not appear to be a thread-safe Collection. Perhaps the collection cannot be 
modified because of the external locking of the writeLock but there is no 
requirement to have the lock stated in the startTrackingNode method javadoc.
{quote}

For nodes to be added to pendingNodes, that is always done under the namenode 
writeLock, which is taken in FSNamesystem. Then the only place pendingNodes is 
processed is in BackOffMonitor, via the monitor thread, which ensures it holds 
the write lock when processing pendingNodes here:
{code:java}
@Override
public void run() {
  LOG.debug("DatanodeAdminMonitorV2 is running.");
  if (!namesystem.isRunning()) {
    LOG.info("Namesystem is not running, skipping " +
        "decommissioning/maintenance checks.");
    return;
  }
  // Reset the checked count at beginning of each iteration
  numBlocksChecked = 0;
  // Check decommission or maintenance progress.
  try {
    try {
      /**
       * Other threads can modify the pendingNode list and the cancelled
       * node list, so we must process them under the NN write lock to
       * prevent any concurrent modifications.
       */
      namesystem.writeLock();
      // Always process the cancelled list before the pending list, as
      // it is possible for a node to be cancelled, and then quickly added
      // back again. If we process these the other way around, the added
      // node will be removed from tracking by the pending cancel.
      processCancelledNodes();
      processPendingNodes();
    } finally {
      namesystem.writeUnlock();
    } {code}

{quote}
Nit: this is not very java-y...
{quote}

I think we could argue this one either way. My version looks cleaner to me, 
but I admit modifying the passed-in structure can be slightly confusing.


{quote}
Please remove this method. It can be replaced with map.computeIfAbsent(key, k 
-> new LinkedList()).add(v);
{quote}

Thanks for pointing this out. I did not know about that method. I have changed 
this to use computeIfAbsent.
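
For reference, the resulting pattern looks roughly like this (illustrative names, not the actual patch):

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ComputeIfAbsentDemo {
  public static void main(String[] args) {
    Map<String, List<Long>> blocksByNode = new HashMap<>();

    // Instead of a helper that null-checks and inserts a new list,
    // computeIfAbsent creates the list on first access in one line.
    blocksByNode.computeIfAbsent("dn-1", k -> new ArrayList<>()).add(1001L);
    blocksByNode.computeIfAbsent("dn-1", k -> new ArrayList<>()).add(1002L);

    System.out.println(blocksByNode);  // {dn-1=[1001, 1002]}
  }
}
{code}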

{quote}
This code knows the pendingCount value and the pendingRepLimit... do not grab 
the write lock if the function is going to immediately return anyway.
{quote}

The problem is, we need the lock to check 
blockManager.getLowRedundancyBlocksCount(), and if the pendingCount is not 
reducing, then I would really like to log the replication queue size, as an 
overloaded replication queue may be the reason the pendingCount is not 
reducing. In an earlier version I did have the check outside the lock, but then 
I wanted to add the rep queue size to the log and moved it back in. This code 
is only run once per 30 seconds and the lock would only be held a tiny amount 
of time, so I think the additional details in the log are worth the lock price.


{quote}
I think it should return 'true' if the block is orphaned, no? It should skip 
them in the same way that an 'unknown' block is.
{quote}

That code is taken from the original 

[jira] [Commented] (HDFS-14739) RBF: LS command for mount point shows wrong owner and permission information.

2019-10-15 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952147#comment-16952147
 ] 

Íñigo Goiri commented on HDFS-14739:


+1 on  [^HDFS-14739-trunk-011.patch].

> RBF: LS command for mount point shows wrong owner and permission information.
> -
>
> Key: HDFS-14739
> URL: https://issues.apache.org/jira/browse/HDFS-14739
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: xuzq
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-14739-trunk-001.patch, HDFS-14739-trunk-002.patch, 
> HDFS-14739-trunk-003.patch, HDFS-14739-trunk-004.patch, 
> HDFS-14739-trunk-005.patch, HDFS-14739-trunk-006.patch, 
> HDFS-14739-trunk-007.patch, HDFS-14739-trunk-008.patch, 
> HDFS-14739-trunk-009.patch, HDFS-14739-trunk-010.patch, 
> HDFS-14739-trunk-011.patch, image-2019-08-16-17-15-50-614.png, 
> image-2019-08-16-17-16-00-863.png, image-2019-08-16-17-16-34-325.png
>
>
> ||source||target namespace||destination||owner||group||permission||
> |/mnt|ns0|/mnt|mnt|mnt_group|755|
> |/mnt/test1|ns1|/mnt/test1|mnt_test1|mnt_test1_group|755|
> |/test1|ns1|/test1|test1|test1_group|755|
> When doing getListing("/mnt"), the owner of */mnt/test1* in the result should 
> be *mnt_test1* instead of *test1*.
>  
> And if the mount table is as below, we should support getListing("/mnt") instead 
> of throwing an IOException when dfs.federation.router.default.nameservice.enable 
> is false.
> ||source||target namespace||destination||owner||group||permission||
> |/mnt/test1|ns0|/mnt/test1|test1|test1|755|
> |/mnt/test2|ns1|/mnt/test2|test2|test2|755|
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14887) RBF: In Router Web UI, Observer Namenode Information displaying as Unavailable

2019-10-15 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952142#comment-16952142
 ] 

Hadoop QA commented on HDFS-14887:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m  
1s{color} | {color:red} Docker failed to build yetus/hadoop:104ccca9169. 
{color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-14887 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12983084/HDFS-14887.007.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28090/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> RBF: In Router Web UI, Observer Namenode Information displaying as Unavailable
> --
>
> Key: HDFS-14887
> URL: https://issues.apache.org/jira/browse/HDFS-14887
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: 14887.after.png, 14887.before.png, HDFS-14887.001.patch, 
> HDFS-14887.002.patch, HDFS-14887.003.patch, HDFS-14887.004.patch, 
> HDFS-14887.005.patch, HDFS-14887.006.patch, HDFS-14887.007.patch
>
>
> In the Router Web UI, Observer Namenode information is displayed as Unavailable.
> We should show a proper icon for them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14908) LeaseManager should check parent-child relationship when filter open files.

2019-10-15 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952141#comment-16952141
 ] 

Íñigo Goiri commented on HDFS-14908:


Thanks [~LiJinglun] for the patch.
{{DFSUtil#isParent()}} is a pretty common pattern so it is good to have it 
there.
I like the initial structure, but I would also split the last return with the 
OR into two if statements and return false at the end.
The compiler will end up doing the same, but this way it is a little more 
readable and we can add a comment for each if.
BTW, another option would be to rely on {{Path}} which has a lot of this 
already.
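
A hypothetical sketch of the suggested shape (method and names assumed, not taken from the patch):

{code:java}
public class ParentCheckDemo {
  // Split "return a || b;" into separate ifs so each case can carry a comment.
  static boolean isParent(String ancestor, String path) {
    // Filtering on the root matches every path.
    if ("/".equals(ancestor)) {
      return true;
    }
    // A true child must start with "ancestor/"; a plain prefix check
    // would wrongly match "/mnt2" against the filter "/mnt".
    if (path.startsWith(ancestor + "/")) {
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(isParent("/mnt", "/mnt/test1"));  // true
    System.out.println(isParent("/mnt", "/mnt2/test1")); // false
  }
}
{code}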

> LeaseManager should check parent-child relationship when filter open files.
> ---
>
> Key: HDFS-14908
> URL: https://issues.apache.org/jira/browse/HDFS-14908
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.1
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Minor
> Attachments: HDFS-14908.001.patch
>
>
> Now when doing listOpenFiles(), LeaseManager only checks whether the filter 
> path is the prefix of the open files. We should check whether the filter path 
> is the parent/ancestor of the open files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14887) RBF: In Router Web UI, Observer Namenode Information displaying as Unavailable

2019-10-15 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14887:
-
Attachment: HDFS-14887.007.patch

> RBF: In Router Web UI, Observer Namenode Information displaying as Unavailable
> --
>
> Key: HDFS-14887
> URL: https://issues.apache.org/jira/browse/HDFS-14887
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: 14887.after.png, 14887.before.png, HDFS-14887.001.patch, 
> HDFS-14887.002.patch, HDFS-14887.003.patch, HDFS-14887.004.patch, 
> HDFS-14887.005.patch, HDFS-14887.006.patch, HDFS-14887.007.patch
>
>
> In the Router Web UI, Observer Namenode information is displayed as Unavailable.
> We should show a proper icon for them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14890) Setting permissions on name directory fails on non posix compliant filesystems

2019-10-15 Thread Siddharth Wagle (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952139#comment-16952139
 ] 

Siddharth Wagle commented on HDFS-14890:


Hi [~mohansella], the changes in HDFS-2470 do not affect the datanode data dir. 
Also, the clearDirectory() in the patch is called when the NN is formatted; are 
you testing this on a freshly deployed cluster or explicitly calling format?

If the set-permission call is failing, can you provide some log details?

[~hirik] This Jira does address the NN failure. I am not sure if there is an 
issue yet with setting permissions; waiting on some logs/exceptions.

> Setting permissions on name directory fails on non posix compliant filesystems
> --
>
> Key: HDFS-14890
> URL: https://issues.apache.org/jira/browse/HDFS-14890
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.1
> Environment: Windows 10.
>Reporter: hirik
>Assignee: Siddharth Wagle
>Priority: Blocker
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14890.01.patch
>
>
> Hi,
> HDFS NameNode and JournalNode are not starting in Windows machine. Found 
> below related exception in logs. 
> Caused by: java.lang.UnsupportedOperationException
> at java.base/java.nio.file.Files.setPosixFilePermissions(Files.java:2155)
> at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:452)
> at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:591)
> at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:613)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:188)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1206)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:422)
> at 
> com.slog.dfs.hdfs.nn.NameNodeServiceImpl.delayedStart(NameNodeServiceImpl.java:147)
>  
> Code changes related to this issue: 
> [https://github.com/apache/hadoop/commit/07e3cf952eac9e47e7bd5e195b0f9fc28c468313#diff-1a56e69d50f21b059637cfcbf1d23f11]
>  
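
For illustration, the usual guard for non-POSIX filesystems looks like the sketch below; this is an assumption about the general pattern, not the attached patch:

{code:java}
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermissions;

public class PosixPermissionGuard {
  // Only call setPosixFilePermissions when the default filesystem
  // supports the "posix" attribute view; Windows/NTFS does not, and
  // calling it there throws UnsupportedOperationException.
  static void setPermissionsIfSupported(Path dir, String perms) throws IOException {
    if (FileSystems.getDefault().supportedFileAttributeViews().contains("posix")) {
      Files.setPosixFilePermissions(dir, PosixFilePermissions.fromString(perms));
    }
  }

  public static void main(String[] args) throws IOException {
    setPermissionsIfSupported(Paths.get("."), "rwxr-x---");
  }
}
{code}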



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12491) Support wildcard in CLASSPATH for libhdfs

2019-10-15 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-12491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated HDFS-12491:
-
Target Version/s: 3.3.0, 2.10.1  (was: 2.10.0, 3.3.0)

> Support wildcard in CLASSPATH for libhdfs
> -
>
> Key: HDFS-12491
> URL: https://issues.apache.org/jira/browse/HDFS-12491
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: libhdfs
>Affects Versions: 2.8.0
>Reporter: John Zhuge
>Assignee: Muhammad Samir Khan
>Priority: Major
> Attachments: HDFS-12491.001.patch, HDFS-12491.002.patch, 
> testWildCard.sh
>
>
> According to the libhdfs doc, wildcard in CLASSPATH is not supported:
> bq. The most common problem is the CLASSPATH is not set properly when calling 
> a program that uses libhdfs. Make sure you set it to all the Hadoop jars 
> needed to run Hadoop itself as well as the right configuration directory 
> containing hdfs-site.xml. It is not valid to use wildcard syntax for 
> specifying multiple jars. It may be useful to run hadoop classpath --glob or 
> hadoop classpath --jar  to generate the correct classpath for your 
> deployment.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12548) HDFS Jenkins build is unstable on branch-2

2019-10-15 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-12548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated HDFS-12548:
-
Target Version/s: 2.10.1  (was: 2.10.0)

> HDFS Jenkins build is unstable on branch-2
> --
>
> Key: HDFS-12548
> URL: https://issues.apache.org/jira/browse/HDFS-12548
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.9.0
>Reporter: Rushabh Shah
>Priority: Critical
>
> Feel free to move the ticket to another project (e.g. infra).
> Recently I attached a branch-2 patch while working on one jira 
> [HDFS-12386|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180676&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180676]
> There were at least 100 failed and timed-out tests. I am sure they are not 
> related to my patch.
> Also I came across another jira which was just a javadoc-related change, and 
> there were around 100 failed tests.
> Below are the details for the pre-commits that failed in branch-2:
> 1. [HDFS-12386 attempt 
> 1|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180069&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180069]
> {noformat}
> Ran on slave: asf912.gq1.ygridcore.net/H12
> Failed with following error message:
> Build timed out (after 300 minutes). Marking the build as aborted.
> Build was aborted
> Performing Post build task...
> {noformat}
> 2. [HDFS-12386 attempt 
> 2|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180676&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180676]
> {noformat}
> Ran on slave: asf900.gq1.ygridcore.net
> Failed with following error message:
> FATAL: command execution failed
> Command close created at
>   at hudson.remoting.Command.(Command.java:60)
>   at hudson.remoting.Channel$CloseCommand.(Channel.java:1123)
>   at hudson.remoting.Channel$CloseCommand.(Channel.java:1121)
>   at hudson.remoting.Channel.close(Channel.java:1281)
>   at hudson.remoting.Channel.close(Channel.java:1263)
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1128)
> Caused: hudson.remoting.Channel$OrderlyShutdown
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1129)
>   at hudson.remoting.Channel$1.handle(Channel.java:527)
>   at 
> hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:83)
> Caused: java.io.IOException: Backing channel 'H0' is disconnected.
>   at 
> hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:192)
>   at 
> hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:257)
>   at com.sun.proxy.$Proxy125.isAlive(Unknown Source)
>   at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1043)
>   at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1035)
>   at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:155)
>   at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:109)
>   at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
>   at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
>   at 
> hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:735)
>   at hudson.model.Build$BuildExecution.build(Build.java:206)
>   at hudson.model.Build$BuildExecution.doRun(Build.java:163)
>   at 
> hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:490)
>   at hudson.model.Run.execute(Run.java:1735)
>   at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
>   at hudson.model.ResourceController.execute(ResourceController.java:97)
>   at hudson.model.Executor.run(Executor.java:405)
> {noformat}
> 3. [HDFS-12531 attempt 
> 1|https://issues.apache.org/jira/browse/HDFS-12531?focusedCommentId=16176493&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16176493]
> {noformat}
> Ran on slave:  asf911.gq1.ygridcore.net
> Failed with following error message:
> FATAL: command execution failed
> Command close created at
>   at hudson.remoting.Command.(Command.java:60)
>   at hudson.remoting.Channel$CloseCommand.(Channel.java:1123)
>   at hudson.remoting.Channel$CloseCommand.(Channel.java:1121)
>   at hudson.remoting.Channel.close(Channel.java:1281)
>   at hudson.remoting.Channel.close(Channel.java:1263)
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1128)
> Caused: hudson.remoting.Channel$OrderlyShutdown
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1129)
>   at hudson.remoting.Channel$1.handle(Channel.java:527)
>   at 
> 

[jira] [Updated] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes

2019-10-15 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated HDFS-14305:
-
Target Version/s: 3.3.0, 3.1.4, 3.2.2, 2.10.1  (was: 2.10.0, 3.3.0, 3.1.4, 
3.2.2)

> Serial number in BlockTokenSecretManager could overlap between different 
> namenodes
> --
>
> Key: HDFS-14305
> URL: https://issues.apache.org/jira/browse/HDFS-14305
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, security
>Reporter: Chao Sun
>Assignee: Konstantin Shvachko
>Priority: Major
>  Labels: multi-sbnn
> Attachments: HDFS-14305-007.patch, HDFS-14305-008.patch, 
> HDFS-14305.001.patch, HDFS-14305.002.patch, HDFS-14305.003.patch, 
> HDFS-14305.004.patch, HDFS-14305.005.patch, HDFS-14305.006.patch
>
>
> Currently, a {{BlockTokenSecretManager}} starts with a random integer as the 
> initial serial number, and then use this formula to rotate it:
> {code:java}
> this.intRange = Integer.MAX_VALUE / numNNs;
> this.nnRangeStart = intRange * nnIndex;
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
>  {code}
> where {{numNNs}} is the total number of NameNodes in the cluster, and 
> {{nnIndex}} is the index of the current NameNode specified in the 
> configuration {{dfs.ha.namenodes.}}.
> However, with this approach, different NameNodes could have overlapping ranges 
> of serial numbers. For simplicity, let's assume {{Integer.MAX_VALUE}} is 100, 
> and we have 2 NameNodes {{nn1}} and {{nn2}} in configuration. Then the ranges 
> for these two are:
> {code}
> nn1 -> [-49, 49]
> nn2 -> [1, 99]
> {code}
> This is because the initial serial number could be any negative integer.
> Moreover, when the keys are updated, the serial number will again be updated 
> with the formula:
> {code}
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
> {code}
> which means the new serial number could be updated to a range that belongs to 
> a different NameNode, thus increasing the chance of collision again.
> When the collision happens, DataNodes could overwrite an existing key which 
> will cause clients to fail because of {{InvalidToken}} error.
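
To make the overlap concrete, here is a minimal standalone sketch using the simplified numbers from the description (MAX = 100, two NNs); it is illustrative only, not Hadoop code:

{code:java}
public class SerialRangeOverlap {
  public static void main(String[] args) {
    int max = 100;                // stand-in for Integer.MAX_VALUE
    int numNNs = 2;
    int intRange = max / numNNs;  // 50

    // serialNo starts as a random integer, possibly negative.
    for (int serialNo : new int[] {-99, -1, 49}) {
      for (int nnIndex = 0; nnIndex < numNNs; nnIndex++) {
        int nnRangeStart = intRange * nnIndex;
        int rotated = (serialNo % intRange) + nnRangeStart;
        System.out.printf("serialNo=%4d nn%d -> %4d%n",
            serialNo, nnIndex + 1, rotated);
      }
    }
    // Java's % keeps the sign of the dividend, so nn1 covers [-49, 49]
    // and nn2 covers [1, 99]: the two ranges overlap on [1, 49].
  }
}
{code}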



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14277) [SBN read] Observer benchmark results

2019-10-15 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated HDFS-14277:
-
Target Version/s: 2.10.1  (was: 2.10.0)

> [SBN read] Observer benchmark results
> -
>
> Key: HDFS-14277
> URL: https://issues.apache.org/jira/browse/HDFS-14277
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: ha, namenode
>Affects Versions: 2.10.0, 3.3.0
> Environment: Hardware: 4-node cluster, each node has 4 cores, Xeon 
> 2.5GHz, 25GB memory.
> Software: CentOS 7.4, CDH 6.0 + Consistent Reads from Standby, Kerberos, SSL, 
> RPC encryption + Data Transfer Encryption, Cloudera Navigator.
>Reporter: Wei-Chiu Chuang
>Priority: Blocker
> Attachments: Observer profiler.png, Screen Shot 2019-02-14 at 
> 11.50.37 AM.png, observer RPC queue processing time.png
>
>
> Ran a few benchmarks and profiler (VisualVM) today on an Observer-enabled 
> cluster. Would like to share the results with the community. The cluster has 
> 1 Observer node.
> h2. NNThroughputBenchmark
> Generate 1 million files and send fileStatus RPCs.
> {code:java}
> hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -fs 
>   -op fileStatus -threads 100 -files 100 -useExisting 
> -keepResults
> {code}
> h3. Kerberos, SSL, RPC encryption, Data Transfer Encryption enabled:
> ||Node||fileStatus (Ops per sec)||
> |Active NameNode|4865|
> |Observer|3996|
> h3. Kerberos, SSL:
> ||Node||fileStatus (Ops per sec)||
> |Active NameNode|7078|
> |Observer|6459|
> Observations:
>  * due to the edit tailing overhead, the Observer node consumes 30% CPU 
> utilization even if the cluster is idle.
>  * While Active NN has less than 1ms RPC processing time, Observer node has > 
> 5ms RPC processing time. I am still looking for the source of the longer 
> processing time. The longer RPC processing time may be the cause for the 
> performance degradation compared to that of Active NN. Note the cluster has 
> Cloudera Navigator installed which adds additional overhead to RPC processing 
> time.
>  * {{GlobalStateIdContext#isCoordinatedCall()}} pops up as one of the top 
> hotspots in the profiler. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13678) StorageType is incompatible when rolling upgrade to 2.6/2.6+ versions

2019-10-15 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated HDFS-13678:
-
Target Version/s: 2.9.3, 2.10.1  (was: 2.10.0, 2.9.3)

> StorageType is incompatible when rolling upgrade to 2.6/2.6+ versions
> -
>
> Key: HDFS-13678
> URL: https://issues.apache.org/jira/browse/HDFS-13678
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rolling upgrades
>Affects Versions: 2.5.0
>Reporter: Yiqun Lin
>Priority: Major
>
> In version 2.6.0, we supported more storage types in HDFS, implemented in 
> HDFS-6584. But this seems to be an incompatible change when we rolling-upgrade 
> our cluster from 2.5.0 to 2.6.0, and it throws the following error.
> {noformat}
> 2018-06-14 11:43:39,246 ERROR [DataNode: 
> [[[DISK]file:/home/vipshop/hard_disk/dfs/, [DISK]file:/data1/dfs/, 
> [DISK]file:/data2/dfs/]] heartbeating to xx.xx.xx.xx:8022] 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in BPOfferService 
> for Block pool BP-670256553-xx.xx.xx.xx-1528795419404 (Datanode Uuid 
> ab150e05-fcb7-49ed-b8ba-f05c27593fee) service to xx.xx.xx.xx:8022
> java.lang.ArrayStoreException
>  at java.util.ArrayList.toArray(ArrayList.java:412)
>  at 
> java.util.Collections$UnmodifiableCollection.toArray(Collections.java:1034)
>  at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1030)
>  at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:836)
>  at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:146)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:566)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:664)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:835)
>  at java.lang.Thread.run(Thread.java:748)
> {noformat}
> The scenario is that the old DN fails to parse the StorageType it got from the 
> new NN. This error takes place when sending the heartbeat to the NN, so blocks 
> won't be reported to the NN successfully. This will lead to subsequent errors.
> Corresponding logic in 2.5.0:
> {code}
>   public static BlockCommand convert(BlockCommandProto blkCmd) {
> ...
> StorageType[][] targetStorageTypes = new StorageType[targetList.size()][];
> List targetStorageTypesList = 
> blkCmd.getTargetStorageTypesList();
> if (targetStorageTypesList.isEmpty()) { // missing storage types
>   for(int i = 0; i < targetStorageTypes.length; i++) {
> targetStorageTypes[i] = new StorageType[targets[i].length];
> Arrays.fill(targetStorageTypes[i], StorageType.DEFAULT);
>   }
> } else {
>   for(int i = 0; i < targetStorageTypes.length; i++) {
> List p = 
> targetStorageTypesList.get(i).getStorageTypesList();
> targetStorageTypes[i] = p.toArray(new StorageType[p.size()]);  < 
> error here
>   }
> }
> {code}
> But given the current logic, it would be better to return the default type 
> instead of throwing an exception in case StorageType changed (new fields 
> added or new types) in newer versions during a rolling upgrade.
> {code:java}
> public static StorageType convertStorageType(StorageTypeProto type) {
> switch(type) {
> case DISK:
>   return StorageType.DISK;
> case SSD:
>   return StorageType.SSD;
> case ARCHIVE:
>   return StorageType.ARCHIVE;
> case RAM_DISK:
>   return StorageType.RAM_DISK;
> case PROVIDED:
>   return StorageType.PROVIDED;
> default:
>   throw new IllegalStateException(
>   "BUG: StorageTypeProto not found, type=" + type);
> }
>   }
> {code}
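
For illustration, a minimal sketch of the lenient fallback proposed above, using simplified stand-in enums rather than the actual Hadoop classes:

{code:java}
public class LenientStorageTypeConvert {
  // Stand-ins for StorageTypeProto and StorageType.
  enum Proto { DISK, SSD, ARCHIVE, RAM_DISK, PROVIDED }
  enum Internal { DISK, SSD, ARCHIVE, RAM_DISK;
    static final Internal DEFAULT = DISK;
  }

  // Fall back to DEFAULT for types this version does not recognize,
  // instead of throwing, so an old DN survives a rolling upgrade to a
  // NN that sends newer storage types.
  static Internal convert(Proto type) {
    switch (type) {
      case DISK:     return Internal.DISK;
      case SSD:      return Internal.SSD;
      case ARCHIVE:  return Internal.ARCHIVE;
      case RAM_DISK: return Internal.RAM_DISK;
      default:       return Internal.DEFAULT;  // was: throw IllegalStateException
    }
  }

  public static void main(String[] args) {
    // PROVIDED is unknown to this "old" version, so it maps to the default.
    System.out.println(convert(Proto.PROVIDED));  // DISK
  }
}
{code}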



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14794) [SBN read] reportBadBlock is rejected by Observer.

2019-10-15 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated HDFS-14794:
-
Target Version/s: 2.10.1  (was: 2.10.0)

> [SBN read] reportBadBlock is rejected by Observer.
> --
>
> Key: HDFS-14794
> URL: https://issues.apache.org/jira/browse/HDFS-14794
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Priority: Major
>
> {{reportBadBlock}} is rejected by Observer via StandbyException
> {code}StandbyException: Operation category WRITE is not supported in state 
> observer{code}
> We should investigate what the consequences of this are and whether we should 
> treat {{reportBadBlock}} as IBRs. Note that {{reportBadBlock}} is a part of 
> both {{ClientProtocol}} and {{DatanodeProtocol}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14792) [SBN read] StanbyNode does not come out of safemode while adding new blocks.

2019-10-15 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated HDFS-14792:
-
Target Version/s: 2.10.1  (was: 2.10.0)

> [SBN read] StanbyNode does not come out of safemode while adding new blocks.
> 
>
> Key: HDFS-14792
> URL: https://issues.apache.org/jira/browse/HDFS-14792
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Priority: Major
>
> During startup the StandbyNode reports that it needs an additional X blocks to 
> reach the threshold of 1, where X changes up and down.
> This is because, with fast tailing, the SBN adds new blocks from edits while DNs 
> have not reported replicas yet. Being in SafeMode, the SBN counts new blocks 
> towards the threshold and can stay in SafeMode for a long time.
> By design, the purpose of startup SafeMode is to disallow modifications of 
> the namespace and blocks map until all DN replicas are reported.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14667) Backport [HDFS-14403] "Cost-based FairCallQueue" to branch-2

2019-10-15 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated HDFS-14667:
-
Target Version/s: 2.10.1  (was: 2.10.0)

> Backport [HDFS-14403] "Cost-based FairCallQueue" to branch-2
> 
>
> Key: HDFS-14667
> URL: https://issues.apache.org/jira/browse/HDFS-14667
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-14403-branch-2.000.patch
>
>
> We would like to target pulling HDFS-14403, an important operability 
> enhancement, into branch-2.
> It's only present in trunk now so we also need to backport through the 3.x 
> lines.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14503) ThrottledAsyncChecker throws NPE during block pool initialization

2019-10-15 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated HDFS-14503:
-
Target Version/s: 3.3.0, 2.10.1  (was: 2.10.0, 3.3.0)

> ThrottledAsyncChecker throws NPE during block pool initialization 
> --
>
> Key: HDFS-14503
> URL: https://issues.apache.org/jira/browse/HDFS-14503
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Yiqun Lin
>Priority: Major
>
> ThrottledAsyncChecker throws NPE during block pool initialization. The error 
> leads to block pool registration failure.
> The exception
> {noformat}
> 2019-05-20 01:02:36,003 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Unexpected exception in block pool Block pool  (Datanode Uuid 
> x) service to xx.xx.xx.xx/xx.xx.xx.xx
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker$LastCheckResult.access$000(ThrottledAsyncChecker.java:211)
> at 
> org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker.schedule(ThrottledAsyncChecker.java:129)
> at 
> org.apache.hadoop.hdfs.server.datanode.checker.DatasetVolumeChecker.checkAllVolumes(DatasetVolumeChecker.java:209)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:3387)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1508)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:319)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:272)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:768)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Looks like this error is due to the {{WeakHashMap}}-typed map {{completedChecks}} 
> having removed the target entry while we still try to get that entry. Although 
> we check before we get it, there is still a chance that the get returns null. 
> We hit a corner case for this: in federation mode, with two block pools in the 
> DN, {{ThrottledAsyncChecker}} schedules two identical health checks for the 
> same volume.
> {noformat}
> 2019-05-20 01:02:36,000 INFO 
> org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker: 
> Scheduling a check for /hadoop/2/hdfs/data/current
> 2019-05-20 01:02:36,000 INFO 
> org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker: 
> Scheduling a check for /hadoop/2/hdfs/data/current
> {noformat}
> {{completedChecks}} cleans up the entry for one successful check after 
> {{completedChecks#get}} is called. However, after this, the other check gets 
> null.
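
The {{WeakHashMap}} behavior behind this is easy to demonstrate in isolation; a minimal sketch (illustrative only, not Hadoop code):

{code:java}
import java.util.WeakHashMap;

public class WeakMapEviction {
  public static void main(String[] args) throws InterruptedException {
    WeakHashMap<Object, String> completedChecks = new WeakHashMap<>();
    Object volumeKey = new Object();
    completedChecks.put(volumeKey, "last check result");
    System.out.println("before GC: " + completedChecks.size());  // 1

    volumeKey = null;  // drop the only strong reference to the key
    System.gc();       // request collection; timing is not guaranteed
    Thread.sleep(100);

    // The entry can silently vanish, so a containsKey()-then-get()
    // sequence elsewhere may read null back - the pattern behind the
    // NPE in ThrottledAsyncChecker.
    System.out.println("after GC:  " + completedChecks.size());  // usually 0
  }
}
{code}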



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14908) LeaseManager should check parent-child relationship when filter open files.

2019-10-15 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952094#comment-16952094
 ] 

Wei-Chiu Chuang commented on HDFS-14908:


[~linyiqun] wanna take a look?

> LeaseManager should check parent-child relationship when filter open files.
> ---
>
> Key: HDFS-14908
> URL: https://issues.apache.org/jira/browse/HDFS-14908
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.1
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Minor
> Attachments: HDFS-14908.001.patch
>
>
> Now when doing listOpenFiles(), LeaseManager only checks whether the filter 
> path is a string prefix of the open file paths. We should instead check whether 
> the filter path is the parent/ancestor of the open files.
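> A small sketch of the difference (names here are illustrative, not from the 
> patch): a plain prefix check lets a sibling path slip through, while an 
> ancestor check requires the match to end on a path-component boundary:
> {code:java}
> public class OpenFileFilterSketch {
>   // Prefix-only check (current behaviour): "/a/b" wrongly matches "/a/bc/f".
>   static boolean prefixMatch(String filter, String openFile) {
>     return openFile.startsWith(filter);
>   }
> 
>   // Ancestor check (proposed): the filter must be the file itself or end
>   // on a "/" boundary, i.e. be a real parent/ancestor directory.
>   static boolean ancestorMatch(String filter, String openFile) {
>     if (filter.equals("/")) {
>       return true;
>     }
>     return openFile.equals(filter) || openFile.startsWith(filter + "/");
>   }
> 
>   public static void main(String[] args) {
>     System.out.println(prefixMatch("/a/b", "/a/bc/f"));   // true: false positive
>     System.out.println(ancestorMatch("/a/b", "/a/bc/f")); // false: correct
>     System.out.println(ancestorMatch("/a/b", "/a/b/f"));  // true: correct
>   }
> }
> {code}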



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14908) LeaseManager should check parent-child relationship when filter open files.

2019-10-15 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14908:
---
Affects Version/s: 3.1.0
   3.0.1

> LeaseManager should check parent-child relationship when filter open files.
> ---
>
> Key: HDFS-14908
> URL: https://issues.apache.org/jira/browse/HDFS-14908
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.1
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Minor
> Attachments: HDFS-14908.001.patch
>
>
> Now when doing listOpenFiles(), LeaseManager only checks whether the filter 
> path is a string prefix of the open file paths. We should instead check whether 
> the filter path is the parent/ancestor of the open files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14908) LeaseManager should check parent-child relationship when filter open files.

2019-10-15 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952088#comment-16952088
 ] 

Hadoop QA commented on HDFS-14908:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 48s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 47s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 1 new + 324 unchanged - 0 fixed = 325 total (was 324) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 32s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 87m 29s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
39s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}146m 25s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.3 Server=19.03.3 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14908 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12983063/HDFS-14908.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 64dd1c8dd0d1 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 336abbd |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28089/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28089/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28089/testReport/ |
| Max. process+thread count | 4242 (vs. ulimit of 

[jira] [Work logged] (HDDS-2196) Add CLI Commands and Protobuf messages to trigger decom states

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2196?focusedWorklogId=328658=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328658
 ]

ASF GitHub Bot logged work on HDDS-2196:


Author: ASF GitHub Bot
Created on: 15/Oct/19 16:11
Start Date: 15/Oct/19 16:11
Worklog Time Spent: 10m 
  Work Description: anuengineer commented on pull request #20: HDDS-2196 
Add CLI Commands and Protobuf messages to trigger decom states 
URL: https://github.com/apache/hadoop-ozone/pull/20
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328658)
Time Spent: 1.5h  (was: 1h 20m)

> Add CLI Commands and Protobuf messages to trigger decom states
> --
>
> Key: HDDS-2196
> URL: https://issues.apache.org/jira/browse/HDDS-2196
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM, SCM Client
>Affects Versions: 0.5.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> To allow nodes to be decommissioned, recommissioned and put into maintenance, 
> we need a few commands.
> These will be added to the existing "scm cli". 3 commands are proposed:
> Decommission:
> ozone scmcli dnadmin decommission hosta hostb hostc:port ...
> Put nodes into maintenance:
> ozone scmcli dnadmin maintenance hosta hostb hostc:port ... <-endHours>
> Take nodes out of maintenance or halt decommission:
> ozone scmcli dnadmin recommission hosta hostb hostc:port
> These 3 commands will call 3 new protobuf messages and they will be part of 
> the "StorageContainerLocationProtocol":
>  * DecommissionNodesRequestProto
>  * RecommissionNodesRequestProto
>  * StartMaintenanceNodesRequestProto
> In addition, a new class NodeDecommissionManager will be introduced that 
> will receive these commands and carry out the decommission steps.
> In this patch NodeDecommissionManager is only a skeleton implementation to 
> receive the commands as this patch is mainly focused on getting the CLI 
> commands and protobuf messages in place.
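> For illustration only, a skeleton receiver along the lines described above 
> might look like this (class and method names are assumptions, not the actual 
> patch):
> {code:java}
> import java.util.List;
> 
> // Hypothetical sketch: only receives the three commands; the real
> // decommission workflow is deliberately deferred to follow-up patches.
> public class NodeDecommissionManager {
> 
>   public synchronized void decommissionNodes(List<String> hosts) {
>     // TODO: resolve hosts to datanodes and start the decommission workflow.
>   }
> 
>   public synchronized void recommissionNodes(List<String> hosts) {
>     // TODO: return nodes to service, halting any in-flight decommission.
>   }
> 
>   public synchronized void startMaintenanceNodes(List<String> hosts,
>       int endHours) {
>     // TODO: place nodes into maintenance until the given end time.
>   }
> }
> {code}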



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2309) Optimise OzoneManagerDoubleBuffer::flushTransactions to flush in batches

2019-10-15 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham reassigned HDDS-2309:


Assignee: Bharat Viswanadham

> Optimise OzoneManagerDoubleBuffer::flushTransactions to flush in batches
> 
>
> Key: HDDS-2309
> URL: https://issues.apache.org/jira/browse/HDDS-2309
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Rajesh Balamohan
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: Screenshot 2019-10-15 at 4.19.13 PM.png
>
>
> When running a write-heavy benchmark, 
> {{org/apache/hadoop/ozone/om/ratis/OzoneManagerDoubleBuffer.flushTransactions}} 
> was invoked for pretty much every write.
> This forces {{cleanupCache}} to be invoked, which ends up choking the 
> single-thread executor. Attaching the profiler information, which gives more 
> details.
> Ideally, {{flushTransactions}} should batch up the work to reduce the load on 
> RocksDB.
>  
> [https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerDoubleBuffer.java#L130]
>  
> [https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerDoubleBuffer.java#L322]
>  
>  
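> A minimal sketch of the batching idea (not the {{OzoneManagerDoubleBuffer}} 
> code itself): block for the first queued transaction, drain whatever else has 
> arrived, and commit the whole batch in one store operation so 
> {{cleanupCache}} runs once per batch instead of once per write:
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
> import java.util.concurrent.BlockingQueue;
> import java.util.concurrent.LinkedBlockingQueue;
> 
> public class BatchingFlusherSketch {
>   private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
> 
>   void flushLoop() throws InterruptedException {
>     List<Runnable> batch = new ArrayList<>();
>     while (!Thread.currentThread().isInterrupted()) {
>       batch.add(queue.take());   // wait for at least one transaction
>       queue.drainTo(batch);      // then grab everything already queued
>       commitBatch(batch);        // one batched commit + one cache cleanup
>       batch.clear();
>     }
>   }
> 
>   private void commitBatch(List<Runnable> batch) {
>     batch.forEach(Runnable::run);  // stand-in for a real RocksDB batch write
>   }
> }
> {code}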



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation

2019-10-15 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952069#comment-16952069
 ] 

David Mollitor commented on HDFS-14854:
---

[~sodonnell] [~elgoiri]  I provided some feedback for you to review regarding 
this specific patch.

However, I would like to draw your attention to something I was saying before...

I think it would be cool if we could also include 
{{BlockManager#neededReconstruction}} in improving decommissioning.  There is a 
bunch of polling going on in this class, checking sizes and statuses.  I think 
some of that could be removed by making the 
{{BlockManager#neededReconstruction}} Collection a synchronized priority 
queue; perhaps it should just be its own priority-queue-backed 
{{ExecutorService}}.  This will help in that requests from dead nodes will be 
prioritized ahead of requests for decommissioning.  You could probably also 
make it a {{BlockingQueue}} with a fixed size so that threads block if the 
queue gets too large.  In this way, there doesn't need to be batching.  Just 
figure out the next block to replicate, give up the global lock, try to add it 
to the {{neededReconstruction}} queue, and once complete, go find the next 
block to replicate.  Something like that; see the sketch below.
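A rough sketch of that idea, assuming illustrative names (none of this is from 
the HDFS code): a {{ThreadPoolExecutor}} fed by a priority queue, so dead-node 
repair tasks sort ahead of decommission tasks.

{code:java}
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ReplicationExecutorSketch {

  static class ReplicationTask implements Runnable,
      Comparable<ReplicationTask> {
    final int priority;   // lower value = more urgent, e.g. 0 = dead node
    final Runnable work;

    ReplicationTask(int priority, Runnable work) {
      this.priority = priority;
      this.work = work;
    }

    @Override public int compareTo(ReplicationTask o) {
      return Integer.compare(priority, o.priority);
    }

    @Override public void run() { work.run(); }
  }

  // Note: PriorityBlockingQueue is unbounded, so the fixed-size/blocking
  // behaviour suggested above would need extra back-pressure (e.g. a
  // semaphore acquired before execute()).
  private final ThreadPoolExecutor executor = new ThreadPoolExecutor(
      4, 4, 0L, TimeUnit.MILLISECONDS, new PriorityBlockingQueue<>());

  public void submit(int priority, Runnable work) {
    executor.execute(new ReplicationTask(priority, work));
  }
}
{code}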

> Create improved decommission monitor implementation
> ---
>
> Key: HDFS-14854
> URL: https://issues.apache.org/jira/browse/HDFS-14854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, 
> HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, 
> HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, 
> HDFS-14854.008.patch
>
>
> In HDFS-13157, we discovered a series of problems with the current 
> decommission monitor implementation, such as:
>  * Blocks are replicated sequentially disk by disk and node by node, and 
> hence the load is not spread well across the cluster
>  * Adding a node for decommission can cause the namenode write lock to be 
> held for a long time.
>  * Decommissioning nodes floods the replication queue and under replicated 
> blocks from a future node or disk failure may wait for a long time before they 
> are replicated.
>  * Blocks pending replication are checked many times under a write lock 
> before they are sufficiently replicated, wasting resources
> In this Jira I propose to create a new implementation of the decommission 
> monitor that resolves these issues. As it will be difficult to prove one 
> implementation is better than another, the new implementation can be enabled 
> or disabled giving the option of the existing implementation or the new one.
> I will attach a pdf with some more details on the design and then a version 1 
> patch shortly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14907) [Dynamometer] DataNode can't find junit jar when using Hadoop-3 binary

2019-10-15 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952065#comment-16952065
 ] 

Erik Krogen commented on HDFS-14907:


Thanks for reporting this [~tasanuma]. We partially fixed this issue in 
HDFS-14717, but I think it only addressed the issue on the client side. It 
seems that the DataNode side still needs help.

To make it more resilient to potential future changes in JUnit, can we at least 
search for something matching {{junit-*.jar}} instead of a specific version?

> [Dynamometer] DataNode can't find junit jar when using Hadoop-3 binary
> --
>
> Key: HDFS-14907
> URL: https://issues.apache.org/jira/browse/HDFS-14907
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Takanobu Asanuma
>Priority: Major
>
> When executing {{start-dynamometer-cluster.sh}} with Hadoop-3 binary, 
> datanodes fail to run with the following log and 
> {{start-dynamometer-cluster.sh}} fails.
> {noformat}
> LogType:stderr
> LogLastModifiedTime:Wed Oct 09 15:03:09 +0900 2019
> LogLength:1386
> LogContents:
> Exception in thread "main" java.lang.NoClassDefFoundError: org/junit/Assert
> at 
> org.apache.hadoop.test.GenericTestUtils.assertExists(GenericTestUtils.java:299)
> at 
> org.apache.hadoop.test.GenericTestUtils.getTestDir(GenericTestUtils.java:243)
> at 
> org.apache.hadoop.test.GenericTestUtils.getTestDir(GenericTestUtils.java:252)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.getBaseDirectory(MiniDFSCluster.java:2982)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.determineDfsBaseDir(MiniDFSCluster.java:2972)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.formatDataNodeDirs(MiniDFSCluster.java:2834)
> at 
> org.apache.hadoop.tools.dynamometer.SimulatedDataNodes.run(SimulatedDataNodes.java:123)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at 
> org.apache.hadoop.tools.dynamometer.SimulatedDataNodes.main(SimulatedDataNodes.java:88)
> Caused by: java.lang.ClassNotFoundException: org.junit.Assert
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 9 more
> ./start-component.sh: line 317: kill: (2261) - No such process
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation

2019-10-15 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952040#comment-16952040
 ] 

David Mollitor commented on HDFS-14854:
---

{code:java}
if (blockManager.blocksMap.getStoredBlock(block) == null) {
  LOG.trace("Removing unknown block {}", block);
  return true;
}

long bcId = block.getBlockCollectionId();
if (bcId == INodeId.INVALID_INODE_ID) {
  // Orphan block, will be invalidated eventually. Skip.
  return false;
}
{code}

I think it should return 'true' if the block is orphaned, no?  Orphan blocks 
should be skipped in the same way that 'unknown' blocks are.

> Create improved decommission monitor implementation
> ---
>
> Key: HDFS-14854
> URL: https://issues.apache.org/jira/browse/HDFS-14854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, 
> HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, 
> HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, 
> HDFS-14854.008.patch
>
>
> In HDFS-13157, we discovered a series of problems with the current 
> decommission monitor implementation, such as:
>  * Blocks are replicated sequentially disk by disk and node by node, and 
> hence the load is not spread well across the cluster
>  * Adding a node for decommission can cause the namenode write lock to be 
> held for a long time.
>  * Decommissioning nodes floods the replication queue and under replicated 
> blocks from a future node or disk failure may wait for a long time before they 
> are replicated.
>  * Blocks pending replication are checked many times under a write lock 
> before they are sufficiently replicated, wasting resources
> In this Jira I propose to create a new implementation of the decommission 
> monitor that resolves these issues. As it will be difficult to prove one 
> implementation is better than another, the new implementation can be enabled 
> or disabled giving the option of the existing implementation or the new one.
> I will attach a pdf with some more details on the design and then a version 1 
> patch shortly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2309) Optimise OzoneManagerDoubleBuffer::flushTransactions to flush in batches

2019-10-15 Thread Mukul Kumar Singh (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952037#comment-16952037
 ] 

Mukul Kumar Singh commented on HDDS-2309:
-

cc: [~arp][~bharat][~hanishakoneru]

> Optimise OzoneManagerDoubleBuffer::flushTransactions to flush in batches
> 
>
> Key: HDDS-2309
> URL: https://issues.apache.org/jira/browse/HDDS-2309
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Rajesh Balamohan
>Priority: Major
> Attachments: Screenshot 2019-10-15 at 4.19.13 PM.png
>
>
> When running a write-heavy benchmark, 
> {{org/apache/hadoop/ozone/om/ratis/OzoneManagerDoubleBuffer.flushTransactions}} 
> was invoked for pretty much every write.
> This forces {{cleanupCache}} to be invoked, which ends up choking the 
> single-thread executor. Attaching the profiler information, which gives more 
> details.
> Ideally, {{flushTransactions}} should batch up the work to reduce the load on 
> RocksDB.
>  
> [https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerDoubleBuffer.java#L130]
>  
> [https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerDoubleBuffer.java#L322]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation

2019-10-15 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952019#comment-16952019
 ] 

David Mollitor commented on HDFS-14854:
---

This code knows the pendingCount value and the pendingRepLimit... do not grab 
the write lock if the function is going to immediately return anyway.

{code:java}
int pendingCount = getPendingCount();

try {
  namesystem.writeLock();
  long repQueueSize = blockManager.getLowRedundancyBlocksCount();
...
  if (pendingCount >= pendingRepLimit) {
return;
  }
{code}
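A sketch of the suggested reordering (surrounding names taken from the snippet 
above, the rest hedged): test the cheap local condition first, and take the 
lock only if there is work to do.

{code:java}
int pendingCount = getPendingCount();
if (pendingCount >= pendingRepLimit) {
  return;                   // early exit without ever touching the lock
}
namesystem.writeLock();
try {
  long repQueueSize = blockManager.getLowRedundancyBlocksCount();
  // ... the work that genuinely needs the write lock ...
} finally {
  namesystem.writeUnlock(); // always release, even on exceptions
}
{code}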

> Create improved decommission monitor implementation
> ---
>
> Key: HDFS-14854
> URL: https://issues.apache.org/jira/browse/HDFS-14854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, 
> HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, 
> HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, 
> HDFS-14854.008.patch
>
>
> In HDFS-13157, we discovered a series of problems with the current 
> decommission monitor implementation, such as:
>  * Blocks are replicated sequentially disk by disk and node by node, and 
> hence the load is not spread well across the cluster
>  * Adding a node for decommission can cause the namenode write lock to be 
> held for a long time.
>  * Decommissioning nodes floods the replication queue and under replicated 
> blocks from a future node or disk failure may wait for a long time before they 
> are replicated.
>  * Blocks pending replication are checked many times under a write lock 
> before they are sufficiently replicated, wasting resources
> In this Jira I propose to create a new implementation of the decommission 
> monitor that resolves these issues. As it will be difficult to prove one 
> implementation is better than another, the new implementation can be enabled 
> or disabled giving the option of the existing implementation or the new one.
> I will attach a pdf with some more details on the design and then a version 1 
> patch shortly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation

2019-10-15 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952014#comment-16952014
 ] 

David Mollitor commented on HDFS-14854:
---

Please remove this method.  It can be replaced with {{map.computeIfAbsent(key, 
k -> new LinkedList()).add(v);}}

{code:java}
private void addBlockToPending(DatanodeDescriptor dn, BlockInfo block) {
  List<BlockInfo> blockList = pendingRep.get(dn);
  if (blockList == null) {
    blockList = new LinkedList<>();
    pendingRep.put(dn, blockList);
  }
  blockList.add(block);
}
{code}

https://docs.oracle.com/javase/8/docs/api/java/util/Map.html#computeIfAbsent-K-java.util.function.Function-
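For reference, the replacement could look like this (generics inferred from 
context, so treat it as a sketch):

{code:java}
private void addBlockToPending(DatanodeDescriptor dn, BlockInfo block) {
  // computeIfAbsent creates and registers the list only when it is missing,
  // then returns it, collapsing the get/null-check/put dance into one call.
  pendingRep.computeIfAbsent(dn, k -> new LinkedList<>()).add(block);
}
{code}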

> Create improved decommission monitor implementation
> ---
>
> Key: HDFS-14854
> URL: https://issues.apache.org/jira/browse/HDFS-14854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, 
> HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, 
> HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, 
> HDFS-14854.008.patch
>
>
> In HDFS-13157, we discovered a series of problems with the current 
> decommission monitor implementation, such as:
>  * Blocks are replicated sequentially disk by disk and node by node, and 
> hence the load is not spread well across the cluster
>  * Adding a node for decommission can cause the namenode write lock to be 
> held for a long time.
>  * Decommissioning nodes floods the replication queue and under replicated 
> blocks from a future node or disk failure may wait for a long time before they 
> are replicated.
>  * Blocks pending replication are checked many times under a write lock 
> before they are sufficiently replicated, wasting resources
> In this Jira I propose to create a new implementation of the decommission 
> monitor that resolves these issues. As it will be difficult to prove one 
> implementation is better than another, the new implementation can be enabled 
> or disabled giving the option of the existing implementation or the new one.
> I will attach a pdf with some more details on the design and then a version 1 
> patch shortly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14909) DFSNetworkTopology#chooseRandomWithStorageType() should not decrease storage count for excluded node which is already part of excluded scope

2019-10-15 Thread Surendra Singh Lilhore (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surendra Singh Lilhore updated HDFS-14909:
--
Summary: DFSNetworkTopology#chooseRandomWithStorageType() should not 
decrease storage count for excluded node which is already part of excluded 
scope   (was: DFSNetworkTopology#chooseRandomWithStorageType() should not 
decrese storage count for excluded node which is already part of excluded scope 
)

> DFSNetworkTopology#chooseRandomWithStorageType() should not decrease storage 
> count for excluded node which is already part of excluded scope 
> -
>
> Key: HDFS-14909
> URL: https://issues.apache.org/jira/browse/HDFS-14909
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation

2019-10-15 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952009#comment-16952009
 ] 

David Mollitor commented on HDFS-14854:
---

Nit: this is not very java-y...
{code:java}
final List<DatanodeDescriptor> toRemove = new ArrayList<>();
...
processMaintenanceNodes(toRemove);
...

// Check if any nodes have reached zero blocks and also update the stats
// exposed via JMX for all nodes still being processed.
checkForCompletedNodes(toRemove);

// Finally move the nodes to their final state if they are ready.
processCompletedNodes(toRemove);
{code}
Better to remove coupling:
{code:java}
final List<DatanodeDescriptor> maintenanceExpiredNodes = getMaintenanceNodes();
...

final List completedNodes = getCompletedNodes();

Iterable<DatanodeDescriptor> nodesToRemove = Iterables.unmodifiableIterable(
  Iterables.concat(maintenanceExpiredNodes , completedNodes));

// Finally move the nodes to their final state if they are ready.
processCompletedNodes(Lists.newArrayList(nodesToRemove));
{code}

> Create improved decommission monitor implementation
> ---
>
> Key: HDFS-14854
> URL: https://issues.apache.org/jira/browse/HDFS-14854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, 
> HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, 
> HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, 
> HDFS-14854.008.patch
>
>
> In HDFS-13157, we discovered a series of problems with the current 
> decommission monitor implementation, such as:
>  * Blocks are replicated sequentially disk by disk and node by node, and 
> hence the load is not spread well across the cluster
>  * Adding a node for decommission can cause the namenode write lock to be 
> held for a long time.
>  * Decommissioning nodes floods the replication queue and under replicated 
> blocks from a future node or disk failure may wait for a long time before they 
> are replicated.
>  * Blocks pending replication are checked many times under a write lock 
> before they are sufficiently replicated, wasting resources
> In this Jira I propose to create a new implementation of the decommission 
> monitor that resolves these issues. As it will be difficult to prove one 
> implementation is better than another, the new implementation can be enabled 
> or disabled giving the option of the existing implementation or the new one.
> I will attach a pdf with some more details on the design and then a version 1 
> patch shortly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14909) DFSNetworkTopology#chooseRandomWithStorageType() should not decrese storage count for excluded node which is already part of excluded scope

2019-10-15 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952007#comment-16952007
 ] 

Surendra Singh Lilhore commented on HDFS-14909:
---

The code below decreases the count for {{excludedScope}}.
{code:java}
if (excludeRoot != null && root.isAncestor(excludeRoot)) {
  if (excludeRoot instanceof DFSTopologyNodeImpl) {
availableCount -= ((DFSTopologyNodeImpl)excludeRoot)
.getSubtreeStorageCount(type);
  } else {
availableCount -= ((DatanodeDescriptor)excludeRoot)
.hasStorageType(type) ? 1 : 0;
  }
} {code}
This code again decreases the count for {{excludedNodes}}, but if an excluded 
node is part of {{excludedScope}} then there is no need to decrease the count 
again.
{code:java}
if (excludedNodes != null) {
  for (Node excludedNode : excludedNodes) {
if (excludedNode instanceof DatanodeDescriptor) {
  availableCount -= ((DatanodeDescriptor) excludedNode)
  .hasStorageType(type) ? 1 : 0;
} else if (excludedNode instanceof DFSTopologyNodeImpl) {
  availableCount -= ((DFSTopologyNodeImpl) excludedNode)
  .getSubtreeStorageCount(type);
} else if (excludedNode instanceof DatanodeInfo) {
 ...
  }
}{code}
Because of this, {{availableCount}} ends up negative, which is not expected:
{code:java}
if (availableCount <= 0) {
  // should never be <0 in general, adding <0 check for safety purpose
  return null;
}{code}
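One possible guard (a sketch; {{isNodeInScope}} is a hypothetical helper here, 
not necessarily the committed fix) is to skip excluded nodes whose storage was 
already subtracted as part of the excluded scope:

{code:java}
if (excludedNodes != null) {
  for (Node excludedNode : excludedNodes) {
    // Already accounted for by the excludeRoot subtraction above; skipping
    // avoids subtracting the same storages twice.
    if (excludeRoot != null && isNodeInScope(excludedNode, excludeRoot)) {
      continue;
    }
    if (excludedNode instanceof DatanodeDescriptor) {
      availableCount -= ((DatanodeDescriptor) excludedNode)
          .hasStorageType(type) ? 1 : 0;
    } else if (excludedNode instanceof DFSTopologyNodeImpl) {
      availableCount -= ((DFSTopologyNodeImpl) excludedNode)
          .getSubtreeStorageCount(type);
    }
  }
}
{code}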

> DFSNetworkTopology#chooseRandomWithStorageType() should not decrese storage 
> count for excluded node which is already part of excluded scope 
> 
>
> Key: HDFS-14909
> URL: https://issues.apache.org/jira/browse/HDFS-14909
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14909) DFSNetworkTopology#chooseRandomWithStorageType() should not decrese storage count for excluded node which is already part of excluded scope

2019-10-15 Thread Surendra Singh Lilhore (Jira)
Surendra Singh Lilhore created HDFS-14909:
-

 Summary: DFSNetworkTopology#chooseRandomWithStorageType() should 
not decrese storage count for excluded node which is already part of excluded 
scope 
 Key: HDFS-14909
 URL: https://issues.apache.org/jira/browse/HDFS-14909
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.1.1
Reporter: Surendra Singh Lilhore
Assignee: Surendra Singh Lilhore






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation

2019-10-15 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952001#comment-16952001
 ] 

David Mollitor commented on HDFS-14854:
---

{code:java}
  private void processPendingNodes() {
while (!pendingNodes.isEmpty() &&
(maxConcurrentTrackedNodes == 0 ||
outOfServiceNodeBlocks.size() < maxConcurrentTrackedNodes)) {
  outOfServiceNodeBlocks.put(pendingNodes.poll(), null);
}
  }
{code}

This method is accessed by the locally running Thread.  However, {{pendingNodes}} 
does not appear to be a thread-safe class.  Perhaps the collection cannot be 
modified concurrently because of the external locking via the {{writeLock}}, but 
that lock requirement is not stated on the {{startTrackingNode}} method.
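One way to drop the implicit locking requirement entirely (illustrative only, 
not the patch) is to back {{pendingNodes}} with a concurrent queue so 
{{poll()}} and {{add()}} are safe without the external lock:

{code:java}
// java.util.concurrent.ConcurrentLinkedQueue: lock-free, thread-safe
// poll()/add(), at the cost of a weakly consistent size()/isEmpty().
private final java.util.Queue<DatanodeDescriptor> pendingNodes =
    new java.util.concurrent.ConcurrentLinkedQueue<>();
{code}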

> Create improved decommission monitor implementation
> ---
>
> Key: HDFS-14854
> URL: https://issues.apache.org/jira/browse/HDFS-14854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, 
> HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, 
> HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, 
> HDFS-14854.008.patch
>
>
> In HDFS-13157, we discovered a series of problems with the current 
> decommission monitor implementation, such as:
>  * Blocks are replicated sequentially disk by disk and node by node, and 
> hence the load is not spread well across the cluster
>  * Adding a node for decommission can cause the namenode write lock to be 
> held for a long time.
>  * Decommissioning nodes floods the replication queue and under replicated 
> blocks from a future node or disk failure may wait for a long time before they 
> are replicated.
>  * Blocks pending replication are checked many times under a write lock 
> before they are sufficiently replicated, wasting resources
> In this Jira I propose to create a new implementation of the decommission 
> monitor that resolves these issues. As it will be difficult to prove one 
> implementation is better than another, the new implementation can be enabled 
> or disabled giving the option of the existing implementation or the new one.
> I will attach a pdf with some more details on the design and then a version 1 
> patch shortly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


