[jira] [Commented] (HDDS-1894) Support listPipelines by filters in scmcli

2019-08-07 Thread Li Cheng (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901789#comment-16901789
 ] 

Li Cheng commented on HDDS-1894:


[~xyao] Is it essentially just trying to do what "listPipelines | grep 
'Factor:THREE, State:OPEN'" does?
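
Spelled out as a shell sketch, the current workaround is just the following (this 
only filters the existing listing; the filter switches this ticket would add are 
not assumed here):
{code}
# current workaround: pipe the full listing through grep
bin/ozone scmcli listPipelines | grep 'Factor:THREE, State:OPEN'
{code}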

> Support listPipelines by filters in scmcli
> --
>
> Key: HDDS-1894
> URL: https://issues.apache.org/jira/browse/HDDS-1894
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Li Cheng
>Priority: Major
>
> Today scmcli has a subcommand that allows listing all pipelines. This ticket 
> is opened to filter the results by switches, e.g., filter by Factor: THREE and 
> State: OPEN. This will be useful for troubleshooting in large clusters.
>  
> {code}
> bin/ozone scmcli listPipelines
> Pipeline[ Id: a8d1b0c9-e1d4-49ea-8746-3f61dfb5ee3f, Nodes: 
> cce44fde-bc8d-4063-97b3-6f557af756e1\{ip: 10.17.112.65, host: 
> ia0230.halxg.cloudera.com, networkLocation: /default-rack, certSerialId: 
> null}, Type:RATIS, Factor:ONE, State:OPEN]
> Pipeline[ Id: c9c453d1-d74c-4414-b87f-1d3585d78a7c, Nodes: 
> 0b7b0b93-8323-4b82-8cc0-a9a5c10ab827\{ip: 10.17.112.29, host: 
> ia0138.halxg.cloudera.com, networkLocation: /default-rack, certSerialId: 
> null}c756a0e0-5a1b-4d03-ba5b-cafbcabac877\{ip: 10.17.112.27, host: 
> ia0134.halxg.cloudera.com, networkLocation: /default-rack, certSerialId: 
> null}bee45bd7-1ee6-4726-b3d1-81476dc1eb49\{ip: 10.17.112.28, host: 
> ia0136.halxg.cloudera.com, networkLocation: /default-rack, certSerialId: 
> null}, Type:RATIS, Factor:THREE, State:OPEN]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14662) Document the usage of the new Balancer "asService" parameter

2019-08-07 Thread Chen Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Zhang updated HDFS-14662:
--
Attachment: HDFS-14662.003.patch

> Document the usage of the new Balancer "asService" parameter
> 
>
> Key: HDFS-14662
> URL: https://issues.apache.org/jira/browse/HDFS-14662
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14662.001.patch, HDFS-14662.002.patch, 
> HDFS-14662.003.patch
>
>
> See HDFS-13783; this jira adds documentation on how to run the Balancer as a 
> long-running service.
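
A hedged sketch of the usage being documented, based on the -asService parameter 
named in the title (the daemon wrapper shown here is an assumption, not a quote 
from the patch):
{code}
# run the Balancer as a long-running service instead of a one-shot job
# (-asService comes from HDFS-13783; the --daemon invocation is an assumption)
hdfs --daemon start balancer -asService
{code}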



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14662) Document the usage of the new Balancer "asService" parameter

2019-08-07 Thread Chen Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901813#comment-16901813
 ] 

Chen Zhang commented on HDFS-14662:
---

uploaded patch v3

> Document the usage of the new Balancer "asService" parameter
> 
>
> Key: HDFS-14662
> URL: https://issues.apache.org/jira/browse/HDFS-14662
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14662.001.patch, HDFS-14662.002.patch, 
> HDFS-14662.003.patch
>
>
> See HDFS-13783; this jira adds documentation on how to run the Balancer as a 
> long-running service.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14662) Document the usage of the new Balancer "asService" parameter

2019-08-07 Thread Chen Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901812#comment-16901812
 ] 

Chen Zhang commented on HDFS-14662:
---

Thanks [~jojochuang] [~ayushtkn] for your review. I'll upload a new patch to 
fix the whitespace issue

> Document the usage of the new Balancer "asService" parameter
> 
>
> Key: HDFS-14662
> URL: https://issues.apache.org/jira/browse/HDFS-14662
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14662.001.patch, HDFS-14662.002.patch, 
> HDFS-14662.003.patch
>
>
> See HDFS-13783; this jira adds documentation on how to run the Balancer as a 
> long-running service.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1924) ozone sh bucket path command does not exist

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1924?focusedWorklogId=290312&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290312
 ]

ASF GitHub Bot logged work on HDDS-1924:


Author: ASF GitHub Bot
Created on: 07/Aug/19 09:01
Start Date: 07/Aug/19 09:01
Worklog Time Spent: 10m 
  Work Description: adoroszlai commented on pull request #1245: HDDS-1924. 
ozone sh bucket path command does not exist
URL: https://github.com/apache/hadoop/pull/1245
 
 
   ## What changes were proposed in this pull request?
   
   Fix leftover reference to `ozone sh bucket path`.  It was generally changed 
to `ozone s3 path` in 
[HDDS-761](https://issues.apache.org/jira/browse/HDDS-761).
   
   https://issues.apache.org/jira/browse/HDDS-1924
   
   ## How was this patch tested?
   
   ```
   $ aws s3api --endpoint http://localhost:9878/ create-bucket 
--bucket=wordcount
   {
   "Location": "http://localhost:9878/wordcount";
   }
   
   $ docker-compose exec om ozone s3 path wordcount
   Volume name for S3Bucket is : s3100b8cad7cf2a56f6df78f171f97a1ec
   Ozone FileSystem Uri is : o3fs://wordcount.s3100b8cad7cf2a56f6df78f171f
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290312)
Time Spent: 10m
Remaining Estimate: 0h

> ozone sh bucket path command does not exist
> ---
>
> Key: HDDS-1924
> URL: https://issues.apache.org/jira/browse/HDDS-1924
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: documentation, Ozone Manager
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Doroszlai, Attila
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The ozone sh bucket path command does not exist, but it is mentioned in 
> static/docs/interface/s3.html. The command should either be added back or the 
> documentation should be improved.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1924) ozone sh bucket path command does not exist

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-1924:
-
Labels: pull-request-available  (was: )

> ozone sh bucket path command does not exist
> ---
>
> Key: HDDS-1924
> URL: https://issues.apache.org/jira/browse/HDDS-1924
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: documentation, Ozone Manager
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Doroszlai, Attila
>Priority: Blocker
>  Labels: pull-request-available
>
> The ozone sh bucket path command does not exist, but it is mentioned in 
> static/docs/interface/s3.html. The command should either be added back or the 
> documentation should be improved.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1924) ozone sh bucket path command does not exist

2019-08-07 Thread Doroszlai, Attila (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doroszlai, Attila updated HDDS-1924:

Target Version/s: 0.4.1  (was: 0.4.0)
  Status: Patch Available  (was: In Progress)

> ozone sh bucket path command does not exist
> ---
>
> Key: HDDS-1924
> URL: https://issues.apache.org/jira/browse/HDDS-1924
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: documentation, Ozone Manager
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Doroszlai, Attila
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The ozone sh bucket path command does not exist, but it is mentioned in 
> static/docs/interface/s3.html. The command should either be added back or the 
> documentation should be improved.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1924) ozone sh bucket path command does not exist

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1924?focusedWorklogId=290317&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290317
 ]

ASF GitHub Bot logged work on HDDS-1924:


Author: ASF GitHub Bot
Created on: 07/Aug/19 09:02
Start Date: 07/Aug/19 09:02
Worklog Time Spent: 10m 
  Work Description: adoroszlai commented on issue #1245: HDDS-1924. ozone 
sh bucket path command does not exist
URL: https://github.com/apache/hadoop/pull/1245#issuecomment-519009755
 
 
   @mukul1987 please review
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290317)
Time Spent: 20m  (was: 10m)

> ozone sh bucket path command does not exist
> ---
>
> Key: HDDS-1924
> URL: https://issues.apache.org/jira/browse/HDDS-1924
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: documentation, Ozone Manager
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Doroszlai, Attila
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The ozone sh bucket path command does not exist, but it is mentioned in 
> static/docs/interface/s3.html. The command should either be added back or the 
> documentation should be improved.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14662) Document the usage of the new Balancer "asService" parameter

2019-08-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901871#comment-16901871
 ] 

Hadoop QA commented on HDFS-14662:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
47s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
28m 45s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 43s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 44m 13s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e |
| JIRA Issue | HDFS-14662 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12976899/HDFS-14662.003.patch |
| Optional Tests |  dupname  asflicense  mvnsite  |
| uname | Linux a5ec57a97e7f 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 9cd211a |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 431 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27431/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Document the usage of the new Balancer "asService" parameter
> 
>
> Key: HDFS-14662
> URL: https://issues.apache.org/jira/browse/HDFS-14662
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14662.001.patch, HDFS-14662.002.patch, 
> HDFS-14662.003.patch
>
>
> See HDFS-13783; this jira adds documentation on how to run the Balancer as a 
> long-running service.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14623) In NameNode Web UI, for Head the file (first 32K) old data is showing

2019-08-07 Thread hemanthboyina (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14623:
-
Attachment: HDFS-14623.001.patch

> In NameNode Web UI, for Head the file (first 32K) old data is showing
> -
>
> Key: HDFS-14623
> URL: https://issues.apache.org/jira/browse/HDFS-14623
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14623.001.patch, HDFS-14623.patch, afterfix.JPG, 
> beforefix.JPG
>
>
> In the NameNode Web UI, "Head the file (first 32K)" shows stale data.
> After opening multiple files and clicking "Head the file", the data shown 
> belongs to a previously opened file.
> Scenario:
> Uploaded a NameNode log and a ZKFC log, clicked "Head the file" on the 
> NameNode log multiple times, then went to the ZKFC log and clicked "Head the 
> file"; the wrong (old) data was shown.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1924) ozone sh bucket path command does not exist

2019-08-07 Thread Doroszlai, Attila (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doroszlai, Attila updated HDDS-1924:

Component/s: documentation

> ozone sh bucket path command does not exist
> ---
>
> Key: HDDS-1924
> URL: https://issues.apache.org/jira/browse/HDDS-1924
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: documentation, Ozone Manager
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Doroszlai, Attila
>Priority: Blocker
>
> The ozone sh bucket path command does not exist, but it is mentioned in 
> static/docs/interface/s3.html. The command should either be added back or the 
> documentation should be improved.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14662) Document the usage of the new Balancer "asService" parameter

2019-08-07 Thread Ayush Saxena (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901907#comment-16901907
 ] 

Ayush Saxena commented on HDFS-14662:
-

+1

> Document the usage of the new Balancer "asService" parameter
> 
>
> Key: HDFS-14662
> URL: https://issues.apache.org/jira/browse/HDFS-14662
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14662.001.patch, HDFS-14662.002.patch, 
> HDFS-14662.003.patch
>
>
> See HDFS-13783; this jira adds documentation on how to run the Balancer as a 
> long-running service.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14707) Add JAVA_LIBRARY_PATH to HTTPFS startup options in branch-2

2019-08-07 Thread Masatake Iwasaki (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901906#comment-16901906
 ] 

Masatake Iwasaki commented on HDFS-14707:
-

I manually tested the modified httpfs.sh using a downloaded hadoop-2.9.2.tar.gz 
with debug logging enabled in httpfs-log4j.properties.

The native library was not loaded without the fix.
{noformat}
2019-08-07 17:24:51,001 DEBUG NativeCodeLoader [][:]  Trying to load the 
custom-built native-hadoop library...
2019-08-07 17:24:51,001 DEBUG NativeCodeLoader [][:]  Failed to load 
native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in 
java.library.path
2019-08-07 17:24:51,001 DEBUG NativeCodeLoader [][:]  
java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2019-08-07 17:24:51,002  WARN NativeCodeLoader [][:]  Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable
{noformat}

The native library was loaded with the patch.
{noformat}
2019-08-07 17:25:45,622 DEBUG NativeCodeLoader [][:]  Trying to load the 
custom-built native-hadoop library...
2019-08-07 17:25:45,623 DEBUG NativeCodeLoader [][:]  Loaded the native-hadoop 
library
{noformat}
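
For illustration only, a sketch of the kind of startup change being tested (an 
assumption about the approach, not the actual HDFS-14707 patch; variable names 
follow common Tomcat/Hadoop conventions):
{code}
# hypothetical httpfs.sh snippet: pass the native library directory to the
# Tomcat JVM so NativeCodeLoader can find libhadoop
if [ -n "${JAVA_LIBRARY_PATH}" ]; then
  export CATALINA_OPTS="${CATALINA_OPTS} -Djava.library.path=${JAVA_LIBRARY_PATH}"
fi
{code}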

>  Add JAVA_LIBRARY_PATH to HTTPFS startup options in branch-2
> 
>
> Key: HDFS-14707
> URL: https://issues.apache.org/jira/browse/HDFS-14707
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Major
> Attachments: HDFS-14707-branch-2.001.patch
>
>
> Currently HTTPFS does not load the Hadoop native library, since 
> java.library.path is not set on Tomcat startup.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-07 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14313:
---
Attachment: HDFS-14313-branch-2.v1.patch

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14313-branch-2.v1.patch, HDFS-14313.000.patch, 
> HDFS-14313.001.patch, HDFS-14313.002.patch, HDFS-14313.003.patch, 
> HDFS-14313.004.patch, HDFS-14313.005.patch, HDFS-14313.006.patch, 
> HDFS-14313.007.patch, HDFS-14313.008.patch, HDFS-14313.009.patch, 
> HDFS-14313.010.patch, HDFS-14313.011.patch, HDFS-14313.012.patch, 
> HDFS-14313.013.patch, HDFS-14313.014.patch, HDFS-14313.branch-3.0.v1.patch, 
> HDFS-14313.branch-3.0.v2.patch, HDFS-14313.branch-3.v1.patch
>
>
> There are two existing ways of getting used space, du and df, and both are 
> insufficient.
>  # Running du across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  # Running df is inaccurate when the disk is shared by multiple DataNodes or 
> other servers.
>  Getting the HDFS used space from the in-memory FsDatasetImpl#volumeMap 
> ReplicaInfos is cheap and accurate.
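
A minimal, self-contained sketch of the idea, summing per-replica on-disk sizes 
that are already tracked in memory instead of shelling out to du/df (the types 
below are simplified stand-ins, not the real FsDatasetImpl internals):
{code:java}
import java.util.Collection;
import java.util.Map;

// Stand-in for a replica record; the real code would use ReplicaInfo.
class ReplicaStub {
  private final long bytesOnDisk;
  ReplicaStub(long bytesOnDisk) { this.bytesOnDisk = bytesOnDisk; }
  long getBytesOnDisk() { return bytesOnDisk; }
}

class InMemoryUsedSpace {
  // Sum the sizes kept in the in-memory map, keyed by block pool id.
  static long usedSpace(Map<String, Collection<ReplicaStub>> volumeMap) {
    long used = 0;
    for (Collection<ReplicaStub> replicas : volumeMap.values()) {
      for (ReplicaStub r : replicas) {
        used += r.getBytesOnDisk();
      }
    }
    return used;
  }
}
{code}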



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1924) ozone sh bucket path command does not exist

2019-08-07 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-1924:
--
Sprint: HDDS Biscayne

> ozone sh bucket path command does not exist
> ---
>
> Key: HDDS-1924
> URL: https://issues.apache.org/jira/browse/HDDS-1924
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: documentation, Ozone Manager
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Doroszlai, Attila
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The ozone sh bucket path command does not exist, but it is mentioned in 
> static/docs/interface/s3.html. The command should either be added back or the 
> documentation should be improved.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1922) Next button on the bottom of "static/docs/index.html" landing page does not work

2019-08-07 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-1922:
--
Sprint: HDDS Biscayne

> Next button on the bottom of "static/docs/index.html" landing page does not 
> work
> 
>
> Key: HDDS-1922
> URL: https://issues.apache.org/jira/browse/HDDS-1922
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Priority: Major
>
> On the Ozone landing doc page, the next link doesn't work.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1922) Next button on the bottom of "static/docs/index.html" landing page does not work

2019-08-07 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-1922:
--
Target Version/s: 0.4.1  (was: 0.4.0)

> Next button on the bottom of "static/docs/index.html" landing page does not 
> work
> 
>
> Key: HDDS-1922
> URL: https://issues.apache.org/jira/browse/HDDS-1922
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Priority: Major
>
> On the Ozone landing doc page, the next link doesn't work.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2019-08-07 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901887#comment-16901887
 ] 

He Xiaoqiao commented on HDFS-14703:


Thanks [~shv] for filing this JIRA and planning to push this feature forward; it 
is great work. Really appreciate you doing this.
 There are some details I am confused about after reading the design document.
 As the design document says, each inode maps (through its inode key) to one 
RangeMap, which has a separate lock so operations on different ranges can run 
concurrently.
{quote}The inode key is a fixed length sequence of parent inode ids ending with 
the file inode id itself:
    key(f) = <ppId, pId, selfId>
 Where selfId is the inodeId of file f, pId is the id of its parent, and ppId 
is the id of the parent of the parent. Such definition of a key guarantees that 
not only siblings but also cousins (objects having the same grandparent) are 
partitioned into the same range most of the time
{quote}
Consider the following path: /a/b/c/d/e, with corresponding inode ids [ida, idb, 
idc, idd].
 1. How can we guarantee that 'cousins' are mapped into the same range? At first 
glance they could map to different RangeMaps, since for idc the inode key is 
<ida, idb, idc>, while for idd the inode key is <idb, idc, idd>.
 2. Is there any consideration for operating on a node and one of its ancestors 
concurrently? For instance, for /a/b/c/d/e/f, we could delete inode c and modify 
inode f at the same time if they map to different ranges, since we do not 
guarantee mapping them to the same one. That may be a problem in this case.
 3. Which lock will be held for global requests such as HA failover, safe mode, 
etc.? Do we need to obtain all RangeMap locks?
 4. Is there any bottleneck after write throughput improves? I believe EditLog 
OPS will keep increasing; will that become the new bottleneck?
Please correct me if I do not understand correctly. Thanks.

> NameNode Fine-Grained Locking via Metadata Partitioning
> ---
>
> Key: HDFS-14703
> URL: https://issues.apache.org/jira/browse/HDFS-14703
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: NameNode Fine-Grained Locking.pdf
>
>
> We target to enable fine-grained locking by splitting the in-memory namespace 
> into multiple partitions each having a separate lock. Intended to improve 
> performance of NameNode write operations.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-07 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14313:
---
Attachment: HDFS-14313.branch-3.0.v2.patch

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch, 
> HDFS-14313.011.patch, HDFS-14313.012.patch, HDFS-14313.013.patch, 
> HDFS-14313.014.patch, HDFS-14313.branch-3.0.v1.patch, 
> HDFS-14313.branch-3.0.v2.patch, HDFS-14313.branch-3.v1.patch
>
>
> There are two existing ways of getting used space, du and df, and both are 
> insufficient.
>  # Running du across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  # Running df is inaccurate when the disk is shared by multiple DataNodes or 
> other servers.
>  Getting the HDFS used space from the in-memory FsDatasetImpl#volumeMap 
> ReplicaInfos is cheap and accurate.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1923) static/docs/start.html page doesn't render correctly on Firefox

2019-08-07 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-1923:
--
Target Version/s: 0.4.1  (was: 0.4.0)

> static/docs/start.html page doesn't render correctly on Firefox
> ---
>
> Key: HDDS-1923
> URL: https://issues.apache.org/jira/browse/HDDS-1923
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Priority: Major
>
> static/docs/start.html page doesn't render correctly on Firefox



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1923) static/docs/start.html page doesn't render correctly on Firefox

2019-08-07 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-1923:
--
Sprint: HDDS Biscayne

> static/docs/start.html page doesn't render correctly on Firefox
> ---
>
> Key: HDDS-1923
> URL: https://issues.apache.org/jira/browse/HDDS-1923
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Priority: Major
>
> static/docs/start.html page doesn't render correctly on Firefox



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1924) ozone sh bucket path command does not exist

2019-08-07 Thread Doroszlai, Attila (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doroszlai, Attila reassigned HDDS-1924:
---

Assignee: Doroszlai, Attila

> ozone sh bucket path command does not exist
> ---
>
> Key: HDDS-1924
> URL: https://issues.apache.org/jira/browse/HDDS-1924
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Doroszlai, Attila
>Priority: Blocker
>
> The ozone sh bucket path command does not exist, but it is mentioned in 
> static/docs/interface/s3.html. The command should either be added back or the 
> documentation should be improved.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2019-08-07 Thread Sean Chow (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Chow updated HDFS-14476:
-
Attachment: HDFS-14476-branch-2.01.patch

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Sean Chow
>Assignee: Sean Chow
>Priority: Major
> Attachments: HDFS-14476-branch-2.01.patch, HDFS-14476.00.patch, 
> HDFS-14476.01.patch, datanode-with-patch-14476.png
>
>
> When the DirectoryScanner has results showing differences between on-disk and 
> in-memory blocks, it tries to run {{checkAndUpdate}} to fix them. However, 
> {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> I have about 6 million blocks on every datanode, and every 6-hour scan finds 
> about 25000 abnormal blocks to fix. That leads to the lock on the 
> FsDatasetImpl object being held for a long time.
> Assuming every block needs 10ms to fix (because of SAS disk latency), that 
> will take 250 seconds to finish. That means all reads and writes on that 
> datanode will be blocked for roughly four minutes.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> It takes a long time to process commands from the NN because threads are 
> blocked, and the NameNode will see a long lastContact time for this datanode.
> This probably affects all HDFS versions.
> *How to fix:*
> Just like invalidate commands from the NameNode are processed with a batch 
> size of 1000, these abnormal blocks should be handled in batches too, sleeping 
> 2 seconds between batches to allow normal block reads and writes.
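
A minimal, self-contained sketch of the proposed batching (illustration only, not 
the actual HDFS-14476 patch; the batch size and sleep interval are the values 
mentioned above, and the synchronized block is a stand-in for the dataset lock):
{code:java}
import java.util.List;
import java.util.function.Consumer;

class BatchedFixer {
  static final int BATCH_SIZE = 1000;   // same batch size as invalidate handling
  static final long SLEEP_MS = 2000;    // pause between batches

  // Fix the scanner differences in batches so the lock is released periodically
  // and normal block reads/writes can proceed in between.
  static <T> void fixInBatches(List<T> diffs, Consumer<T> fix)
      throws InterruptedException {
    for (int i = 0; i < diffs.size(); i += BATCH_SIZE) {
      int end = Math.min(i + BATCH_SIZE, diffs.size());
      synchronized (BatchedFixer.class) {   // stand-in for the FsDatasetImpl lock
        for (T diff : diffs.subList(i, end)) {
          fix.accept(diff);                 // per-block checkAndUpdate-style work
        }
      }
      if (end < diffs.size()) {
        Thread.sleep(SLEEP_MS);
      }
    }
  }
}
{code}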



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work started] (HDDS-1924) ozone sh bucket path command does not exist

2019-08-07 Thread Doroszlai, Attila (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDDS-1924 started by Doroszlai, Attila.
---
> ozone sh bucket path command does not exist
> ---
>
> Key: HDDS-1924
> URL: https://issues.apache.org/jira/browse/HDDS-1924
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Doroszlai, Attila
>Priority: Blocker
>
> The ozone sh bucket path command does not exist, but it is mentioned in 
> static/docs/interface/s3.html. The command should either be added back or the 
> documentation should be improved.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2019-08-07 Thread Sean Chow (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Chow updated HDFS-14476:
-
Attachment: HDFS-14476.01.patch

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Sean Chow
>Assignee: Sean Chow
>Priority: Major
> Attachments: HDFS-14476.00.patch, HDFS-14476.01.patch, 
> datanode-with-patch-14476.png
>
>
> When the DirectoryScanner has results showing differences between on-disk and 
> in-memory blocks, it tries to run {{checkAndUpdate}} to fix them. However, 
> {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> I have about 6 million blocks on every datanode, and every 6-hour scan finds 
> about 25000 abnormal blocks to fix. That leads to the lock on the 
> FsDatasetImpl object being held for a long time.
> Assuming every block needs 10ms to fix (because of SAS disk latency), that 
> will take 250 seconds to finish. That means all reads and writes on that 
> datanode will be blocked for roughly four minutes.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> It takes a long time to process commands from the NN because threads are 
> blocked, and the NameNode will see a long lastContact time for this datanode.
> This probably affects all HDFS versions.
> *How to fix:*
> Just like invalidate commands from the NameNode are processed with a batch 
> size of 1000, these abnormal blocks should be handled in batches too, sleeping 
> 2 seconds between batches to allow normal block reads and writes.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14701) Change Log Level to warn in SlotReleaser

2019-08-07 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14701:
---
Attachment: HDFS-14701.002.patch

> Change Log Level to warn in SlotReleaser
> 
>
> Key: HDFS-14701
> URL: https://issues.apache.org/jira/browse/HDFS-14701
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14701.001.patch, HDFS-14701.002.patch
>
>
>  If the corresponding DataNode has been stopped or restarted and the DFSClient 
> closes the shared memory segment, the releaseShortCircuitFds API throws an 
> exception and logs an ERROR message. I think it should not be an ERROR log; 
> logging it at WARN level is more reasonable.
> {code:java}
> // @Override
> public void run() {
>   LOG.trace("{}: about to release {}", ShortCircuitCache.this, slot);
>   final DfsClientShm shm = (DfsClientShm)slot.getShm();
>   final DomainSocket shmSock = shm.getPeer().getDomainSocket();
>   final String path = shmSock.getPath();
>   boolean success = false;
>   try (DomainSocket sock = DomainSocket.connect(path);
>DataOutputStream out = new DataOutputStream(
>new BufferedOutputStream(sock.getOutputStream()))) {
> new Sender(out).releaseShortCircuitFds(slot.getSlotId());
> DataInputStream in = new DataInputStream(sock.getInputStream());
> ReleaseShortCircuitAccessResponseProto resp =
> ReleaseShortCircuitAccessResponseProto.parseFrom(
> PBHelperClient.vintPrefixed(in));
> if (resp.getStatus() != Status.SUCCESS) {
>   String error = resp.hasError() ? resp.getError() : "(unknown)";
>   throw new IOException(resp.getStatus().toString() + ": " + error);
> }
> LOG.trace("{}: released {}", this, slot);
> success = true;
>   } catch (IOException e) {
> LOG.error(ShortCircuitCache.this + ": failed to release " +
> "short-circuit shared memory slot " + slot + " by sending " +
> "ReleaseShortCircuitAccessRequestProto to " + path +
> ".  Closing shared memory segment.", e);
>   } finally {
> if (success) {
>   shmManager.freeSlot(slot);
> } else {
>   shm.getEndpointShmManager().shutdown(shm);
> }
>   }
> }
> {code}
>  *exception stack:*
> {code:java}
> 2019-08-05,15:28:03,838 ERROR [ShortCircuitCache_SlotReleaser] 
> org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache: 
> ShortCircuitCache(0x65849546): failed to release short-circuit shared memory 
> slot Slot(slotIdx=62, shm=DfsClientShm(70593ef8b3d84cba3c2f0a1e81377eb1)) by 
> sending ReleaseShortCircuitAccessRequestProto to 
> /home/work/app/hdfs/c3micloudsrv-hdd/datanode/dn_socket.  Closing shared 
> memory segment.
> java.io.IOException: ERROR_INVALID: there is no shared memory segment 
> registered with shmId 70593ef8b3d84cba3c2f0a1e81377eb1
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901928#comment-16901928
 ] 

Hadoop QA commented on HDFS-14313:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 11m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} branch-3.0 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m 
31s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
14s{color} | {color:green} branch-3.0 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 
45s{color} | {color:green} branch-3.0 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
52s{color} | {color:green} branch-3.0 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  9m 
24s{color} | {color:green} branch-3.0 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
21m  8s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: . {color} 
|
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
16s{color} | {color:green} branch-3.0 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
19s{color} | {color:green} branch-3.0 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 13m  
4s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 59s{color} | {color:orange} root: The patch generated 3 new + 296 unchanged 
- 1 fixed = 299 total (was 297) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 10m 
17s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 13 line(s) that end in whitespace. Use 
git apply --whitespace=fix <>. Refer 
https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 58s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: . {color} 
|
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 14m 27s{color} 
| {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}155m 52s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.ha.TestZKFailoverController |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:e402791 |
| JIRA Issue | HDFS-14313 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12976897/HDFS-14313.branch-3.0.v1.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  jav

[jira] [Commented] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2019-08-07 Thread Sean Chow (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901864#comment-16901864
 ] 

Sean Chow commented on HDFS-14476:
--

patch updated!

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Sean Chow
>Assignee: Sean Chow
>Priority: Major
> Attachments: HDFS-14476-branch-2.01.patch, HDFS-14476.00.patch, 
> HDFS-14476.01.patch, datanode-with-patch-14476.png
>
>
> When the DirectoryScanner has results showing differences between on-disk and 
> in-memory blocks, it tries to run {{checkAndUpdate}} to fix them. However, 
> {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> I have about 6 million blocks on every datanode, and every 6-hour scan finds 
> about 25000 abnormal blocks to fix. That leads to the lock on the 
> FsDatasetImpl object being held for a long time.
> Assuming every block needs 10ms to fix (because of SAS disk latency), that 
> will take 250 seconds to finish. That means all reads and writes on that 
> datanode will be blocked for roughly four minutes.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> It takes a long time to process commands from the NN because threads are 
> blocked, and the NameNode will see a long lastContact time for this datanode.
> This probably affects all HDFS versions.
> *How to fix:*
> Just like invalidate commands from the NameNode are processed with a batch 
> size of 1000, these abnormal blocks should be handled in batches too, sleeping 
> 2 seconds between batches to allow normal block reads and writes.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14701) Change Log Level to warn in SlotReleaser

2019-08-07 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901929#comment-16901929
 ] 

Lisheng Sun commented on HDFS-14701:


Thanks [~jojochuang] for your suggestion. I updated the patch per your comment 
and uploaded the v002 patch. Could you help review it? Thank you.

> Change Log Level to warn in SlotReleaser
> 
>
> Key: HDFS-14701
> URL: https://issues.apache.org/jira/browse/HDFS-14701
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14701.001.patch, HDFS-14701.002.patch
>
>
>  If the corresponding DataNode has been stopped or restarted and the DFSClient 
> closes the shared memory segment, the releaseShortCircuitFds API throws an 
> exception and logs an ERROR message. I think it should not be an ERROR log; 
> logging it at WARN level is more reasonable.
> {code:java}
> // @Override
> public void run() {
>   LOG.trace("{}: about to release {}", ShortCircuitCache.this, slot);
>   final DfsClientShm shm = (DfsClientShm)slot.getShm();
>   final DomainSocket shmSock = shm.getPeer().getDomainSocket();
>   final String path = shmSock.getPath();
>   boolean success = false;
>   try (DomainSocket sock = DomainSocket.connect(path);
>DataOutputStream out = new DataOutputStream(
>new BufferedOutputStream(sock.getOutputStream()))) {
> new Sender(out).releaseShortCircuitFds(slot.getSlotId());
> DataInputStream in = new DataInputStream(sock.getInputStream());
> ReleaseShortCircuitAccessResponseProto resp =
> ReleaseShortCircuitAccessResponseProto.parseFrom(
> PBHelperClient.vintPrefixed(in));
> if (resp.getStatus() != Status.SUCCESS) {
>   String error = resp.hasError() ? resp.getError() : "(unknown)";
>   throw new IOException(resp.getStatus().toString() + ": " + error);
> }
> LOG.trace("{}: released {}", this, slot);
> success = true;
>   } catch (IOException e) {
> LOG.error(ShortCircuitCache.this + ": failed to release " +
> "short-circuit shared memory slot " + slot + " by sending " +
> "ReleaseShortCircuitAccessRequestProto to " + path +
> ".  Closing shared memory segment.", e);
>   } finally {
> if (success) {
>   shmManager.freeSlot(slot);
> } else {
>   shm.getEndpointShmManager().shutdown(shm);
> }
>   }
> }
> {code}
>  *exception stack:*
> {code:java}
> 2019-08-05,15:28:03,838 ERROR [ShortCircuitCache_SlotReleaser] 
> org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache: 
> ShortCircuitCache(0x65849546): failed to release short-circuit shared memory 
> slot Slot(slotIdx=62, shm=DfsClientShm(70593ef8b3d84cba3c2f0a1e81377eb1)) by 
> sending ReleaseShortCircuitAccessRequestProto to 
> /home/work/app/hdfs/c3micloudsrv-hdd/datanode/dn_socket.  Closing shared 
> memory segment.
> java.io.IOException: ERROR_INVALID: there is no shared memory segment 
> registered with shmId 70593ef8b3d84cba3c2f0a1e81377eb1
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2019-08-07 Thread Sean Chow (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Chow updated HDFS-14476:
-
Affects Version/s: 3.0.3
   Attachment: HDFS-14476-branch-2.01.patch
   HDFS-14476.01.patch
   Status: Patch Available  (was: Open)

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.0.3, 2.7.0, 2.6.0
>Reporter: Sean Chow
>Assignee: Sean Chow
>Priority: Major
> Attachments: HDFS-14476-branch-2.01.patch, 
> HDFS-14476-branch-2.01.patch, HDFS-14476.00.patch, HDFS-14476.01.patch, 
> HDFS-14476.01.patch, datanode-with-patch-14476.png
>
>
> When the DirectoryScanner has results showing differences between on-disk and 
> in-memory blocks, it tries to run {{checkAndUpdate}} to fix them. However, 
> {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> I have about 6 million blocks on every datanode, and every 6-hour scan finds 
> about 25000 abnormal blocks to fix. That leads to the lock on the 
> FsDatasetImpl object being held for a long time.
> Assuming every block needs 10ms to fix (because of SAS disk latency), that 
> will take 250 seconds to finish. That means all reads and writes on that 
> datanode will be blocked for roughly four minutes.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> It takes a long time to process commands from the NN because threads are 
> blocked, and the NameNode will see a long lastContact time for this datanode.
> This probably affects all HDFS versions.
> *How to fix:*
> Just like invalidate commands from the NameNode are processed with a batch 
> size of 1000, these abnormal blocks should be handled in batches too, sleeping 
> 2 seconds between batches to allow normal block reads and writes.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1924) ozone sh bucket path command does not exist

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1924?focusedWorklogId=290339&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290339
 ]

ASF GitHub Bot logged work on HDDS-1924:


Author: ASF GitHub Bot
Created on: 07/Aug/19 09:51
Start Date: 07/Aug/19 09:51
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1245: HDDS-1924. ozone 
sh bucket path command does not exist
URL: https://github.com/apache/hadoop/pull/1245#issuecomment-519027084
 
 
   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 44 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   ||| _ trunk Compile Tests _ |
   | +1 | mvninstall | 648 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 1479 | branch has no errors when building and testing 
our client artifacts. |
   ||| _ Patch Compile Tests _ |
   | +1 | mvninstall | 587 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 649 | patch has no errors when building and testing 
our client artifacts. |
   ||| _ Other Tests _ |
   | +1 | asflicense | 43 | The patch does not generate ASF License warnings. |
   | | | 2959 | |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=19.03.1 Server=19.03.1 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1245/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1245 |
   | Optional Tests | dupname asflicense mvnsite |
   | uname | Linux c0ca2cbf2855 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / 9cd211a |
   | Max. process+thread count | 447 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/docs U: hadoop-hdds/docs |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1245/1/console |
   | versions | git=2.7.4 maven=3.3.9 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290339)
Time Spent: 0.5h  (was: 20m)

> ozone sh bucket path command does not exist
> ---
>
> Key: HDDS-1924
> URL: https://issues.apache.org/jira/browse/HDDS-1924
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: documentation, Ozone Manager
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Doroszlai, Attila
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> ozone sh bucket path command does not exist, but it is mentioned in 
> static/docs/interface/s3.html. The command should either be added back or 
> the documentation should be improved.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2019-08-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901944#comment-16901944
 ] 

Hadoop QA commented on HDFS-14476:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  9m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
44s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} branch-2 passed with JDK v1.8.0_222 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
59s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
4s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
5s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green} branch-2 passed with JDK v1.8.0_222 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed with JDK v1.8.0_222 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 23s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 5 new + 45 unchanged - 0 fixed = 50 total (was 45) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed with JDK v1.8.0_222 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 35s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}101m 45s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys |
|   | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.TestSafeMode |
|   | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport |
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:da67579 |
| JIRA Issue | HDFS-14476 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12976908/HDFS-14476-branch-2.01.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shad

[jira] [Commented] (HDDS-1918) hadoop-ozone-tools has integration tests run as unit

2019-08-07 Thread Elek, Marton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901949#comment-16901949
 ] 

Elek, Marton commented on HDDS-1918:


Can we just remove those tests in the future? I think most of the functionality 
is already covered by the smoketests (which are more stable...).

> hadoop-ozone-tools has integration tests run as unit
> 
>
> Key: HDDS-1918
> URL: https://issues.apache.org/jira/browse/HDDS-1918
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: build, test
>Affects Versions: 0.4.1
>Reporter: Doroszlai, Attila
>Assignee: Doroszlai, Attila
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.4.1, 0.5.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> HDDS-1735 created separate test runner scripts for unit and integration tests.
> Problem: {{hadoop-ozone-tools}} tests are currently run as part of the unit 
> tests, but most of them start a {{MiniOzoneCluster}}, which is defined in 
> {{hadoop-ozone-integration-test}}.  Thus I think these tests are really 
> integration tests, and should be run by {{integration.sh}} instead.  There 
> are currently only 3 real unit tests:
> {noformat}
> hadoop-ozone/tools/src/test/java/org/apache/hadoop/ozone/audit/parser/TestAuditParser.java
> hadoop-ozone/tools/src/test/java/org/apache/hadoop/ozone/freon/TestProgressBar.java
> hadoop-ozone/tools/src/test/java/org/apache/hadoop/ozone/genconf/TestGenerateOzoneRequiredConfigurations.java
> {noformat}
> {{hadoop-ozone-tools}} tests take ~6 minutes.
> Possible solutions in order of increasing complexity:
> # Run {{hadoop-ozone-tools}} tests in {{integration.sh}} instead of 
> {{unit.sh}} (This is similar to {{hadoop-ozone-filesystem}}, which is already 
> run by {{integration.sh}} and has 2 real unit tests.)
> # Move all integration test classes to the {{hadoop-ozone-integration-test}} 
> module, and make it depend on {{hadoop-ozone-tools}} and 
> {{hadoop-ozone-filesystem}} instead of the other way around.
> # Rename integration test classes to {{\*IT.java}} or {{IT\*.java}}, add 
> filters for Surefire runs.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1925) ozonesecure acceptance test broken by HTTP auth requirement

2019-08-07 Thread Doroszlai, Attila (JIRA)
Doroszlai, Attila created HDDS-1925:
---

 Summary: ozonesecure acceptance test broken by HTTP auth 
requirement
 Key: HDDS-1925
 URL: https://issues.apache.org/jira/browse/HDDS-1925
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: docker, test
Affects Versions: 0.4.1
Reporter: Doroszlai, Attila


Acceptance test is failing at {{ozonesecure}} with the following error from 
{{jq}}:

{noformat:title=https://github.com/elek/ozone-ci/blob/325779d34623061e27b80ade3b749210648086d1/byscane/byscane-nightly-ds7lx/acceptance/output.log#L2779}
parse error: Invalid numeric literal at line 2, column 0
{noformat}

Example compose environments wait for datanodes to be up:

{code:title=https://github.com/apache/hadoop/blob/9cd211ac86bb1124bdee572fddb6f86655b19b73/hadoop-ozone/dist/src/main/compose/testlib.sh#L71-L72}
  docker-compose -f "$COMPOSE_FILE" up -d --scale datanode="${datanode_count}"
  wait_for_datanodes "$COMPOSE_FILE" "${datanode_count}"
{code}

The number of datanodes up is determined via HTTP query of JMX endpoint:

{code:title=https://github.com/apache/hadoop/blob/9cd211ac86bb1124bdee572fddb6f86655b19b73/hadoop-ozone/dist/src/main/compose/testlib.sh#L44-L46}
 #This line checks the number of HEALTHY datanodes registered in scm over 
the
 # jmx HTTP servlet
 datanodes=$(docker-compose -f "${compose_file}" exec -T scm curl -s 
'http://localhost:9876/jmx?qry=Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo'
 | jq -r '.beans[0].NodeCount[] | select(.key=="HEALTHY") | .value')
{code}

The problem is that no authentication is performed before or during the 
request, which is no longer allowed since HDDS-1901:

{code}
$ docker-compose exec -T scm curl -s 
'http://localhost:9876/jmx?qry=Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo'



Error 401 Authentication required

HTTP ERROR 401
Problem accessing /jmx. Reason:
Authentication required


{code}

{code}
$ docker-compose exec -T scm curl -s 
'http://localhost:9876/jmx?qry=Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo'
 | jq -r '.beans[0].NodeCount[] | select(.key=="HEALTHY") | .value'
parse error: Invalid numeric literal at line 2, column 0
{code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14623) In NameNode Web UI, for Head the file (first 32K) old data is showing

2019-08-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901951#comment-16901951
 ] 

Hadoop QA commented on HDFS-14623:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
28m  7s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch 4 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  5s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 42m 41s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e |
| JIRA Issue | HDFS-14623 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12976912/HDFS-14623.001.patch |
| Optional Tests |  dupname  asflicense  shadedclient  |
| uname | Linux 4c92e75dd041 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 9cd211a |
| maven | version: Apache Maven 3.3.9 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27434/artifact/out/whitespace-eol.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27434/artifact/out/whitespace-tabs.txt
 |
| Max. process+thread count | 460 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27434/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> In NameNode Web UI, for Head the file (first 32K) old data is showing
> -
>
> Key: HDFS-14623
> URL: https://issues.apache.org/jira/browse/HDFS-14623
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14623.001.patch, HDFS-14623.patch, afterfix.JPG, 
> beforefix.JPG
>
>
> In the NameNode Web UI, "Head the file (first 32K)" shows old data: after 
> opening multiple files and clicking "Head the file", the wrong content is 
> displayed.
> Scenario: 
> Uploaded a NameNode log and a ZKFC log, clicked "Head the file" on the 
> NameNode log multiple times, then went to the ZKFC log and clicked "Head the 
> file"; the wrong data is shown.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1925) ozonesecure acceptance test broken by HTTP auth requirement

2019-08-07 Thread Doroszlai, Attila (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doroszlai, Attila reassigned HDDS-1925:
---

Assignee: Doroszlai, Attila

> ozonesecure acceptance test broken by HTTP auth requirement
> ---
>
> Key: HDDS-1925
> URL: https://issues.apache.org/jira/browse/HDDS-1925
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: docker, test
>Affects Versions: 0.4.1
>Reporter: Doroszlai, Attila
>Assignee: Doroszlai, Attila
>Priority: Critical
>
> Acceptance test is failing at {{ozonesecure}} with the following error from 
> {{jq}}:
> {noformat:title=https://github.com/elek/ozone-ci/blob/325779d34623061e27b80ade3b749210648086d1/byscane/byscane-nightly-ds7lx/acceptance/output.log#L2779}
> parse error: Invalid numeric literal at line 2, column 0
> {noformat}
> Example compose environments wait for datanodes to be up:
> {code:title=https://github.com/apache/hadoop/blob/9cd211ac86bb1124bdee572fddb6f86655b19b73/hadoop-ozone/dist/src/main/compose/testlib.sh#L71-L72}
>   docker-compose -f "$COMPOSE_FILE" up -d --scale datanode="${datanode_count}"
>   wait_for_datanodes "$COMPOSE_FILE" "${datanode_count}"
> {code}
> The number of datanodes up is determined via HTTP query of JMX endpoint:
> {code:title=https://github.com/apache/hadoop/blob/9cd211ac86bb1124bdee572fddb6f86655b19b73/hadoop-ozone/dist/src/main/compose/testlib.sh#L44-L46}
>  #This line checks the number of HEALTHY datanodes registered in scm over 
> the
>  # jmx HTTP servlet
>  datanodes=$(docker-compose -f "${compose_file}" exec -T scm curl -s 
> 'http://localhost:9876/jmx?qry=Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo'
>  | jq -r '.beans[0].NodeCount[] | select(.key=="HEALTHY") | .value')
> {code}
> The problem is that no authentication is performed before or during the 
> request, which is no longer allowed since HDDS-1901:
> {code}
> $ docker-compose exec -T scm curl -s 
> 'http://localhost:9876/jmx?qry=Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo'
> 
> 
> 
> Error 401 Authentication required
> 
> HTTP ERROR 401
> Problem accessing /jmx. Reason:
> Authentication required
> 
> 
> {code}
> {code}
> $ docker-compose exec -T scm curl -s 
> 'http://localhost:9876/jmx?qry=Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo'
>  | jq -r '.beans[0].NodeCount[] | select(.key=="HEALTHY") | .value'
> parse error: Invalid numeric literal at line 2, column 0
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work started] (HDDS-1925) ozonesecure acceptance test broken by HTTP auth requirement

2019-08-07 Thread Doroszlai, Attila (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDDS-1925 started by Doroszlai, Attila.
---
> ozonesecure acceptance test broken by HTTP auth requirement
> ---
>
> Key: HDDS-1925
> URL: https://issues.apache.org/jira/browse/HDDS-1925
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: docker, test
>Affects Versions: 0.4.1
>Reporter: Doroszlai, Attila
>Assignee: Doroszlai, Attila
>Priority: Critical
>
> Acceptance test is failing at {{ozonesecure}} with the following error from 
> {{jq}}:
> {noformat:title=https://github.com/elek/ozone-ci/blob/325779d34623061e27b80ade3b749210648086d1/byscane/byscane-nightly-ds7lx/acceptance/output.log#L2779}
> parse error: Invalid numeric literal at line 2, column 0
> {noformat}
> Example compose environments wait for datanodes to be up:
> {code:title=https://github.com/apache/hadoop/blob/9cd211ac86bb1124bdee572fddb6f86655b19b73/hadoop-ozone/dist/src/main/compose/testlib.sh#L71-L72}
>   docker-compose -f "$COMPOSE_FILE" up -d --scale datanode="${datanode_count}"
>   wait_for_datanodes "$COMPOSE_FILE" "${datanode_count}"
> {code}
> The number of datanodes up is determined via HTTP query of JMX endpoint:
> {code:title=https://github.com/apache/hadoop/blob/9cd211ac86bb1124bdee572fddb6f86655b19b73/hadoop-ozone/dist/src/main/compose/testlib.sh#L44-L46}
>  #This line checks the number of HEALTHY datanodes registered in scm over 
> the
>  # jmx HTTP servlet
>  datanodes=$(docker-compose -f "${compose_file}" exec -T scm curl -s 
> 'http://localhost:9876/jmx?qry=Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo'
>  | jq -r '.beans[0].NodeCount[] | select(.key=="HEALTHY") | .value')
> {code}
> The problem is that no authentication is performed before or during the 
> request, which is no longer allowed since HDDS-1901:
> {code}
> $ docker-compose exec -T scm curl -s 
> 'http://localhost:9876/jmx?qry=Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo'
> 
> 
> 
> Error 401 Authentication required
> 
> HTTP ERROR 401
> Problem accessing /jmx. Reason:
> Authentication required
> 
> 
> {code}
> {code}
> $ docker-compose exec -T scm curl -s 
> 'http://localhost:9876/jmx?qry=Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo'
>  | jq -r '.beans[0].NodeCount[] | select(.key=="HEALTHY") | .value'
> parse error: Invalid numeric literal at line 2, column 0
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14701) Change Log Level to warn in SlotReleaser

2019-08-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901971#comment-16901971
 ] 

Hadoop QA commented on HDFS-14701:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
42s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 19s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 38s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m  
9s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 57m 52s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e |
| JIRA Issue | HDFS-14701 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12976915/HDFS-14701.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux bf3a79c6e3ce 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 9cd211a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27436/testReport/ |
| Max. process+thread count | 413 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-client U: 
hadoop-hdfs-project/hadoop-hdfs-client |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27436/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Change Log Level to war

[jira] [Updated] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2019-08-07 Thread Sean Chow (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Chow updated HDFS-14476:
-
Attachment: (was: HDFS-14476.01.patch)

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0, 3.0.3
>Reporter: Sean Chow
>Assignee: Sean Chow
>Priority: Major
> Attachments: HDFS-14476-branch-2.01.patch, HDFS-14476.00.patch, 
> HDFS-14476.01.patch, datanode-with-patch-14476.png
>
>
> When the DirectoryScanner has the results of the differences between on-disk 
> and in-memory blocks, it runs {{checkAndUpdate}} to fix them. However, 
> {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> As I have about 6 million blocks on every datanode and each 6-hour scan finds 
> about 25,000 abnormal blocks to fix, this leads to the FsDatasetImpl lock 
> being held for a long time.
> Assuming every block needs 10ms to fix (because of SAS disk latency), the 
> pass takes about 250 seconds, which means all reads and writes on that 
> datanode are blocked for roughly 4 minutes.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> Commands from the NameNode take a long time to process because threads are 
> blocked, and the NameNode will see a long lastContact time for this datanode.
> This probably affects all HDFS versions.
> *how to fix:*
> Just like invalidate commands from the NameNode are processed with a batch 
> size of 1000, these abnormal blocks should be fixed in batches too, sleeping 
> 2 seconds between batches to allow normal block reads and writes.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2019-08-07 Thread Sean Chow (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Chow updated HDFS-14476:
-
Attachment: (was: HDFS-14476-branch-2.01.patch)

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0, 3.0.3
>Reporter: Sean Chow
>Assignee: Sean Chow
>Priority: Major
> Attachments: HDFS-14476-branch-2.01.patch, HDFS-14476.00.patch, 
> HDFS-14476.01.patch, datanode-with-patch-14476.png
>
>
> When the DirectoryScanner has the results of the differences between on-disk 
> and in-memory blocks, it runs {{checkAndUpdate}} to fix them. However, 
> {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> As I have about 6 million blocks on every datanode and each 6-hour scan finds 
> about 25,000 abnormal blocks to fix, this leads to the FsDatasetImpl lock 
> being held for a long time.
> Assuming every block needs 10ms to fix (because of SAS disk latency), the 
> pass takes about 250 seconds, which means all reads and writes on that 
> datanode are blocked for roughly 4 minutes.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> Commands from the NameNode take a long time to process because threads are 
> blocked, and the NameNode will see a long lastContact time for this datanode.
> This probably affects all HDFS versions.
> *how to fix:*
> Just like invalidate commands from the NameNode are processed with a batch 
> size of 1000, these abnormal blocks should be fixed in batches too, sleeping 
> 2 seconds between batches to allow normal block reads and writes.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902009#comment-16902009
 ] 

Hadoop QA commented on HDFS-14313:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 12m  
7s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} branch-3.0 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m 
34s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
37s{color} | {color:green} branch-3.0 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 
21s{color} | {color:green} branch-3.0 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
56s{color} | {color:green} branch-3.0 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
17s{color} | {color:green} branch-3.0 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m  5s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
40s{color} | {color:green} branch-3.0 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
54s{color} | {color:green} branch-3.0 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 
19s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 59s{color} | {color:orange} root: The patch generated 3 new + 296 unchanged 
- 1 fixed = 299 total (was 297) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 39s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
55s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  8m  8s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 78m  9s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
36s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}191m 14s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.util.TestReadWriteDiskValidator |
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:e402791 |
| JIRA Issue | HDFS-14313 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12976909/HDFS-14313.branch-3.0.v2.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux f9b46a824269 4.4.0-157-generic #185-Ubuntu SMP Tue Jul 23 
09:17:01 UTC 2019 x86_64 x86_64 x86_64 GNU

[jira] [Commented] (HDFS-14204) Backport HDFS-12943 to branch-2

2019-08-07 Thread huhaiyang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902014#comment-16902014
 ] 

huhaiyang commented on HDFS-14204:
--

[~vagarychen]

This feature is so useful that we want to deploy it to our production version 
2.7.x. I would like to know when you plan to merge this into branch-2.

Thanks.

> Backport HDFS-12943 to branch-2
> ---
>
> Key: HDFS-14204
> URL: https://issues.apache.org/jira/browse/HDFS-14204
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-14204-branch-2.001.patch, 
> HDFS-14204-branch-2.002.patch, HDFS-14204-branch-2.003.patch, 
> HDFS-14204-branch-2.004.patch, HDFS-14204-branch-2.005.patch
>
>
> Currently, consistent read from standby feature (HDFS-12943) is only in trunk 
> (branch-3). This JIRA aims to backport the feature to branch-2.  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HDFS-14204) Backport HDFS-12943 to branch-2

2019-08-07 Thread huhaiyang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huhaiyang updated HDFS-14204:
-
Comment: was deleted

(was: [~vagarychen]

This feature is so useful that we want to deploy it to our production version 
2.7.x. I would like to know when you plan to merge this into branch-2.

Thanks.)

> Backport HDFS-12943 to branch-2
> ---
>
> Key: HDFS-14204
> URL: https://issues.apache.org/jira/browse/HDFS-14204
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-14204-branch-2.001.patch, 
> HDFS-14204-branch-2.002.patch, HDFS-14204-branch-2.003.patch, 
> HDFS-14204-branch-2.004.patch, HDFS-14204-branch-2.005.patch
>
>
> Currently, consistent read from standby feature (HDFS-12943) is only in trunk 
> (branch-3). This JIRA aims to backport the feature to branch-2.  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14204) Backport HDFS-12943 to branch-2

2019-08-07 Thread huhaiyang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902016#comment-16902016
 ] 

huhaiyang commented on HDFS-14204:
--

hi [~vagarychen]

This feature is so useful that we want to deploy it to our production version 
2.7.x. I would like to know when you plan to merge this into branch-2.

Thanks.

> Backport HDFS-12943 to branch-2
> ---
>
> Key: HDFS-14204
> URL: https://issues.apache.org/jira/browse/HDFS-14204
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-14204-branch-2.001.patch, 
> HDFS-14204-branch-2.002.patch, HDFS-14204-branch-2.003.patch, 
> HDFS-14204-branch-2.004.patch, HDFS-14204-branch-2.005.patch
>
>
> Currently, consistent read from standby feature (HDFS-12943) is only in trunk 
> (branch-3). This JIRA aims to backport the feature to branch-2.  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1920) Place ozone.om.address config key default value in ozone-site.xml

2019-08-07 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HDDS-1920:
---
Status: Patch Available  (was: Open)

> Place ozone.om.address config key default value in ozone-site.xml
> -
>
> Key: HDDS-1920
> URL: https://issues.apache.org/jira/browse/HDDS-1920
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code:xml}
> <property>
>   <name>ozone.om.address</name>
> -  <value/>
> +  <value>0.0.0.0:9862</value>
>   <tag>OM, REQUIRED</tag>
>   <description>
>     The address of the Ozone OM service. This allows clients to discover
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902024#comment-16902024
 ] 

Hadoop QA commented on HDFS-14313:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m  
5s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 
10s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 
47s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
22s{color} | {color:green} branch-2 passed with JDK v1.8.0_222 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
45s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
13s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
41s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
29s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
51s{color} | {color:green} branch-2 passed with JDK v1.8.0_222 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 
52s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 11m 52s{color} 
| {color:red} root-jdk1.7.0_95 with JDK v1.7.0_95 generated 3 new + 1441 
unchanged - 3 fixed = 1444 total (was 1444) {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
31s{color} | {color:green} the patch passed with JDK v1.8.0_222 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 10m 31s{color} 
| {color:red} root-jdk1.8.0_222 with JDK v1.8.0_222 generated 4 new + 1342 
unchanged - 4 fixed = 1346 total (was 1346) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 44s{color} | {color:orange} root: The patch generated 3 new + 324 unchanged 
- 1 fixed = 327 total (was 325) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
16s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
35s{color} | {color:green} the patch passed with JDK v1.8.0_222 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  7m 
49s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 53s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
42s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}167m 40s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.

[jira] [Created] (HDDS-1926) The new caching layer is used for old OM requests but not updated

2019-08-07 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1926:
--

 Summary: The new caching layer is used for old OM requests but not 
updated
 Key: HDDS-1926
 URL: https://issues.apache.org/jira/browse/HDDS-1926
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: om
Reporter: Elek, Marton


HDDS-1499 introduced a new caching layer together with a double-buffer based db 
writer to support OM HA.

TLDR: I think the caching layer is not updated for new volume creation. And 
(slightly related to this problem) I suggest separating the TypedTable and 
the caching layer.

## How to reproduce the problem?

1. Start a docker compose cluster
2. Create one volume (let's say `/vol1`)
3. Restart the om (!)
4. Try to create an _other_ volume twice!

```
bash-4.2$ ozone sh volume create /vol2
2019-08-07 12:29:47 INFO  RpcClient:288 - Creating Volume: vol2, with hadoop as 
owner.
bash-4.2$ ozone sh volume create /vol2
2019-08-07 12:29:50 INFO  RpcClient:288 - Creating Volume: vol2, with hadoop as 
owner.
```

Expected behavior is an error:

{code}
bash-4.2$ ozone sh volume create /vol1
2019-08-07 09:48:39 INFO  RpcClient:288 - Creating Volume: vol1, with hadoop as 
owner.
bash-4.2$ ozone sh volume create /vol1
2019-08-07 09:48:42 INFO  RpcClient:288 - Creating Volume: vol1, with hadoop as 
owner.
VOLUME_ALREADY_EXISTS 
{code}

The problem is that the new cache is used even for the old code path 
(TypedTable):

{code}
 @Override
  public VALUE get(KEY key) throws IOException {
// Here the metadata lock will guarantee that cache is not updated for same
// key during get key.

    CacheResult<CacheValue<VALUE>> cacheResult =
        cache.lookup(new CacheKey<>(key));

if (cacheResult.getCacheStatus() == EXISTS) {
  return cacheResult.getValue().getCacheValue();
} else if (cacheResult.getCacheStatus() == NOT_EXIST) {
  return null;
} else {
  return getFromTable(key);
}
  }
{code}

For the volume table, after the FIRST start it always falls back to 
`getFromTable(key)` due to the condition in `TableCacheImpl.lookup`:

{code}

  public CacheResult<CACHEVALUE> lookup(CACHEKEY cachekey) {

if (cache.size() == 0) {
  return new CacheResult<>(CacheResult.CacheStatus.MAY_EXIST,
  null);
}
{code}

But after a restart the cache is pre-loaded by the TypedTable constructor. 
After the restart, the real caching logic is used (as cache.size()>0), 
which causes a problem because the cache is NOT updated from the old code path.

An additional problem is that the cache is turned on for all the metadata tables 
even where the cache is not required... 

## Proposed solution

As I commented at HDDS-1499, this caching layer is not a "traditional cache". 
It's not updated during the typedTable.put() call but by a separate 
component during the double-buffer flush.

I would suggest removing the cache-related methods from TypedTable (moving them 
to a separate implementation). I think this kind of caching can be independent 
of the TypedTable implementation. We can continue to use the simple TypedTable 
everywhere we don't need any kind of caching.

For caching we can use a separate object. That would make it more visible that 
the cache must always be updated manually. This separate caching utility may 
hold a reference to the original TypedTable/Table. With this approach we can 
separate the responsibilities but provide the same functionality.
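
A rough sketch of how that separation could look, using hypothetical names 
(PlainTable, CachingTable, updateCache) rather than the actual Ozone classes: 
the plain table stays cache-free, and an optional wrapper owns the cache, keeps 
it consistent for the old put() path, and exposes an explicit update hook for 
the double-buffer flush.

{code}
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

interface PlainTable<K, V> {
  V get(K key) throws IOException;
  void put(K key, V value) throws IOException;
}

/** Optional caching wrapper; only used for tables that actually need a cache. */
class CachingTable<K, V> implements PlainTable<K, V> {
  private final PlainTable<K, V> delegate;               // the real RocksDB-backed table
  private final Map<K, V> cache = new ConcurrentHashMap<>();

  CachingTable(PlainTable<K, V> delegate) {
    this.delegate = delegate;
  }

  @Override
  public V get(K key) throws IOException {
    V cached = cache.get(key);
    return cached != null ? cached : delegate.get(key);  // fall back to the table
  }

  @Override
  public void put(K key, V value) throws IOException {
    delegate.put(key, value);                            // old code path still writes through
    cache.put(key, value);                               // and the cache stays consistent
  }

  /** Explicit hook for the HA double-buffer flush to refresh the cache. */
  public void updateCache(K key, V value) {
    cache.put(key, value);
  }
}
{code}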



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2019-08-07 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901887#comment-16901887
 ] 

He Xiaoqiao edited comment on HDFS-14703 at 8/7/19 1:04 PM:


Thanks [~shv] for filing this JIRA and planning to push this feature forward; it 
is great work. Really appreciate doing this.
There are some details I am confused about after reading the design document.
As the design document says, each inode maps (through its inode key) to one 
RangeMap, each of which has a separate lock so operations can run concurrently.
{quote}The inode key is a fixed length sequence of parent inodeids ending with 
the file inode id itself:
  key(f) = <ppId, pId, selfId>
Where selfId is the inodeId of file f, pId is the id of its parent, and ppId is 
the id of the parent of the parent. Such definition of a key guarantees that 
not only siblings but also cousins (objects having the same grandparent) are 
partitioned into the same range most of the time
{quote}
Consider the following path: /a/b/c/d, whose corresponding inode ids are [ida, 
idb, idc, idd].
1. How can we guarantee that 'cousins' map into the same range? At first glance 
they could map to different RangeMaps, since for idc the inode key = 
<ida, idb, idc> while for idd the inode key = <idb, idc, idd>. Furthermore, if 
we rename an inode from one range to another, do we need to move all of its 
children and sub-tree inodes to the other range as well? 
2. Any consideration for operating on a node and its ancestor concurrently? For 
instance, with /a/b/c/d/e/f we could delete inode c and modify inode f at the 
same time if they map to different ranges, since we do not guarantee they map 
to the same one. That may be a problem in this case.
3. Which lock will be held for global requests such as HA failover, safemode, 
etc.? Do we need to obtain all the RangeMap locks?
4. Any bottleneck expected after write throughput improves? I believe EditLog 
OPS will keep increasing; will it become the new bottleneck?
Please correct me if I do not understand correctly. Thanks.


was (Author: hexiaoqiao):
Thanks [~shv] for file this JIRA and plan to push this feature forward, it is 
very great work. Really appreciate doing this.
 There are some details I am confused after reading the design document.
 As design document said, each inode maps (through inode key) to one RangeMap 
who has a separate lock and carry out concurrently.
{quote}The inode key is a fixed length sequence of parent inodeids ending with 
the file inode id itself:
    key(f) = <ppId, pId, selfId>
 Where selfId is the inodeId of file f, pId is the id of its parent, and ppId 
is the id of the parent of the parent. Such definition of a key guarantees that 
not only siblings but also cousins (objects having the same grandparent) are 
partitioned into the same range most of the time
{quote}
Consider the following path: /a/b/c/d/e, corresponding inode id is [ida, idb, 
idc, idd].
 1. How we could guarantee to map 'cousins' into the same range? In my first 
opinion, it could map to different RangeMaps, since for idc, its inode key = 
<ida, idb, idc> and for idd its inode key = <idb, idc, idd>.
 2. Any consideration about operating one nodes and its ancestor node 
concurrently? for instance, /a/b/c/d/e/f, we could delete inode c and modify 
inode f at the same time if they map to different range since we do not 
guarantee map them to the same one. maybe it is problem in the case.
 3. Which lock will be hold if request some global request like ha failover, 
safemode etc.? do we need to obtain all RangeMap lock?
 4. Any bottleneck meet after improve write throughput, I believe that EditLog 
OPS will keep increase, and will it to be the new bottleneck?
Please correct me if I do not understand correctly. Thanks.
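
To make question 1 above concrete, here is a small, purely illustrative snippet 
(hypothetical code, not from the design document) that builds the quoted 
key(f) = <ppId, pId, selfId> keys for /a/b/c/d and assigns them to ranges with a 
toy partitioner keyed on the leading id:

{code:java}
import java.util.Arrays;

public class InodeKeyDemo {

  /** key(f) = <ppId, pId, selfId> as quoted from the design document. */
  static long[] key(long ppId, long pId, long selfId) {
    return new long[] {ppId, pId, selfId};
  }

  /** Toy partitioner: choose a range by the leading (grandparent) id only. */
  static int rangeOf(long[] key, int numRanges) {
    return (int) Math.floorMod(key[0], (long) numRanges);
  }

  public static void main(String[] args) {
    // /a/b/c/d with ids ida=1, idb=2, idc=3, idd=4
    long[] keyC = key(1, 2, 3);   // key(c) = <ida, idb, idc>
    long[] keyD = key(2, 3, 4);   // key(d) = <idb, idc, idd>
    System.out.println(Arrays.toString(keyC) + " -> range " + rangeOf(keyC, 8));
    System.out.println(Arrays.toString(keyD) + " -> range " + rangeOf(keyD, 8));
    // With this toy partitioner the two keys land in different ranges,
    // which is exactly the situation question 1 asks about.
  }
}
{code}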

> NameNode Fine-Grained Locking via Metadata Partitioning
> ---
>
> Key: HDFS-14703
> URL: https://issues.apache.org/jira/browse/HDFS-14703
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: NameNode Fine-Grained Locking.pdf
>
>
> We target to enable fine-grained locking by splitting the in-memory namespace 
> into multiple partitions each having a separate lock. Intended to improve 
> performance of NameNode write operations.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1907) TestOzoneRpcClientWithRatis is failing with ACL errors

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1907?focusedWorklogId=290449&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290449
 ]

ASF GitHub Bot logged work on HDDS-1907:


Author: ASF GitHub Bot
Created on: 07/Aug/19 13:06
Start Date: 07/Aug/19 13:06
Worklog Time Spent: 10m 
  Work Description: nandakumar131 commented on pull request #1239: 
HDDS-1907. TestOzoneRpcClientWithRatis is failing with ACL errors. Co…
URL: https://github.com/apache/hadoop/pull/1239
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290449)
Time Spent: 40m  (was: 0.5h)

> TestOzoneRpcClientWithRatis is failing with ACL errors
> --
>
> Key: HDDS-1907
> URL: https://issues.apache.org/jira/browse/HDDS-1907
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Nanda kumar
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {noformat}
> [ERROR] 
> testNativeAclsForKey(org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis)
>   Time elapsed: 0.176 s  <<< FAILURE!
> java.lang.AssertionError: Current acls:,[user:nvadivelu:a[ACCESS], 
> group:staff:a[ACCESS], group:everyone:a[ACCESS], 
> group:localaccounts:a[ACCESS], group:_appserverusr:a[ACCESS], 
> group:admin:a[ACCESS], group:_appserveradm:a[ACCESS], 
> group:_lpadmin:a[ACCESS], group:com.apple.sharepoint.group.1:a[ACCESS], 
> group:com.apple.sharepoint.group.2:a[ACCESS], group:_appstore:a[ACCESS], 
> group:_lpoperator:a[ACCESS], group:_developer:a[ACCESS], 
> group:_analyticsusers:a[ACCESS], group:com.apple.access_ftp:a[ACCESS], 
> group:com.apple.access_screensharing:a[ACCESS], 
> group:com.apple.access_ssh:a[ACCESS], 
> group:com.apple.sharepoint.group.3:a[ACCESS]] 
> inheritedUserAcl:user:remoteUser:r[ACCESS]
> [ERROR] 
> testNativeAclsForBucket(org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis)
>   Time elapsed: 0.074 s  <<< FAILURE!
> java.lang.AssertionError
> [ERROR] 
> testNativeAclsForPrefix(org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis)
>   Time elapsed: 0.061 s  <<< FAILURE!
> java.lang.AssertionError: Current acls:,[user:nvadivelu:a[ACCESS], 
> group:staff:a[ACCESS], group:everyone:a[ACCESS], 
> group:localaccounts:a[ACCESS], group:_appserverusr:a[ACCESS], 
> group:admin:a[ACCESS], group:_appserveradm:a[ACCESS], 
> group:_lpadmin:a[ACCESS], group:com.apple.sharepoint.group.1:a[ACCESS], 
> group:com.apple.sharepoint.group.2:a[ACCESS], group:_appstore:a[ACCESS], 
> group:_lpoperator:a[ACCESS], group:_developer:a[ACCESS], 
> group:_analyticsusers:a[ACCESS], group:com.apple.access_ftp:a[ACCESS], 
> group:com.apple.access_screensharing:a[ACCESS], 
> group:com.apple.access_ssh:a[ACCESS], 
> group:com.apple.sharepoint.group.3:a[ACCESS]] 
> inheritedUserAcl:user:remoteUser:r[ACCESS]
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1907) TestOzoneRpcClientWithRatis is failing with ACL errors

2019-08-07 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-1907:
--
   Resolution: Fixed
Fix Version/s: 0.5.0
   0.4.1
   Status: Resolved  (was: Patch Available)

> TestOzoneRpcClientWithRatis is failing with ACL errors
> --
>
> Key: HDDS-1907
> URL: https://issues.apache.org/jira/browse/HDDS-1907
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Nanda kumar
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.4.1, 0.5.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {noformat}
> [ERROR] 
> testNativeAclsForKey(org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis)
>   Time elapsed: 0.176 s  <<< FAILURE!
> java.lang.AssertionError: Current acls:,[user:nvadivelu:a[ACCESS], 
> group:staff:a[ACCESS], group:everyone:a[ACCESS], 
> group:localaccounts:a[ACCESS], group:_appserverusr:a[ACCESS], 
> group:admin:a[ACCESS], group:_appserveradm:a[ACCESS], 
> group:_lpadmin:a[ACCESS], group:com.apple.sharepoint.group.1:a[ACCESS], 
> group:com.apple.sharepoint.group.2:a[ACCESS], group:_appstore:a[ACCESS], 
> group:_lpoperator:a[ACCESS], group:_developer:a[ACCESS], 
> group:_analyticsusers:a[ACCESS], group:com.apple.access_ftp:a[ACCESS], 
> group:com.apple.access_screensharing:a[ACCESS], 
> group:com.apple.access_ssh:a[ACCESS], 
> group:com.apple.sharepoint.group.3:a[ACCESS]] 
> inheritedUserAcl:user:remoteUser:r[ACCESS]
> [ERROR] 
> testNativeAclsForBucket(org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis)
>   Time elapsed: 0.074 s  <<< FAILURE!
> java.lang.AssertionError
> [ERROR] 
> testNativeAclsForPrefix(org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis)
>   Time elapsed: 0.061 s  <<< FAILURE!
> java.lang.AssertionError: Current acls:,[user:nvadivelu:a[ACCESS], 
> group:staff:a[ACCESS], group:everyone:a[ACCESS], 
> group:localaccounts:a[ACCESS], group:_appserverusr:a[ACCESS], 
> group:admin:a[ACCESS], group:_appserveradm:a[ACCESS], 
> group:_lpadmin:a[ACCESS], group:com.apple.sharepoint.group.1:a[ACCESS], 
> group:com.apple.sharepoint.group.2:a[ACCESS], group:_appstore:a[ACCESS], 
> group:_lpoperator:a[ACCESS], group:_developer:a[ACCESS], 
> group:_analyticsusers:a[ACCESS], group:com.apple.access_ftp:a[ACCESS], 
> group:com.apple.access_screensharing:a[ACCESS], 
> group:com.apple.access_ssh:a[ACCESS], 
> group:com.apple.sharepoint.group.3:a[ACCESS]] 
> inheritedUserAcl:user:remoteUser:r[ACCESS]
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1907) TestOzoneRpcClientWithRatis is failing with ACL errors

2019-08-07 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902049#comment-16902049
 ] 

Nanda kumar commented on HDDS-1907:
---

Thanks [~xyao] for the contribution and thanks [~bharatviswa] for review. 
Committed this to trunk and ozone-0.4.1 branch.

> TestOzoneRpcClientWithRatis is failing with ACL errors
> --
>
> Key: HDDS-1907
> URL: https://issues.apache.org/jira/browse/HDDS-1907
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Nanda kumar
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {noformat}
> [ERROR] 
> testNativeAclsForKey(org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis)
>   Time elapsed: 0.176 s  <<< FAILURE!
> java.lang.AssertionError: Current acls:,[user:nvadivelu:a[ACCESS], 
> group:staff:a[ACCESS], group:everyone:a[ACCESS], 
> group:localaccounts:a[ACCESS], group:_appserverusr:a[ACCESS], 
> group:admin:a[ACCESS], group:_appserveradm:a[ACCESS], 
> group:_lpadmin:a[ACCESS], group:com.apple.sharepoint.group.1:a[ACCESS], 
> group:com.apple.sharepoint.group.2:a[ACCESS], group:_appstore:a[ACCESS], 
> group:_lpoperator:a[ACCESS], group:_developer:a[ACCESS], 
> group:_analyticsusers:a[ACCESS], group:com.apple.access_ftp:a[ACCESS], 
> group:com.apple.access_screensharing:a[ACCESS], 
> group:com.apple.access_ssh:a[ACCESS], 
> group:com.apple.sharepoint.group.3:a[ACCESS]] 
> inheritedUserAcl:user:remoteUser:r[ACCESS]
> [ERROR] 
> testNativeAclsForBucket(org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis)
>   Time elapsed: 0.074 s  <<< FAILURE!
> java.lang.AssertionError
> [ERROR] 
> testNativeAclsForPrefix(org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis)
>   Time elapsed: 0.061 s  <<< FAILURE!
> java.lang.AssertionError: Current acls:,[user:nvadivelu:a[ACCESS], 
> group:staff:a[ACCESS], group:everyone:a[ACCESS], 
> group:localaccounts:a[ACCESS], group:_appserverusr:a[ACCESS], 
> group:admin:a[ACCESS], group:_appserveradm:a[ACCESS], 
> group:_lpadmin:a[ACCESS], group:com.apple.sharepoint.group.1:a[ACCESS], 
> group:com.apple.sharepoint.group.2:a[ACCESS], group:_appstore:a[ACCESS], 
> group:_lpoperator:a[ACCESS], group:_developer:a[ACCESS], 
> group:_analyticsusers:a[ACCESS], group:com.apple.access_ftp:a[ACCESS], 
> group:com.apple.access_screensharing:a[ACCESS], 
> group:com.apple.access_ssh:a[ACCESS], 
> group:com.apple.sharepoint.group.3:a[ACCESS]] 
> inheritedUserAcl:user:remoteUser:r[ACCESS]
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1907) TestOzoneRpcClientWithRatis is failing with ACL errors

2019-08-07 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902051#comment-16902051
 ] 

Hudson commented on HDDS-1907:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17057 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17057/])
HDDS-1907. TestOzoneRpcClientWithRatis is failing with ACL errors. (nanda: rev 
70f46746b17c01450d2ef57edb2ce5314ab53308)
* (edit) 
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestOzoneRpcClientAbstract.java


> TestOzoneRpcClientWithRatis is failing with ACL errors
> --
>
> Key: HDDS-1907
> URL: https://issues.apache.org/jira/browse/HDDS-1907
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Nanda kumar
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.4.1, 0.5.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {noformat}
> [ERROR] 
> testNativeAclsForKey(org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis)
>   Time elapsed: 0.176 s  <<< FAILURE!
> java.lang.AssertionError: Current acls:,[user:nvadivelu:a[ACCESS], 
> group:staff:a[ACCESS], group:everyone:a[ACCESS], 
> group:localaccounts:a[ACCESS], group:_appserverusr:a[ACCESS], 
> group:admin:a[ACCESS], group:_appserveradm:a[ACCESS], 
> group:_lpadmin:a[ACCESS], group:com.apple.sharepoint.group.1:a[ACCESS], 
> group:com.apple.sharepoint.group.2:a[ACCESS], group:_appstore:a[ACCESS], 
> group:_lpoperator:a[ACCESS], group:_developer:a[ACCESS], 
> group:_analyticsusers:a[ACCESS], group:com.apple.access_ftp:a[ACCESS], 
> group:com.apple.access_screensharing:a[ACCESS], 
> group:com.apple.access_ssh:a[ACCESS], 
> group:com.apple.sharepoint.group.3:a[ACCESS]] 
> inheritedUserAcl:user:remoteUser:r[ACCESS]
> [ERROR] 
> testNativeAclsForBucket(org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis)
>   Time elapsed: 0.074 s  <<< FAILURE!
> java.lang.AssertionError
> [ERROR] 
> testNativeAclsForPrefix(org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis)
>   Time elapsed: 0.061 s  <<< FAILURE!
> java.lang.AssertionError: Current acls:,[user:nvadivelu:a[ACCESS], 
> group:staff:a[ACCESS], group:everyone:a[ACCESS], 
> group:localaccounts:a[ACCESS], group:_appserverusr:a[ACCESS], 
> group:admin:a[ACCESS], group:_appserveradm:a[ACCESS], 
> group:_lpadmin:a[ACCESS], group:com.apple.sharepoint.group.1:a[ACCESS], 
> group:com.apple.sharepoint.group.2:a[ACCESS], group:_appstore:a[ACCESS], 
> group:_lpoperator:a[ACCESS], group:_developer:a[ACCESS], 
> group:_analyticsusers:a[ACCESS], group:com.apple.access_ftp:a[ACCESS], 
> group:com.apple.access_screensharing:a[ACCESS], 
> group:com.apple.access_ssh:a[ACCESS], 
> group:com.apple.sharepoint.group.3:a[ACCESS]] 
> inheritedUserAcl:user:remoteUser:r[ACCESS]
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1926) The new caching layer is used for old OM requests but not updated

2019-08-07 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-1926:
--
Target Version/s: 0.4.1  (was: 0.5.0)

> The new caching layer is used for old OM requests but not updated
> -
>
> Key: HDDS-1926
> URL: https://issues.apache.org/jira/browse/HDDS-1926
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: om
>Reporter: Elek, Marton
>Priority: Blocker
>
> HDDS-1499 introduced a new caching layer together with a double-buffer based 
> db writer to support OM HA.
> TLDR: I think the caching layer is not updated for new volume creation. And 
> (slightly related to this problem) I suggest separating the TypedTable and 
> the caching layer.
> ## How to reproduce the problem?
> 1. Start a docker compose cluster
> 2. Create one volume (let's say `/vol1`)
> 3. Restart the om (!)
> 4. Try to create an _other_ volume twice!
> ```
> bash-4.2$ ozone sh volume create /vol2
> 2019-08-07 12:29:47 INFO  RpcClient:288 - Creating Volume: vol2, with hadoop 
> as owner.
> bash-4.2$ ozone sh volume create /vol2
> 2019-08-07 12:29:50 INFO  RpcClient:288 - Creating Volume: vol2, with hadoop 
> as owner.
> ```
> Expected behavior is an error:
> {code}
> bash-4.2$ ozone sh volume create /vol1
> 2019-08-07 09:48:39 INFO  RpcClient:288 - Creating Volume: vol1, with hadoop 
> as owner.
> bash-4.2$ ozone sh volume create /vol1
> 2019-08-07 09:48:42 INFO  RpcClient:288 - Creating Volume: vol1, with hadoop 
> as owner.
> VOLUME_ALREADY_EXISTS 
> {code}
> The problem is that the new cache is used even for the old code path 
> (TypedTable):
> {code}
>  @Override
>   public VALUE get(KEY key) throws IOException {
> // Here the metadata lock will guarantee that cache is not updated for 
> same
> // key during get key.
> CacheResult<CacheValue<VALUE>> cacheResult =
> cache.lookup(new CacheKey<>(key));
> if (cacheResult.getCacheStatus() == EXISTS) {
>   return cacheResult.getValue().getCacheValue();
> } else if (cacheResult.getCacheStatus() == NOT_EXIST) {
>   return null;
> } else {
>   return getFromTable(key);
> }
>   }
> {code}
> For the volume table, after the FIRST start it always returns 
> `getFromTable(key)` due to the condition in `TableCacheImpl.lookup`:
> {code}
>   public CacheResult<CACHEVALUE> lookup(CACHEKEY cachekey) {
> if (cache.size() == 0) {
>   return new CacheResult<>(CacheResult.CacheStatus.MAY_EXIST,
>   null);
> }
> {code}
> But after a restart the cache is pre-loaded by the TypedTable constructor. 
> After the restart, the real caching logic will be used (as cache.size()>0), 
> which causes a problem, as the cache is NOT updated from the old code path.
> An additional problem is that the cache is turned on for all the metadata 
> tables even if the cache is not required... 
> ## Proposed solution
> As I commented on HDDS-1499, this caching layer is not a "traditional cache". 
> It's not updated during the typedTable.put() call but by a separate 
> component during the double-buffer flush.
> I would suggest removing the cache-related methods from TypedTable (moving 
> them to a separate implementation). I think this kind of caching can be 
> independent of the TypedTable implementation. We can continue to use the 
> simple TypedTable everywhere we don't need any kind of caching.
> For caching we can use a separate object. That would make it more visible 
> that the cache must always be updated manually. This separate caching 
> utility may include a reference to the original TypedTable/Table. 
> With this approach we can separate the different responsibilities but provide 
> the same functionality.
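
For illustration, here is a minimal sketch of the separation proposed above, assuming a plain generic Table interface; the names are hypothetical and not the actual Ozone classes.

{code:java}
// Sketch only: plain table access stays behind Table, and the cache is a
// separate object that must be updated explicitly (e.g. by the
// double-buffer flush component). Names are hypothetical.
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CachedTable<K, V> {

  public interface Table<K, V> {
    V get(K key) throws IOException;
    void put(K key, V value) throws IOException;
  }

  private final Table<K, V> table;
  private final Map<K, V> cache = new ConcurrentHashMap<>();

  public CachedTable(Table<K, V> table) {
    this.table = table;
  }

  // Reads consult the cache first and fall back to the underlying table.
  public V get(K key) throws IOException {
    V cached = cache.get(key);
    return cached != null ? cached : table.get(key);
  }

  // put() never touches the cache; the flush component calls this
  // explicitly once the write is durable.
  public void updateCache(K key, V value) {
    cache.put(key, value);
  }
}
{code}

A code path that needs caching would wrap its table in such a helper, while every other table keeps using the plain Table directly, so it is explicit which code paths are responsible for keeping the cache up to date.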



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1926) The new caching layer is used for old OM requests but not updated

2019-08-07 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-1926:
--
Sprint: HDDS Biscayne

> The new caching layer is used for old OM requests but not updated
> -
>
> Key: HDDS-1926
> URL: https://issues.apache.org/jira/browse/HDDS-1926
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: om
>Reporter: Elek, Marton
>Priority: Blocker
>
> HDDS-1499 introduced a new caching layer together with a double-buffer based 
> db writer to support OM HA.
> TLDR: I think the caching layer is not updated for new volume creation. And 
> (slightly related to this problem) I suggest separating the TypedTable and 
> the caching layer.
> ## How to reproduce the problem?
> 1. Start a docker compose cluster
> 2. Create one volume (let's say `/vol1`)
> 3. Restart the om (!)
> 4. Try to create an _other_ volume twice!
> ```
> bash-4.2$ ozone sh volume create /vol2
> 2019-08-07 12:29:47 INFO  RpcClient:288 - Creating Volume: vol2, with hadoop 
> as owner.
> bash-4.2$ ozone sh volume create /vol2
> 2019-08-07 12:29:50 INFO  RpcClient:288 - Creating Volume: vol2, with hadoop 
> as owner.
> ```
> Expected behavior is an error:
> {code}
> bash-4.2$ ozone sh volume create /vol1
> 2019-08-07 09:48:39 INFO  RpcClient:288 - Creating Volume: vol1, with hadoop 
> as owner.
> bash-4.2$ ozone sh volume create /vol1
> 2019-08-07 09:48:42 INFO  RpcClient:288 - Creating Volume: vol1, with hadoop 
> as owner.
> VOLUME_ALREADY_EXISTS 
> {code}
> The problem is that the new cache is used even for the old code path 
> (TypedTable):
> {code}
>  @Override
>   public VALUE get(KEY key) throws IOException {
> // Here the metadata lock will guarantee that cache is not updated for 
> same
> // key during get key.
> CacheResult<CacheValue<VALUE>> cacheResult =
> cache.lookup(new CacheKey<>(key));
> if (cacheResult.getCacheStatus() == EXISTS) {
>   return cacheResult.getValue().getCacheValue();
> } else if (cacheResult.getCacheStatus() == NOT_EXIST) {
>   return null;
> } else {
>   return getFromTable(key);
> }
>   }
> {code}
> For the volume table, after the FIRST start it always returns 
> `getFromTable(key)` due to the condition in `TableCacheImpl.lookup`:
> {code}
>   public CacheResult<CACHEVALUE> lookup(CACHEKEY cachekey) {
> if (cache.size() == 0) {
>   return new CacheResult<>(CacheResult.CacheStatus.MAY_EXIST,
>   null);
> }
> {code}
> But after a restart the cache is pre-loaded by the TypedTable constructor. 
> After the restart, the real caching logic will be used (as cache.size()>0), 
> which causes a problem, as the cache is NOT updated from the old code path.
> An additional problem is that the cache is turned on for all the metadata 
> tables even if the cache is not required... 
> ## Proposed solution
> As I commented on HDDS-1499, this caching layer is not a "traditional cache". 
> It's not updated during the typedTable.put() call but by a separate 
> component during the double-buffer flush.
> I would suggest removing the cache-related methods from TypedTable (moving 
> them to a separate implementation). I think this kind of caching can be 
> independent of the TypedTable implementation. We can continue to use the 
> simple TypedTable everywhere we don't need any kind of caching.
> For caching we can use a separate object. That would make it more visible 
> that the cache must always be updated manually. This separate caching 
> utility may include a reference to the original TypedTable/Table. 
> With this approach we can separate the different responsibilities but provide 
> the same functionality.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14674) [SBN read] Got an unexpected txid when tail editlog

2019-08-07 Thread xuzq (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xuzq updated HDFS-14674:

Attachment: image.png

> [SBN read] Got an unexpected txid when tail editlog
> ---
>
> Key: HDFS-14674
> URL: https://issues.apache.org/jira/browse/HDFS-14674
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: wangzhaohui
>Assignee: wangzhaohui
>Priority: Blocker
> Attachments: HDFS-14674-001.patch, HDFS-14674-003.patch, 
> HDFS-14674-004.patch, HDFS-14674-005.patch, 
> image-2019-07-26-11-34-23-405.png, image.png
>
>
> Add the following configuration
> !image-2019-07-26-11-34-23-405.png!
> error:
> {code:java}
> //
> [2019-07-17T11:50:21.048+08:00] [INFO] [Edit log tailer] : replaying edit 
> log: 1/20512836 transactions completed. (0%) [2019-07-17T11:50:21.059+08:00] 
> [INFO] [Edit log tailer] : Edits file 
> http://ip/getJournal?jid=ns1003&segmentTxId=232056426162&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232056426162&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232056426162&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH
>  of size 3126782311 edits # 500 loaded in 3 seconds 
> [2019-07-17T11:50:21.059+08:00] [INFO] [Edit log tailer] : Reading 
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@51ceb7bc 
> expecting start txid #232056752162 [2019-07-17T11:50:21.059+08:00] [INFO] 
> [Edit log tailer] : Start loading edits file 
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH
>  maxTxnipsToRead = 500 [2019-07-17T11:50:21.059+08:00] [INFO] [Edit log 
> tailer] : Fast-forwarding stream 
> 'http://ip/getJournal?jid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH'
>  to transaction ID 232056751662 [2019-07-17T11:50:21.059+08:00] [INFO] [Edit 
> log tailer] ip: Fast-forwarding stream 
> 'http://ip/getJournal?jid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH'
>  to transaction ID 232056751662 [2019-07-17T11:50:21.061+08:00] [ERROR] [Edit 
> log tailer] : Unknown error encountered while tailing edits. Shutting down 
> standby NN. java.io.IOException: There appears to be a gap in the edit log. 
> We expected txid 232056752162, but got txid 232077264498. at 
> org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:239)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:161)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:895) at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:321)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:410)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427)
>  at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:414)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423)
>  [2019-07-17T11:50:21.064+08:00] [INFO] [Edit log tailer] : Exiting with 
> status 1 [2019-07-17T11:50:21.066+08:00] [INFO] [Thread-1] : SHUTDOWN_MSG: 
> / SHUTDOWN_MSG: 
> Shutting down NameNode at ip 
> /
> {code}
>  
> If the dfs.ha.tail-edits.max-txns-per-lock value is 500, then once the namenode 
> has loaded 500 edits it moves on to the next editlog, but the current editlog 
> contains more than 500 edits. So the namenode gets an unexpected txid when 
> tailing the editlog.
>  
>  
> {code:java}
> //
> [2019-07-17T11:50:21.059+08:00] [INFO] [Edit log tailer] : Edits file 
> http://ip/getJournal?jid=ns1003&segmentT

[jira] [Commented] (HDFS-14674) [SBN read] Got an unexpected txid when tail editlog

2019-08-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902060#comment-16902060
 ] 

Hadoop QA commented on HDFS-14674:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} 
| {color:red} HDFS-14674 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-14674 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27437/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> [SBN read] Got an unexpected txid when tail editlog
> ---
>
> Key: HDFS-14674
> URL: https://issues.apache.org/jira/browse/HDFS-14674
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: wangzhaohui
>Assignee: wangzhaohui
>Priority: Blocker
> Attachments: HDFS-14674-001.patch, HDFS-14674-003.patch, 
> HDFS-14674-004.patch, HDFS-14674-005.patch, 
> image-2019-07-26-11-34-23-405.png, image.png
>
>
> Add the following configuration
> !image-2019-07-26-11-34-23-405.png!
> error:
> {code:java}
> //
> [2019-07-17T11:50:21.048+08:00] [INFO] [Edit log tailer] : replaying edit 
> log: 1/20512836 transactions completed. (0%) [2019-07-17T11:50:21.059+08:00] 
> [INFO] [Edit log tailer] : Edits file 
> http://ip/getJournal?jid=ns1003&segmentTxId=232056426162&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232056426162&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232056426162&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH
>  of size 3126782311 edits # 500 loaded in 3 seconds 
> [2019-07-17T11:50:21.059+08:00] [INFO] [Edit log tailer] : Reading 
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@51ceb7bc 
> expecting start txid #232056752162 [2019-07-17T11:50:21.059+08:00] [INFO] 
> [Edit log tailer] : Start loading edits file 
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH
>  maxTxnipsToRead = 500 [2019-07-17T11:50:21.059+08:00] [INFO] [Edit log 
> tailer] : Fast-forwarding stream 
> 'http://ip/getJournal?jid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH'
>  to transaction ID 232056751662 [2019-07-17T11:50:21.059+08:00] [INFO] [Edit 
> log tailer] ip: Fast-forwarding stream 
> 'http://ip/getJournal?jid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH'
>  to transaction ID 232056751662 [2019-07-17T11:50:21.061+08:00] [ERROR] [Edit 
> log tailer] : Unknown error encountered while tailing edits. Shutting down 
> standby NN. java.io.IOException: There appears to be a gap in the edit log. 
> We expected txid 232056752162, but got txid 232077264498. at 
> org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:239)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:161)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:895) at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:321)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:410)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427)
>  at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:414)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$Edit

[jira] [Commented] (HDFS-14631) The DirectoryScanner doesn't fix the wrongly placed replica.

2019-08-07 Thread Jinglun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902063#comment-16902063
 ] 

Jinglun commented on HDFS-14631:


Hi [~xkrogen], thanks for the reminder. Yes, it's relevant to branch-2. 
Uploaded branch-2.9.001.patch and pending Jenkins.

> The DirectoryScanner doesn't fix the wrongly placed replica.
> 
>
> Key: HDFS-14631
> URL: https://issues.apache.org/jira/browse/HDFS-14631
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-14631.001.patch, HDFS-14631.002.patch, 
> HDFS-14631.003.patch, HDFS-14631.004.patch
>
>
> When DirectoryScanner scans block files, if a block refers to a block 
> file that does not exist, the DirectoryScanner will update the block based on 
> the replica file found on the disk. See FsDatasetImpl#checkAndUpdate.
>  
> {code:java}
> /*
> * Block exists in volumeMap and the block file exists on the disk
> */
> // Compare block files
> if (memBlockInfo.blockDataExists()) {
>   ...
> } else {
>   // Block refers to a block file that does not exist.
>   // Update the block with the file found on the disk. Since the block
>   // file and metadata file are found as a pair on the disk, update
>   // the block based on the metadata file found on the disk
>   LOG.warn("Block file in replica "
>   + memBlockInfo.getBlockURI()
>   + " does not exist. Updating it to the file found during scan "
>   + diskFile.getAbsolutePath());
>   memBlockInfo.updateWithReplica(
>   StorageLocation.parse(diskFile.toString()));
>   LOG.warn("Updating generation stamp for block " + blockId
>   + " from " + memBlockInfo.getGenerationStamp() + " to " + diskGS);
>   memBlockInfo.setGenerationStamp(diskGS);
> }
> {code}
> But the DirectoryScanner doesn't really fix it because in 
> LocalReplica#parseBaseDir() the 'subdir' are ignored.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-14631) The DirectoryScanner doesn't fix the wrongly placed replica.

2019-08-07 Thread Jinglun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinglun reopened HDFS-14631:


upload patch for branch-2.x

> The DirectoryScanner doesn't fix the wrongly placed replica.
> 
>
> Key: HDFS-14631
> URL: https://issues.apache.org/jira/browse/HDFS-14631
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-14631-branch-2.9.001.patch, HDFS-14631.001.patch, 
> HDFS-14631.002.patch, HDFS-14631.003.patch, HDFS-14631.004.patch
>
>
> When DirectoryScanner scans block files, if a block refers to a block 
> file that does not exist, the DirectoryScanner will update the block based on 
> the replica file found on the disk. See FsDatasetImpl#checkAndUpdate.
>  
> {code:java}
> /*
> * Block exists in volumeMap and the block file exists on the disk
> */
> // Compare block files
> if (memBlockInfo.blockDataExists()) {
>   ...
> } else {
>   // Block refers to a block file that does not exist.
>   // Update the block with the file found on the disk. Since the block
>   // file and metadata file are found as a pair on the disk, update
>   // the block based on the metadata file found on the disk
>   LOG.warn("Block file in replica "
>   + memBlockInfo.getBlockURI()
>   + " does not exist. Updating it to the file found during scan "
>   + diskFile.getAbsolutePath());
>   memBlockInfo.updateWithReplica(
>   StorageLocation.parse(diskFile.toString()));
>   LOG.warn("Updating generation stamp for block " + blockId
>   + " from " + memBlockInfo.getGenerationStamp() + " to " + diskGS);
>   memBlockInfo.setGenerationStamp(diskGS);
> }
> {code}
> But the DirectoryScanner doesn't really fix it because in 
> LocalReplica#parseBaseDir() the 'subdir' are ignored.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14631) The DirectoryScanner doesn't fix the wrongly placed replica.

2019-08-07 Thread Jinglun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinglun updated HDFS-14631:
---
Fix Version/s: 2.9.3
   Attachment: HDFS-14631-branch-2.9.001.patch
   Status: Patch Available  (was: Reopened)

> The DirectoryScanner doesn't fix the wrongly placed replica.
> 
>
> Key: HDFS-14631
> URL: https://issues.apache.org/jira/browse/HDFS-14631
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 2.9.3, 3.1.3
>
> Attachments: HDFS-14631-branch-2.9.001.patch, HDFS-14631.001.patch, 
> HDFS-14631.002.patch, HDFS-14631.003.patch, HDFS-14631.004.patch
>
>
> When DirectoryScanner scans block files, if a block refers to a block 
> file that does not exist, the DirectoryScanner will update the block based on 
> the replica file found on the disk. See FsDatasetImpl#checkAndUpdate.
>  
> {code:java}
> /*
> * Block exists in volumeMap and the block file exists on the disk
> */
> // Compare block files
> if (memBlockInfo.blockDataExists()) {
>   ...
> } else {
>   // Block refers to a block file that does not exist.
>   // Update the block with the file found on the disk. Since the block
>   // file and metadata file are found as a pair on the disk, update
>   // the block based on the metadata file found on the disk
>   LOG.warn("Block file in replica "
>   + memBlockInfo.getBlockURI()
>   + " does not exist. Updating it to the file found during scan "
>   + diskFile.getAbsolutePath());
>   memBlockInfo.updateWithReplica(
>   StorageLocation.parse(diskFile.toString()));
>   LOG.warn("Updating generation stamp for block " + blockId
>   + " from " + memBlockInfo.getGenerationStamp() + " to " + diskGS);
>   memBlockInfo.setGenerationStamp(diskGS);
> }
> {code}
> But the DirectoryScanner doesn't really fix it because in 
> LocalReplica#parseBaseDir() the 'subdir' are ignored.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1916) Only contract tests are run in ozonefs module

2019-08-07 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-1916:
--
Sprint: HDDS Biscayne

> Only contract tests are run in ozonefs module
> -
>
> Key: HDDS-1916
> URL: https://issues.apache.org/jira/browse/HDDS-1916
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.3.0
>Reporter: Doroszlai, Attila
>Assignee: Doroszlai, Attila
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {{hadoop-ozone-filesystem}} has 6 test classes that are not being run:
> {code}
> hadoop-ozone/ozonefs/src/test/java/org/apache/hadoop/fs/ozone/TestFilteredClassLoader.java
> hadoop-ozone/ozonefs/src/test/java/org/apache/hadoop/fs/ozone/TestOzoneFSInputStream.java
> hadoop-ozone/ozonefs/src/test/java/org/apache/hadoop/fs/ozone/TestOzoneFileInterfaces.java
> hadoop-ozone/ozonefs/src/test/java/org/apache/hadoop/fs/ozone/TestOzoneFileSystem.java
> hadoop-ozone/ozonefs/src/test/java/org/apache/hadoop/fs/ozone/TestOzoneFileSystemWithMocks.java
> hadoop-ozone/ozonefs/src/test/java/org/apache/hadoop/fs/ozone/TestOzoneFsRenameDir.java
> {code}
> {code:title=https://raw.githubusercontent.com/elek/ozone-ci/master/byscane/byscane-nightly-vxsck/integration/output.log}
> [INFO] ---
> [INFO]  T E S T S
> [INFO] ---
> [INFO] Running org.apache.hadoop.fs.ozone.contract.ITestOzoneContractDelete
> [INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 33.956 
> s - in org.apache.hadoop.fs.ozone.contract.ITestOzoneContractDelete
> [INFO] Running org.apache.hadoop.fs.ozone.contract.ITestOzoneContractMkdir
> [INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 34.528 
> s - in org.apache.hadoop.fs.ozone.contract.ITestOzoneContractMkdir
> [INFO] Running org.apache.hadoop.fs.ozone.contract.ITestOzoneContractSeek
> [INFO] Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
> 42.245 s - in org.apache.hadoop.fs.ozone.contract.ITestOzoneContractSeek
> [INFO] Running org.apache.hadoop.fs.ozone.contract.ITestOzoneContractOpen
> [INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 33.996 
> s - in org.apache.hadoop.fs.ozone.contract.ITestOzoneContractOpen
> [INFO] Running org.apache.hadoop.fs.ozone.contract.ITestOzoneContractRename
> [INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 34.816 
> s - in org.apache.hadoop.fs.ozone.contract.ITestOzoneContractRename
> [INFO] Running org.apache.hadoop.fs.ozone.contract.ITestOzoneContractDistCp
> [INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 59.418 
> s - in org.apache.hadoop.fs.ozone.contract.ITestOzoneContractDistCp
> [INFO] Running 
> org.apache.hadoop.fs.ozone.contract.ITestOzoneContractGetFileStatus
> [INFO] Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
> 35.042 s - in 
> org.apache.hadoop.fs.ozone.contract.ITestOzoneContractGetFileStatus
> [INFO] Running org.apache.hadoop.fs.ozone.contract.ITestOzoneContractCreate
> [WARNING] Tests run: 11, Failures: 0, Errors: 0, Skipped: 2, Time elapsed: 
> 35.144 s - in org.apache.hadoop.fs.ozone.contract.ITestOzoneContractCreate
> [INFO] Running org.apache.hadoop.fs.ozone.contract.ITestOzoneContractRootDir
> [INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 28.986 
> s - in org.apache.hadoop.fs.ozone.contract.ITestOzoneContractRootDir
> [INFO] 
> [INFO] Results:
> [INFO] 
> [WARNING] Tests run: 92, Failures: 0, Errors: 0, Skipped: 2
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1925) ozonesecure acceptance test broken by HTTP auth requirement

2019-08-07 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-1925:
--
Sprint: HDDS Biscayne

> ozonesecure acceptance test broken by HTTP auth requirement
> ---
>
> Key: HDDS-1925
> URL: https://issues.apache.org/jira/browse/HDDS-1925
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: docker, test
>Affects Versions: 0.4.1
>Reporter: Doroszlai, Attila
>Assignee: Doroszlai, Attila
>Priority: Critical
>
> Acceptance test is failing at {{ozonesecure}} with the following error from 
> {{jq}}:
> {noformat:title=https://github.com/elek/ozone-ci/blob/325779d34623061e27b80ade3b749210648086d1/byscane/byscane-nightly-ds7lx/acceptance/output.log#L2779}
> parse error: Invalid numeric literal at line 2, column 0
> {noformat}
> Example compose environments wait for datanodes to be up:
> {code:title=https://github.com/apache/hadoop/blob/9cd211ac86bb1124bdee572fddb6f86655b19b73/hadoop-ozone/dist/src/main/compose/testlib.sh#L71-L72}
>   docker-compose -f "$COMPOSE_FILE" up -d --scale datanode="${datanode_count}"
>   wait_for_datanodes "$COMPOSE_FILE" "${datanode_count}"
> {code}
> The number of datanodes up is determined via HTTP query of JMX endpoint:
> {code:title=https://github.com/apache/hadoop/blob/9cd211ac86bb1124bdee572fddb6f86655b19b73/hadoop-ozone/dist/src/main/compose/testlib.sh#L44-L46}
>  #This line checks the number of HEALTHY datanodes registered in scm over 
> the
>  # jmx HTTP servlet
>  datanodes=$(docker-compose -f "${compose_file}" exec -T scm curl -s 
> 'http://localhost:9876/jmx?qry=Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo'
>  | jq -r '.beans[0].NodeCount[] | select(.key=="HEALTHY") | .value')
> {code}
> The problem is that no authentication is performed before or during the 
> request, which is no longer allowed since HDDS-1901:
> {code}
> $ docker-compose exec -T scm curl -s 
> 'http://localhost:9876/jmx?qry=Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo'
> 
> 
> 
> Error 401 Authentication required
> 
> HTTP ERROR 401
> Problem accessing /jmx. Reason:
> Authentication required
> 
> 
> {code}
> {code}
> $ docker-compose exec -T scm curl -s 
> 'http://localhost:9876/jmx?qry=Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo'
>  | jq -r '.beans[0].NodeCount[] | select(.key=="HEALTHY") | .value'
> parse error: Invalid numeric literal at line 2, column 0
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1888) Add containers to node2container map in SCM as soon as a container is created

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1888?focusedWorklogId=290465&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290465
 ]

ASF GitHub Bot logged work on HDDS-1888:


Author: ASF GitHub Bot
Created on: 07/Aug/19 13:47
Start Date: 07/Aug/19 13:47
Worklog Time Spent: 10m 
  Work Description: mukul1987 commented on pull request #1211: HDDS-1888. 
Add containers to node2container map in SCM as soon as a container is created.
URL: https://github.com/apache/hadoop/pull/1211#discussion_r311561367
 
 

 ##
 File path: 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/states/NodeStateMap.java
 ##
 @@ -224,6 +224,23 @@ public NodeState getNodeState(UUID uuid) throws 
NodeNotFoundException {
 }
   }
 
+  /**
+   * Adds the given container to the specified datanode.
+   *
+   * @param uuid - datanode uuid
+   * @param containerId - containerID
+   * @throws NodeNotFoundException - if datanode is not known. For new datanode
+   *use addDatanodeInContainerMap call.
+   */
+  public void addContainer(final UUID uuid,
+   final ContainerID containerId)
+  throws NodeNotFoundException {
+if (!nodeToContainer.containsKey(uuid)) {
+  throw new NodeNotFoundException("Node UUID: " + uuid);
+}
+nodeToContainer.get(uuid).add(containerId);
 
 Review comment:
   The check and get calls can race with the remove call.
   
   Also, looking at the code, there is no remove call. Should we remove the dn 
once it is dead?
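
For illustration, a race-free variant of that check-and-get, assuming the map is a ConcurrentHashMap; this is only a sketch with stand-in types, not the committed change.

{code:java}
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

class NodeContainerMapSketch {
  // Hypothetical stand-ins for the real types; illustration only.
  static class ContainerID { }
  static class NodeNotFoundException extends Exception {
    NodeNotFoundException(String msg) { super(msg); }
  }

  private final ConcurrentHashMap<UUID, Set<ContainerID>> nodeToContainer =
      new ConcurrentHashMap<>();

  // The lookup and the update happen in one atomic map operation, so a
  // concurrent removal of the node cannot slip in between check and get.
  void addContainer(UUID uuid, ContainerID containerId)
      throws NodeNotFoundException {
    Set<ContainerID> updated = nodeToContainer.computeIfPresent(
        uuid, (id, containers) -> {
          containers.add(containerId);
          return containers;
        });
    if (updated == null) {
      throw new NodeNotFoundException("Node UUID: " + uuid);
    }
  }
}
{code}

Whether a datanode entry should ever be removed (dead vs. decommissioned) is a separate question from this race.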
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290465)
Time Spent: 40m  (was: 0.5h)

> Add containers to node2container map in SCM as soon as a container is created
> -
>
> Key: HDDS-1888
> URL: https://issues.apache.org/jira/browse/HDDS-1888
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In SCM node2container and node2pipeline maps are managed by NodeManager and 
> pipeline2container map is managed by PipelineManager.
> Currently, when a container is allocated in SCM, it is added to 
> pipeline2container map and we are not adding it to node2container map. We 
> update the node2container map only when the datanode sends full container 
> report.
> When a node is marked as dead, DeadNodeHandler processes the event and it 
> gets the list of containers that are hosted by the dead datanode and updates 
> the respective container replica state in ContainerManager. The list of 
> containers on the datanode is read from node2container map, this map will be 
> missing containers which are created recently (after the last container 
> report). In such cases we will not be able to remove the container replica 
> information for those containers. In reality, these containers are under 
> replicated, but SCM will never know.
> We should add containers to node2container map in SCM as soon as a container 
> is allocated.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1888) Add containers to node2container map in SCM as soon as a container is created

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1888?focusedWorklogId=290466&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290466
 ]

ASF GitHub Bot logged work on HDDS-1888:


Author: ASF GitHub Bot
Created on: 07/Aug/19 13:47
Start Date: 07/Aug/19 13:47
Worklog Time Spent: 10m 
  Work Description: mukul1987 commented on pull request #1211: HDDS-1888. 
Add containers to node2container map in SCM as soon as a container is created.
URL: https://github.com/apache/hadoop/pull/1211#discussion_r311556945
 
 

 ##
 File path: 
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/MockNodeManager.java
 ##
 @@ -267,6 +267,19 @@ public void removePipeline(Pipeline pipeline) {
 node2PipelineMap.removePipeline(pipeline);
   }
 
+  @Override
+  public void addContainer(DatanodeDetails dd,
+   ContainerID containerId)
+  throws NodeNotFoundException {
+try {
+  Set<ContainerID> set = node2ContainerMap.getContainers(dd.getUuid());
+  set.add(containerId);
+  node2ContainerMap.setContainersForDatanode(dd.getUuid(), set);
+} catch (SCMException e) {
+  e.printStackTrace();
 
 Review comment:
   Is this needed ? can we just throw the exception or log the error.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290466)
Time Spent: 50m  (was: 40m)

> Add containers to node2container map in SCM as soon as a container is created
> -
>
> Key: HDDS-1888
> URL: https://issues.apache.org/jira/browse/HDDS-1888
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In SCM node2container and node2pipeline maps are managed by NodeManager and 
> pipeline2container map is managed by PipelineManager.
> Currently, when a container is allocated in SCM, it is added to 
> pipeline2container map and we are not adding it to node2container map. We 
> update the node2container map only when the datanode sends full container 
> report.
> When a node is marked as dead, DeadNodeHandler processes the event and it 
> gets the list of containers that are hosted by the dead datanode and updates 
> the respective container replica state in ContainerManager. The list of 
> containers on the datanode is read from node2container map, this map will be 
> missing containers which are created recently (after the last container 
> report). In such cases we will not be able to remove the container replica 
> information for those containers. In reality, these containers are under 
> replicated, but SCM will never know.
> We should add containers to node2container map in SCM as soon as a container 
> is allocated.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1888) Add containers to node2container map in SCM as soon as a container is created

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1888?focusedWorklogId=290472&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290472
 ]

ASF GitHub Bot logged work on HDDS-1888:


Author: ASF GitHub Bot
Created on: 07/Aug/19 13:51
Start Date: 07/Aug/19 13:51
Worklog Time Spent: 10m 
  Work Description: nandakumar131 commented on pull request #1211: 
HDDS-1888. Add containers to node2container map in SCM as soon as a container 
is created.
URL: https://github.com/apache/hadoop/pull/1211#discussion_r311564536
 
 

 ##
 File path: 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/states/NodeStateMap.java
 ##
 @@ -224,6 +224,23 @@ public NodeState getNodeState(UUID uuid) throws 
NodeNotFoundException {
 }
   }
 
+  /**
+   * Adds the given container to the specified datanode.
+   *
+   * @param uuid - datanode uuid
+   * @param containerId - containerID
+   * @throws NodeNotFoundException - if datanode is not known. For new datanode
+   *use addDatanodeInContainerMap call.
+   */
+  public void addContainer(final UUID uuid,
+   final ContainerID containerId)
+  throws NodeNotFoundException {
+if (!nodeToContainer.containsKey(uuid)) {
+  throw new NodeNotFoundException("Node UUID: " + uuid);
+}
+nodeToContainer.get(uuid).add(containerId);
 
 Review comment:
   For now we never remove a datanode from NodeManager once it is 
registered. We should not remove a dn once it is dead; otherwise we will not be 
able to show the list of dead nodes to the user.
   Maybe as part of decommissioning, once the decommission is successful, we can 
remove the dn.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290472)
Time Spent: 1h  (was: 50m)

> Add containers to node2container map in SCM as soon as a container is created
> -
>
> Key: HDDS-1888
> URL: https://issues.apache.org/jira/browse/HDDS-1888
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In SCM node2container and node2pipeline maps are managed by NodeManager and 
> pipeline2container map is managed by PipelineManager.
> Currently, when a container is allocated in SCM, it is added to 
> pipeline2container map and we are not adding it to node2container map. We 
> update the node2container map only when the datanode sends full container 
> report.
> When a node is marked as dead, DeadNodeHandler processes the event and it 
> gets the list of containers that are hosted by the dead datanode and updates 
> the respective container replica state in ContainerManager. The list of 
> containers on the datanode is read from node2container map, this map will be 
> missing containers which are created recently (after the last container 
> report). In such cases we will not be able to remove the container replica 
> information for those containers. In reality, these containers are under 
> replicated, but SCM will never know.
> We should add containers to node2container map in SCM as soon as a container 
> is allocated.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14195) OIV: print out storage policy id in oiv Delimited output

2019-08-07 Thread Wang, Xinglong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Xinglong updated HDFS-14195:
--
Status: Open  (was: Patch Available)

> OIV: print out storage policy id in oiv Delimited output
> 
>
> Key: HDFS-14195
> URL: https://issues.apache.org/jira/browse/HDFS-14195
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Reporter: Wang, Xinglong
>Assignee: Wang, Xinglong
>Priority: Minor
> Attachments: HDFS-14195.001.patch, HDFS-14195.002.patch, 
> HDFS-14195.003.patch, HDFS-14195.004.patch, HDFS-14195.005.patch, 
> HDFS-14195.006.patch, HDFS-14195.007.patch, HDFS-14195.008.patch, 
> HDFS-14195.009.patch
>
>
> There is no method to get all folders and files with a specified storage 
> policy via the command line, like the ALL_SSD type.
> By adding the storage policy id to the oiv output, it will help oiv 
> post-analysis to get an overview of all folders/files with a specified storage 
> policy and to apply internal regulation based on this information.
>  
> Currently, for PBImageXmlWriter.java, HDFS-9835 already added a function to 
> print out xattrs, which include the storage policy.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14195) OIV: print out storage policy id in oiv Delimited output

2019-08-07 Thread Wang, Xinglong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Xinglong updated HDFS-14195:
--
Attachment: HDFS-14195.009.patch
Status: Patch Available  (was: Open)

> OIV: print out storage policy id in oiv Delimited output
> 
>
> Key: HDFS-14195
> URL: https://issues.apache.org/jira/browse/HDFS-14195
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Reporter: Wang, Xinglong
>Assignee: Wang, Xinglong
>Priority: Minor
> Attachments: HDFS-14195.001.patch, HDFS-14195.002.patch, 
> HDFS-14195.003.patch, HDFS-14195.004.patch, HDFS-14195.005.patch, 
> HDFS-14195.006.patch, HDFS-14195.007.patch, HDFS-14195.008.patch, 
> HDFS-14195.009.patch
>
>
> There is no method to get all folders and files with a specified storage 
> policy via the command line, like the ALL_SSD type.
> By adding the storage policy id to the oiv output, it will help oiv 
> post-analysis to get an overview of all folders/files with a specified storage 
> policy and to apply internal regulation based on this information.
>  
> Currently, for PBImageXmlWriter.java, HDFS-9835 already added a function to 
> print out xattrs, which include the storage policy.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14195) OIV: print out storage policy id in oiv Delimited output

2019-08-07 Thread Wang, Xinglong (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902090#comment-16902090
 ] 

Wang, Xinglong commented on HDFS-14195:
---

Thank you [~jojochuang] and [~adam.antal] for the comments. I believe the 
comments are now addressed. Submitting a new patch.

> OIV: print out storage policy id in oiv Delimited output
> 
>
> Key: HDFS-14195
> URL: https://issues.apache.org/jira/browse/HDFS-14195
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Reporter: Wang, Xinglong
>Assignee: Wang, Xinglong
>Priority: Minor
> Attachments: HDFS-14195.001.patch, HDFS-14195.002.patch, 
> HDFS-14195.003.patch, HDFS-14195.004.patch, HDFS-14195.005.patch, 
> HDFS-14195.006.patch, HDFS-14195.007.patch, HDFS-14195.008.patch, 
> HDFS-14195.009.patch
>
>
> There is no method to get all folders and files with a specified storage 
> policy via the command line, like the ALL_SSD type.
> By adding the storage policy id to the oiv output, it will help oiv 
> post-analysis to get an overview of all folders/files with a specified storage 
> policy and to apply internal regulation based on this information.
>  
> Currently, for PBImageXmlWriter.java, HDFS-9835 already added a function to 
> print out xattrs, which include the storage policy.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1888) Add containers to node2container map in SCM as soon as a container is created

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1888?focusedWorklogId=290487&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290487
 ]

ASF GitHub Bot logged work on HDDS-1888:


Author: ASF GitHub Bot
Created on: 07/Aug/19 14:19
Start Date: 07/Aug/19 14:19
Worklog Time Spent: 10m 
  Work Description: nandakumar131 commented on pull request #1211: 
HDDS-1888. Add containers to node2container map in SCM as soon as a container 
is created.
URL: https://github.com/apache/hadoop/pull/1211#discussion_r311579697
 
 

 ##
 File path: 
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/MockNodeManager.java
 ##
 @@ -267,6 +267,19 @@ public void removePipeline(Pipeline pipeline) {
 node2PipelineMap.removePipeline(pipeline);
   }
 
+  @Override
+  public void addContainer(DatanodeDetails dd,
+   ContainerID containerId)
+  throws NodeNotFoundException {
+try {
+  Set<ContainerID> set = node2ContainerMap.getContainers(dd.getUuid());
+  set.add(containerId);
+  node2ContainerMap.setContainersForDatanode(dd.getUuid(), set);
+} catch (SCMException e) {
+  e.printStackTrace();
 
 Review comment:
   We can only throw `NodeNotFoundException` or any runtime exception.
   > log the error.
   `printStackTrace` was added to log the error :) We don't have any logger 
defined in MockNodeManager.
   
   The test-case using `MockNodeManager` will anyway depend on the existence 
of the container after calling addContainer, and the test should eventually fail 
if this call fails.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290487)
Time Spent: 1h 10m  (was: 1h)

> Add containers to node2container map in SCM as soon as a container is created
> -
>
> Key: HDDS-1888
> URL: https://issues.apache.org/jira/browse/HDDS-1888
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In SCM, the node2container and node2pipeline maps are managed by NodeManager 
> and the pipeline2container map is managed by PipelineManager.
> Currently, when a container is allocated in SCM, it is added to the 
> pipeline2container map but not to the node2container map. The node2container 
> map is only updated when the datanode sends a full container report.
> When a node is marked as dead, DeadNodeHandler processes the event, gets the 
> list of containers hosted by the dead datanode, and updates the respective 
> container replica state in ContainerManager. The list of containers on the 
> datanode is read from the node2container map, which will be missing containers 
> created recently (after the last container report). In such cases we will not 
> be able to remove the container replica information for those containers. In 
> reality these containers are under-replicated, but SCM will never know.
> We should add containers to the node2container map in SCM as soon as a 
> container is allocated.
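
A simplified sketch of the idea, with illustrative types only (not the actual SCM NodeManager/Node2ContainerMap API): register the container against its datanode at allocation time so the dead-node handling path later sees a complete container list.

{code:java}
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Simplified sketch of a node -> containers map that is updated at
 * allocation time instead of waiting for the next full container report.
 * Types and names are illustrative stand-ins, not the actual SCM classes.
 */
public class NodeToContainerSketch {
  private final Map<UUID, Set<Long>> node2Containers = new ConcurrentHashMap<>();

  /** Called as soon as a container is allocated on the given datanode. */
  public void addContainer(UUID datanodeUuid, long containerId) {
    node2Containers
        .computeIfAbsent(datanodeUuid, k -> ConcurrentHashMap.newKeySet())
        .add(containerId);
  }

  /** Used by the dead-node handling path to clean up replica state. */
  public Set<Long> getContainers(UUID datanodeUuid) {
    return node2Containers.getOrDefault(datanodeUuid, Collections.emptySet());
  }
}
{code}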



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1888) Add containers to node2container map in SCM as soon as a container is created

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1888?focusedWorklogId=290488&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290488
 ]

ASF GitHub Bot logged work on HDDS-1888:


Author: ASF GitHub Bot
Created on: 07/Aug/19 14:19
Start Date: 07/Aug/19 14:19
Worklog Time Spent: 10m 
  Work Description: nandakumar131 commented on pull request #1211: 
HDDS-1888. Add containers to node2container map in SCM as soon as a container 
is created.
URL: https://github.com/apache/hadoop/pull/1211#discussion_r311579697
 
 

 ##
 File path: 
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/MockNodeManager.java
 ##
 @@ -267,6 +267,19 @@ public void removePipeline(Pipeline pipeline) {
 node2PipelineMap.removePipeline(pipeline);
   }
 
+  @Override
+  public void addContainer(DatanodeDetails dd,
+   ContainerID containerId)
+  throws NodeNotFoundException {
+try {
+  Set<ContainerID> set = node2ContainerMap.getContainers(dd.getUuid());
+  set.add(containerId);
+  node2ContainerMap.setContainersForDatanode(dd.getUuid(), set);
+} catch (SCMException e) {
+  e.printStackTrace();
 
 Review comment:
   We can only throw `NodeNotFoundException` or any runtime exception.
   > log the error.
   
   `printStackTrace` was added to log the error :) We don't have any logger 
defined in MockNodeManager.
   
   The test-case using `MockNodeManager` will anyway depend on the existence 
of the container after calling addContainer, and the test should eventually fail 
if this call fails.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290488)
Time Spent: 1h 20m  (was: 1h 10m)

> Add containers to node2container map in SCM as soon as a container is created
> -
>
> Key: HDDS-1888
> URL: https://issues.apache.org/jira/browse/HDDS-1888
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In SCM, the node2container and node2pipeline maps are managed by NodeManager 
> and the pipeline2container map is managed by PipelineManager.
> Currently, when a container is allocated in SCM, it is added to the 
> pipeline2container map but not to the node2container map. The node2container 
> map is only updated when the datanode sends a full container report.
> When a node is marked as dead, DeadNodeHandler processes the event, gets the 
> list of containers hosted by the dead datanode, and updates the respective 
> container replica state in ContainerManager. The list of containers on the 
> datanode is read from the node2container map, which will be missing containers 
> created recently (after the last container report). In such cases we will not 
> be able to remove the container replica information for those containers. In 
> reality these containers are under-replicated, but SCM will never know.
> We should add containers to the node2container map in SCM as soon as a 
> container is allocated.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1926) The new caching layer is used for old OM requests but not updated

2019-08-07 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902115#comment-16902115
 ] 

Arpit Agarwal commented on HDDS-1926:
-

[~bharatviswa] can you take a look at this?

> The new caching layer is used for old OM requests but not updated
> -
>
> Key: HDDS-1926
> URL: https://issues.apache.org/jira/browse/HDDS-1926
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: om
>Reporter: Elek, Marton
>Priority: Blocker
>
> HDDS-1499 introduced a new caching layer together with a double-buffer based 
> db writer to support OM HA.
> TLDR: I think the caching layer is not updated for new volume creation. And 
> (slightly related to this problem) I suggest separating the TypedTable and 
> the caching layer.
> ## How to reproduce the problem?
> 1. Start a docker compose cluster
> 2. Create one volume (let's say `/vol1`)
> 3. Restart the om (!)
> 4. Try to create an _other_ volume twice!
> ```
> bash-4.2$ ozone sh volume create /vol2
> 2019-08-07 12:29:47 INFO  RpcClient:288 - Creating Volume: vol2, with hadoop 
> as owner.
> bash-4.2$ ozone sh volume create /vol2
> 2019-08-07 12:29:50 INFO  RpcClient:288 - Creating Volume: vol2, with hadoop 
> as owner.
> ```
> Expected behavior is an error:
> {code}
> bash-4.2$ ozone sh volume create /vol1
> 2019-08-07 09:48:39 INFO  RpcClient:288 - Creating Volume: vol1, with hadoop 
> as owner.
> bash-4.2$ ozone sh volume create /vol1
> 2019-08-07 09:48:42 INFO  RpcClient:288 - Creating Volume: vol1, with hadoop 
> as owner.
> VOLUME_ALREADY_EXISTS 
> {code}
> The problem is that the new cache is used even for the old code path 
> (TypedTable):
> {code}
>  @Override
>   public VALUE get(KEY key) throws IOException {
> // Here the metadata lock will guarantee that cache is not updated for 
> same
> // key during get key.
> CacheResult<CacheValue<VALUE>> cacheResult =
> cache.lookup(new CacheKey<>(key));
> if (cacheResult.getCacheStatus() == EXISTS) {
>   return cacheResult.getValue().getCacheValue();
> } else if (cacheResult.getCacheStatus() == NOT_EXIST) {
>   return null;
> } else {
>   return getFromTable(key);
> }
>   }
> {code}
> For the volume table, after the FIRST start it always returns 
> `getFromTable(key)` due to the condition in `TableCacheImpl.lookup`:
> {code}
>   public CacheResult<CACHEVALUE> lookup(CACHEKEY cachekey) {
> if (cache.size() == 0) {
>   return new CacheResult<>(CacheResult.CacheStatus.MAY_EXIST,
>   null);
> }
> {code}
> But after a restart the cache is pre-loaded by the TypedTable constructor. 
> After the restart, the real caching logic will be used (as cache.size()>0), 
> which causes a problem because the cache is NOT updated from the old code path.
> An additional problem is that the cache is turned on for all the metadata 
> tables even where the cache is not required... 
> ## Proposed solution
> As I commented on HDDS-1499, this caching layer is not a "traditional cache". 
> It's not updated during the typedTable.put() call but by a separate 
> component during the double-buffer flush.
> I would suggest removing the cache-related methods from TypedTable (moving 
> them to a separate implementation). I think this kind of caching can be 
> independent of the TypedTable implementation. We can continue to use the 
> simple TypedTable everywhere we don't need any kind of caching.
> For caching we can use a separate object. That would make it more visible that 
> the cache always has to be updated manually. This separate caching utility may 
> hold a reference to the original TypedTable/Table. 
> With this approach we can separate the different responsibilities but provide 
> the same functionality.
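
A minimal sketch of the proposed separation, with hypothetical names (not the actual TypedTable/TableCache API): a plain table plus an explicitly managed cache wrapper whose cache is only updated by the writer, e.g. during the double-buffer flush.

{code:java}
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Illustrative sketch of the proposed separation: a plain table plus an
 * explicitly managed cache wrapper. Interfaces and names are hypothetical,
 * not the actual TypedTable/TableCache API.
 */
interface SimpleTable<K, V> {
  V get(K key) throws IOException;
  void put(K key, V value) throws IOException;
}

class CachedTable<K, V> implements SimpleTable<K, V> {
  private final SimpleTable<K, V> backingTable;
  private final Map<K, V> cache = new ConcurrentHashMap<>();

  CachedTable(SimpleTable<K, V> backingTable) {
    this.backingTable = backingTable;
  }

  @Override
  public V get(K key) throws IOException {
    // Serve from the cache if present, otherwise fall through to the table.
    V cached = cache.get(key);
    return cached != null ? cached : backingTable.get(key);
  }

  @Override
  public void put(K key, V value) throws IOException {
    // A plain put does not touch the cache; the writer owns cache updates.
    backingTable.put(key, value);
  }

  /** The double-buffer flush (or any writer) must update the cache explicitly. */
  void updateCache(K key, V value) {
    cache.put(key, value);
  }
}
{code}

Keeping the cache in a separate wrapper makes it obvious which tables opt in to caching and who is responsible for keeping it consistent.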



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9924) [umbrella] Nonblocking HDFS Access

2019-08-07 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902121#comment-16902121
 ] 

Steve Loughran commented on HDFS-9924:
--

No async IO there. Output streams aren't required to be thread-safe in the Java APIs.

> [umbrella] Nonblocking HDFS Access
> --
>
> Key: HDFS-9924
> URL: https://issues.apache.org/jira/browse/HDFS-9924
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Duo Zhang
>Priority: Major
> Attachments: Async-HDFS-Performance-Report.pdf, 
> AsyncHdfs20160510.pdf, HDFS-9924-POC.patch
>
>
> This is an umbrella JIRA for supporting Nonblocking HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked 
> until the method returns.  It is very slow if a client makes a large number 
> of independent calls in a single thread since each call has to wait until the 
> previous call is finished.  It is inefficient if a client needs to create a 
> large number of threads to invoke the calls.
> We propose adding a new API to support nonblocking calls, i.e. the caller is 
> not blocked.  The methods in the new API immediately return a Java Future 
> object.  The return value can be obtained by the usual Future.get() method.
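
For illustration of the general idea only (not the proposed client API): a blocking call wrapped so the caller immediately receives a Future. The blocking stand-in below is hypothetical; a real nonblocking client would avoid dedicating a thread per call by making the RPC layer itself asynchronous.

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Illustration of the nonblocking idea only: wrap a blocking call so the
 * caller immediately gets a Future and retrieves the result via get().
 */
public class AsyncCallSketch {
  private static final ExecutorService POOL = Executors.newFixedThreadPool(4);

  /** Hypothetical blocking call standing in for a client RPC such as rename. */
  static boolean blockingRename(String src, String dst) {
    return true; // pretend the RPC succeeded
  }

  static CompletableFuture<Boolean> renameAsync(String src, String dst) {
    // The caller is not blocked; the work runs on the pool.
    return CompletableFuture.supplyAsync(() -> blockingRename(src, dst), POOL);
  }

  public static void main(String[] args) throws Exception {
    CompletableFuture<Boolean> f = renameAsync("/a", "/b");
    System.out.println("rename result: " + f.get()); // Future.get(), as in the proposal
    POOL.shutdown();
  }
}
{code}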



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-07 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14313:
---
Attachment: HDFS-14313-branch-2.v2.patch

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14313-branch-2.v1.patch, 
> HDFS-14313-branch-2.v2.patch, HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch, 
> HDFS-14313.011.patch, HDFS-14313.012.patch, HDFS-14313.013.patch, 
> HDFS-14313.014.patch, HDFS-14313.branch-3.0.v1.patch, 
> HDFS-14313.branch-3.0.v2.patch, HDFS-14313.branch-3.v1.patch
>
>
> There are two existing ways of getting used space, DU and DF, and both are 
> insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting the HDFS used space from FsDatasetImpl#volumeMap#ReplicaInfos in 
> memory has very little overhead and is accurate. 
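
A simplified sketch of the approach, using a stand-in ReplicaRecord type instead of the real ReplicaInfo objects: sum the block and metadata file lengths already tracked in memory rather than spawning du/df.

{code:java}
import java.util.Collection;

/**
 * Simplified sketch: compute a volume's used space from in-memory replica
 * records instead of running du/df. ReplicaRecord is a stand-in for the
 * real ReplicaInfo objects kept in FsDatasetImpl#volumeMap.
 */
public class InMemoryUsedSpace {

  static class ReplicaRecord {
    final long blockDataLength;  // length of the block file
    final long metadataLength;   // length of the .meta file
    ReplicaRecord(long blockDataLength, long metadataLength) {
      this.blockDataLength = blockDataLength;
      this.metadataLength = metadataLength;
    }
  }

  /** Sum the on-disk footprint of all replicas known to be on this volume. */
  static long usedSpace(Collection<ReplicaRecord> replicasOnVolume) {
    long used = 0;
    for (ReplicaRecord r : replicasOnVolume) {
      used += r.blockDataLength + r.metadataLength;
    }
    return used;
  }
}
{code}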



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902141#comment-16902141
 ] 

Hadoop QA commented on HDFS-14313:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  8s{color} 
| {color:red} HDFS-14313 does not apply to branch-2. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-14313 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12976946/HDFS-14313-branch-2.v2.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27440/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14313-branch-2.v1.patch, 
> HDFS-14313-branch-2.v2.patch, HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch, 
> HDFS-14313.011.patch, HDFS-14313.012.patch, HDFS-14313.013.patch, 
> HDFS-14313.014.patch, HDFS-14313.branch-3.0.v1.patch, 
> HDFS-14313.branch-3.0.v2.patch, HDFS-14313.branch-3.v1.patch
>
>
> There are two existing ways of getting used space, DU and DF, and both are 
> insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting the HDFS used space from FsDatasetImpl#volumeMap#ReplicaInfos in 
> memory has very little overhead and is accurate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-07 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902154#comment-16902154
 ] 

Lisheng Sun commented on HDFS-14313:


The branch-2.v2 patch fixes the unchecked issue reported by javac. It can be ignored.

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14313-branch-2.v1.patch, 
> HDFS-14313-branch-2.v2.patch, HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch, 
> HDFS-14313.011.patch, HDFS-14313.012.patch, HDFS-14313.013.patch, 
> HDFS-14313.014.patch, HDFS-14313.branch-3.0.v1.patch, 
> HDFS-14313.branch-3.0.v2.patch, HDFS-14313.branch-3.v1.patch
>
>
> There are two existing ways of getting used space, DU and DF, and both are 
> insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting the HDFS used space from FsDatasetImpl#volumeMap#ReplicaInfos in 
> memory has very little overhead and is accurate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-07 Thread Yiqun Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-14313:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.4
   2.10.0
   Status: Resolved  (was: Patch Available)

Thanks [~leosun08] for providing the patches. I have just committed the patch 
for branch-2 and branch-3.0.

Thanks [~leosun08] for the contribution and thanks [~jojochuang] for additional 
review.

Feel free to backport this to other branches if we want it there.

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 2.10.0, 3.0.4, 3.3.0
>
> Attachments: HDFS-14313-branch-2.v1.patch, 
> HDFS-14313-branch-2.v2.patch, HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch, 
> HDFS-14313.011.patch, HDFS-14313.012.patch, HDFS-14313.013.patch, 
> HDFS-14313.014.patch, HDFS-14313.branch-3.0.v1.patch, 
> HDFS-14313.branch-3.0.v2.patch, HDFS-14313.branch-3.v1.patch
>
>
> There are two existing ways of getting used space, DU and DF, and both are 
> insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting the HDFS used space from FsDatasetImpl#volumeMap#ReplicaInfos in 
> memory has very little overhead and is accurate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14631) The DirectoryScanner doesn't fix the wrongly placed replica.

2019-08-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902178#comment-16902178
 ] 

Hadoop QA commented on HDFS-14631:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
44s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2.9 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
14s{color} | {color:green} branch-2.9 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} branch-2.9 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} branch-2.9 passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
32s{color} | {color:green} branch-2.9 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
57s{color} | {color:green} branch-2.9 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
59s{color} | {color:green} branch-2.9 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
10s{color} | {color:green} branch-2.9 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} branch-2.9 passed with JDK v1.8.0_212 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed with JDK v1.8.0_212 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}100m  9s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}128m  6s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestFileCreation |
|   | hadoop.hdfs.TestBlockStoragePolicy |
|   | hadoop.hdfs.TestFileCorruption |
|   | hadoop.hdfs.TestSafeMode |
|   | hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean |
|   | hadoop.hdfs.TestLeaseRecovery2 |
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
|   | hadoop.hdfs.TestDatanodeLayoutUpgrade |
|   | hadoop.hdfs.TestGetBlocks |
|   | hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys |
|   | hadoop.hdfs.TestEncryptedTransfer |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestSpaceReservation |
|   | hadoop.hdfs.TestFileCreationDelete |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=18.09.7 Server=18.09.7 Image:yetus/hadoop:c3439fff6be |
| JIRA Issue | HDFS-14631 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secu

[jira] [Commented] (HDFS-14195) OIV: print out storage policy id in oiv Delimited output

2019-08-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902202#comment-16902202
 ] 

Hadoop QA commented on HDFS-14195:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
38s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
 2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 24s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 35s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 2 new + 136 unchanged - 0 fixed = 138 total (was 136) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 15s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 79m  3s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
37s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}131m 35s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.diskbalancer.TestDiskBalancer |
|   | hadoop.hdfs.server.datanode.TestLargeBlockReport |
|   | hadoop.hdfs.server.balancer.TestBalancerRPCDelay |
|   | hadoop.hdfs.server.balancer.TestBalancer |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e |
| JIRA Issue | HDFS-14195 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12976945/HDFS-14195.009.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 0d5cd04aa881 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 70f4674 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27439/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27439/artifact/out/patch-unit-hadoop-hdfs-projec

[jira] [Commented] (HDFS-14662) Document the usage of the new Balancer "asService" parameter

2019-08-07 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902203#comment-16902203
 ] 

Erik Krogen commented on HDFS-14662:


+1 from me, sorry for the delay! Thanks [~zhangchen]

> Document the usage of the new Balancer "asService" parameter
> 
>
> Key: HDFS-14662
> URL: https://issues.apache.org/jira/browse/HDFS-14662
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14662.001.patch, HDFS-14662.002.patch, 
> HDFS-14662.003.patch
>
>
> See HDFS-13783; this jira adds documentation for how to run the balancer as a 
> long-running service



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14631) The DirectoryScanner doesn't fix the wrongly placed replica.

2019-08-07 Thread Erik Krogen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-14631:
---
Fix Version/s: (was: 2.9.3)

> The DirectoryScanner doesn't fix the wrongly placed replica.
> 
>
> Key: HDFS-14631
> URL: https://issues.apache.org/jira/browse/HDFS-14631
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-14631-branch-2.9.001.patch, HDFS-14631.001.patch, 
> HDFS-14631.002.patch, HDFS-14631.003.patch, HDFS-14631.004.patch
>
>
> When the DirectoryScanner scans block files, if a block refers to a block 
> file that does not exist, the DirectoryScanner will update the block based on 
> the replica file found on the disk. See FsDatasetImpl#checkAndUpdate.
>  
> {code:java}
> /*
> * Block exists in volumeMap and the block file exists on the disk
> */
> // Compare block files
> if (memBlockInfo.blockDataExists()) {
>   ...
> } else {
>   // Block refers to a block file that does not exist.
>   // Update the block with the file found on the disk. Since the block
>   // file and metadata file are found as a pair on the disk, update
>   // the block based on the metadata file found on the disk
>   LOG.warn("Block file in replica "
>   + memBlockInfo.getBlockURI()
>   + " does not exist. Updating it to the file found during scan "
>   + diskFile.getAbsolutePath());
>   memBlockInfo.updateWithReplica(
>   StorageLocation.parse(diskFile.toString()));
>   LOG.warn("Updating generation stamp for block " + blockId
>   + " from " + memBlockInfo.getGenerationStamp() + " to " + diskGS);
>   memBlockInfo.setGenerationStamp(diskGS);
> }
> {code}
> But the DirectoryScanner doesn't really fix it, because in 
> LocalReplica#parseBaseDir() the 'subdir' is ignored.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1924) ozone sh bucket path command does not exist

2019-08-07 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HDDS-1924:
---
   Resolution: Fixed
Fix Version/s: 0.4.1
   Status: Resolved  (was: Patch Available)

> ozone sh bucket path command does not exist
> ---
>
> Key: HDDS-1924
> URL: https://issues.apache.org/jira/browse/HDDS-1924
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: documentation, Ozone Manager
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Doroszlai, Attila
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.4.1
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The ozone sh bucket path command does not exist, but it is mentioned in 
> static/docs/interface/s3.html. The command should either be added back or 
> the documentation should be improved.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1924) ozone sh bucket path command does not exist

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1924?focusedWorklogId=290571&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290571
 ]

ASF GitHub Bot logged work on HDDS-1924:


Author: ASF GitHub Bot
Created on: 07/Aug/19 16:25
Start Date: 07/Aug/19 16:25
Worklog Time Spent: 10m 
  Work Description: elek commented on pull request #1245: HDDS-1924. ozone 
sh bucket path command does not exist
URL: https://github.com/apache/hadoop/pull/1245
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290571)
Time Spent: 40m  (was: 0.5h)

> ozone sh bucket path command does not exist
> ---
>
> Key: HDDS-1924
> URL: https://issues.apache.org/jira/browse/HDDS-1924
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: documentation, Ozone Manager
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Doroszlai, Attila
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.4.1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The ozone sh bucket path command does not exist, but it is mentioned in 
> static/docs/interface/s3.html. The command should either be added back or 
> the documentation should be improved.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14370) Edit log tailing fast-path should allow for backoff

2019-08-07 Thread Erik Krogen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-14370:
---
Attachment: HDFS-14370.005.patch

> Edit log tailing fast-path should allow for backoff
> ---
>
> Key: HDFS-14370
> URL: https://issues.apache.org/jira/browse/HDFS-14370
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode, qjm
>Affects Versions: 3.3.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-14370.000.patch, HDFS-14370.001.patch, 
> HDFS-14370.002.patch, HDFS-14370.003.patch, HDFS-14370.004.patch, 
> HDFS-14370.005.patch
>
>
> As part of HDFS-13150, in-progress edit log tailing was changed to use an 
> RPC-based mechanism, thus allowing the edit log tailing frequency to be 
> turned way down, and allowing standby/observer NameNodes to be only a few 
> milliseconds stale as compared to the Active NameNode.
> When there is a high volume of transactions on the system, each RPC fetches 
> transactions and takes some time to process them, self-rate-limiting how 
> frequently an RPC is submitted. In a lightly loaded cluster, however, most of 
> these RPCs return an empty set of transactions, consuming a high 
> (de)serialization overhead for very little benefit. This was reported by 
> [~jojochuang] in HDFS-14276 and I have also seen it on a test cluster where 
> the SbNN was submitting 8000 RPCs per second that returned empty.
> I propose we add some sort of backoff to the tailing, so that if an empty 
> response is received, it will wait a longer period of time before submitting 
> a new RPC.
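
A sketch of the backoff logic only; the sleep bounds, the fetch call, and the class name below are hypothetical stand-ins for what the EditLogTailer change introduces through its configuration keys.

{code:java}
/**
 * Sketch of the backoff idea: after an empty response, sleep increasingly
 * longer (up to a cap); reset to the fast path once edits arrive again.
 * Names and parameters are hypothetical, not the actual EditLogTailer code.
 */
public class TailerBackoffSketch {

  interface EditFetcher {
    /** Returns the number of transactions loaded by this RPC. */
    int fetchEdits() throws InterruptedException;
  }

  static void tailLoop(EditFetcher fetcher, long initialSleepMs, long maxSleepMs)
      throws InterruptedException {
    long sleepMs = initialSleepMs;
    while (!Thread.currentThread().isInterrupted()) {
      int loaded = fetcher.fetchEdits();
      if (loaded > 0) {
        sleepMs = initialSleepMs;                      // busy: reset to the fast path
      } else {
        sleepMs = Math.min(sleepMs * 2, maxSleepMs);   // idle: back off exponentially
      }
      Thread.sleep(sleepMs);
    }
  }
}
{code}

Resetting on a non-empty batch keeps the low-latency behavior on busy clusters, while lightly loaded clusters converge to the maximum sleep and stop issuing thousands of empty RPCs.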



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1924) ozone sh bucket path command does not exist

2019-08-07 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902218#comment-16902218
 ] 

Hudson commented on HDDS-1924:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17058 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17058/])
HDDS-1924. ozone sh bucket path command does not exist (elek: rev 
0520f5cedee0565a342a12a787ff9737f34691b1)
* (edit) hadoop-hdds/docs/content/interface/S3.md


> ozone sh bucket path command does not exist
> ---
>
> Key: HDDS-1924
> URL: https://issues.apache.org/jira/browse/HDDS-1924
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: documentation, Ozone Manager
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Doroszlai, Attila
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.4.1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The ozone sh bucket path command does not exist, but it is mentioned in 
> static/docs/interface/s3.html. The command should either be added back or 
> the documentation should be improved.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14370) Edit log tailing fast-path should allow for backoff

2019-08-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902220#comment-16902220
 ] 

Hadoop QA commented on HDFS-14370:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} 
| {color:red} HDFS-14370 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-14370 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12976955/HDFS-14370.005.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27441/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Edit log tailing fast-path should allow for backoff
> ---
>
> Key: HDFS-14370
> URL: https://issues.apache.org/jira/browse/HDFS-14370
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode, qjm
>Affects Versions: 3.3.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-14370.000.patch, HDFS-14370.001.patch, 
> HDFS-14370.002.patch, HDFS-14370.003.patch, HDFS-14370.004.patch, 
> HDFS-14370.005.patch
>
>
> As part of HDFS-13150, in-progress edit log tailing was changed to use an 
> RPC-based mechanism, thus allowing the edit log tailing frequency to be 
> turned way down, and allowing standby/observer NameNodes to be only a few 
> milliseconds stale as compared to the Active NameNode.
> When there is a high volume of transactions on the system, each RPC fetches 
> transactions and takes some time to process them, self-rate-limiting how 
> frequently an RPC is submitted. In a lightly loaded cluster, however, most of 
> these RPCs return an empty set of transactions, consuming a high 
> (de)serialization overhead for very little benefit. This was reported by 
> [~jojochuang] in HDFS-14276 and I have also seen it on a test cluster where 
> the SbNN was submitting 8000 RPCs per second that returned empty.
> I propose we add some sort of backoff to the tailing, so that if an empty 
> response is received, it will wait a longer period of time before submitting 
> a new RPC.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14370) Edit log tailing fast-path should allow for backoff

2019-08-07 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902219#comment-16902219
 ] 

Hudson commented on HDFS-14370:
---

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17058 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17058/])
HDFS-14370. Add exponential backoff to the edit log tailer to avoid (xkrogen: 
rev 827dbb11e24be294b40088a8aa46086ba8ca4ba8)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestEditLogTailer.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/ObserverNameNode.md
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml


> Edit log tailing fast-path should allow for backoff
> ---
>
> Key: HDFS-14370
> URL: https://issues.apache.org/jira/browse/HDFS-14370
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode, qjm
>Affects Versions: 3.3.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-14370.000.patch, HDFS-14370.001.patch, 
> HDFS-14370.002.patch, HDFS-14370.003.patch, HDFS-14370.004.patch, 
> HDFS-14370.005.patch
>
>
> As part of HDFS-13150, in-progress edit log tailing was changed to use an 
> RPC-based mechanism, thus allowing the edit log tailing frequency to be 
> turned way down, and allowing standby/observer NameNodes to be only a few 
> milliseconds stale as compared to the Active NameNode.
> When there is a high volume of transactions on the system, each RPC fetches 
> transactions and takes some time to process them, self-rate-limiting how 
> frequently an RPC is submitted. In a lightly loaded cluster, however, most of 
> these RPCs return an empty set of transactions, consuming a high 
> (de)serialization overhead for very little benefit. This was reported by 
> [~jojochuang] in HDFS-14276 and I have also seen it on a test cluster where 
> the SbNN was submitting 8000 RPCs per second that returned empty.
> I propose we add some sort of backoff to the tailing, so that if an empty 
> response is received, it will wait a longer period of time before submitting 
> a new RPC.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14370) Edit log tailing fast-path should allow for backoff

2019-08-07 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902228#comment-16902228
 ] 

Erik Krogen commented on HDFS-14370:


Thanks [~ayushtkn]. Addressed this last comment in v005 and went ahead to 
commit based on the three +1s in this thread. I committed to trunk and 
backported to the 3.x line. I think this needs to land in branch-2 as well once 
HDFS-14204 is completed, so I will leave this open for now.

> Edit log tailing fast-path should allow for backoff
> ---
>
> Key: HDFS-14370
> URL: https://issues.apache.org/jira/browse/HDFS-14370
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode, qjm
>Affects Versions: 3.3.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-14370.000.patch, HDFS-14370.001.patch, 
> HDFS-14370.002.patch, HDFS-14370.003.patch, HDFS-14370.004.patch, 
> HDFS-14370.005.patch
>
>
> As part of HDFS-13150, in-progress edit log tailing was changed to use an 
> RPC-based mechanism, thus allowing the edit log tailing frequency to be 
> turned way down, and allowing standby/observer NameNodes to be only a few 
> milliseconds stale as compared to the Active NameNode.
> When there is a high volume of transactions on the system, each RPC fetches 
> transactions and takes some time to process them, self-rate-limiting how 
> frequently an RPC is submitted. In a lightly loaded cluster, however, most of 
> these RPCs return an empty set of transactions, consuming a high 
> (de)serialization overhead for very little benefit. This was reported by 
> [~jojochuang] in HDFS-14276 and I have also seen it on a test cluster where 
> the SbNN was submitting 8000 RPCs per second that returned empty.
> I propose we add some sort of backoff to the tailing, so that if an empty 
> response is received, it will wait a longer period of time before submitting 
> a new RPC.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14370) Edit log tailing fast-path should allow for backoff

2019-08-07 Thread Erik Krogen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-14370:
---
Fix Version/s: 3.1.3
   3.2.1
   3.3.0
   3.0.4

> Edit log tailing fast-path should allow for backoff
> ---
>
> Key: HDFS-14370
> URL: https://issues.apache.org/jira/browse/HDFS-14370
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode, qjm
>Affects Versions: 3.3.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 3.0.4, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-14370.000.patch, HDFS-14370.001.patch, 
> HDFS-14370.002.patch, HDFS-14370.003.patch, HDFS-14370.004.patch, 
> HDFS-14370.005.patch
>
>
> As part of HDFS-13150, in-progress edit log tailing was changed to use an 
> RPC-based mechanism, thus allowing the edit log tailing frequency to be 
> turned way down, and allowing standby/observer NameNodes to be only a few 
> milliseconds stale as compared to the Active NameNode.
> When there is a high volume of transactions on the system, each RPC fetches 
> transactions and takes some time to process them, self-rate-limiting how 
> frequently an RPC is submitted. In a lightly loaded cluster, however, most of 
> these RPCs return an empty set of transactions, consuming a high 
> (de)serialization overhead for very little benefit. This was reported by 
> [~jojochuang] in HDFS-14276 and I have also seen it on a test cluster where 
> the SbNN was submitting 8000 RPCs per second that returned empty.
> I propose we add some sort of backoff to the tailing, so that if an empty 
> response is received, it will wait a longer period of time before submitting 
> a new RPC.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HDFS-14370) Edit log tailing fast-path should allow for backoff

2019-08-07 Thread Erik Krogen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-14370:
---
Comment: was deleted

(was: | (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} 
| {color:red} HDFS-14370 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-14370 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12976955/HDFS-14370.005.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27441/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.

)

> Edit log tailing fast-path should allow for backoff
> ---
>
> Key: HDFS-14370
> URL: https://issues.apache.org/jira/browse/HDFS-14370
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode, qjm
>Affects Versions: 3.3.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-14370.000.patch, HDFS-14370.001.patch, 
> HDFS-14370.002.patch, HDFS-14370.003.patch, HDFS-14370.004.patch, 
> HDFS-14370.005.patch
>
>
> As part of HDFS-13150, in-progress edit log tailing was changed to use an 
> RPC-based mechanism, thus allowing the edit log tailing frequency to be 
> turned way down, and allowing standby/observer NameNodes to be only a few 
> milliseconds stale as compared to the Active NameNode.
> When there is a high volume of transactions on the system, each RPC fetches 
> transactions and takes some time to process them, self-rate-limiting how 
> frequently an RPC is submitted. In a lightly loaded cluster, however, most of 
> these RPCs return an empty set of transactions, consuming a high 
> (de)serialization overhead for very little benefit. This was reported by 
> [~jojochuang] in HDFS-14276 and I have also seen it on a test cluster where 
> the SbNN was submitting 8000 RPCs per second that returned empty.
> I propose we add some sort of backoff to the tailing, so that if an empty 
> response is received, it will wait a longer period of time before submitting 
> a new RPC.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1888) Add containers to node2container map in SCM as soon as a container is created

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1888?focusedWorklogId=290588&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290588
 ]

ASF GitHub Bot logged work on HDDS-1888:


Author: ASF GitHub Bot
Created on: 07/Aug/19 16:36
Start Date: 07/Aug/19 16:36
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1211: HDDS-1888. Add 
containers to node2container map in SCM as soon as a container is created.
URL: https://github.com/apache/hadoop/pull/1211#issuecomment-519174961
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 48 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 3 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | +1 | mvninstall | 607 | trunk passed |
   | +1 | compile | 370 | trunk passed |
   | +1 | checkstyle | 76 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 851 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 162 | trunk passed |
   | 0 | spotbugs | 421 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 616 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | +1 | mvninstall | 564 | the patch passed |
   | +1 | compile | 372 | the patch passed |
   | +1 | javac | 372 | the patch passed |
   | +1 | checkstyle | 70 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 656 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 166 | the patch passed |
   | +1 | findbugs | 631 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 303 | hadoop-hdds in the patch passed. |
   | -1 | unit | 2105 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 53 | The patch does not generate ASF License warnings. |
   | | | 7806 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdds.scm.pipeline.TestRatisPipelineProvider |
   |   | hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures |
   |   | hadoop.ozone.client.rpc.TestWatchForCommit |
   |   | hadoop.ozone.om.TestScmSafeMode |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClient |
   |   | hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=19.03.1 Server=19.03.1 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1211/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1211 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux 952133fae2bc 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / 70f4674 |
   | Default Java | 1.8.0_212 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1211/3/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1211/3/testReport/ |
   | Max. process+thread count | 5009 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/server-scm U: hadoop-hdds/server-scm |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1211/3/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290588)
Time Spent: 1.5h  (was: 1h 20m)

> Add containers to node2container map in SCM as soon as a container is created
> -
>
> Key: HDDS-1888
> URL: https://issues.apache.org/jira/browse/HDDS-1888
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1

[jira] [Commented] (HDFS-14608) DataNode$DataTransfer should be named

2019-08-07 Thread Íñigo Goiri (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902244#comment-16902244
 ] 

Íñigo Goiri commented on HDFS-14608:


Thanks [~ayushtkn], this looks better now.
I haven't seen TestLargeBlockReport fail in the past, so the failure does not 
look related.

> DataNode$DataTransfer should be named
> -
>
> Key: HDFS-14608
> URL: https://issues.apache.org/jira/browse/HDFS-14608
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: HDFS-14608.000.patch, HDFS-14608.001.patch
>
>
> Currently, the {{DataTransfer}} thread has no name and it just outputs the 
> default {{toString()}}.
> This shows in the logs in jstack as something like:
> {code}
> 2019-06-25 11:01:01,211 INFO 
> [org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer@609ed67a] 
> org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer, at 
> CO4AEAPC1AF:10010: Transmitted 
> BP-1191059133-10.1.2.3-145702348:blk_1113379522_69745835 
> (numBytes=485214) to 10.1.2.3/10.1.2.3:10010
> {code}
> As this uses the {{Daemon}} class, the name is set based on:
> {code}
>   public Daemon(Runnable runnable) {
> super(runnable);
> this.runnable = runnable;
> this.setName(((Object)runnable).toString());
>   }
> {code}
> We should implement toString to at least have the name of the block being 
> transfferred or something similar to what DataXceiver does (e.g., HDFS-3375).
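
For illustration, a minimal sketch of the kind of toString the description asks for; the class shape and fields below are simplified assumptions, not the committed patch:

{code}
// Hypothetical sketch: give the DataTransfer runnable a descriptive toString so
// Daemon#setName produces a readable thread name in logs and jstack output.
class DataTransferSketch implements Runnable {
  private final String block;   // e.g. "BP-...:blk_1113379522_69745835"
  private final String target;  // e.g. "10.1.2.3:10010"

  DataTransferSketch(String block, String target) {
    this.block = block;
    this.target = target;
  }

  @Override
  public void run() {
    // ... transfer the block to the target ...
  }

  @Override
  public String toString() {
    // Daemon(Runnable) calls setName(runnable.toString()), so this string
    // becomes the thread name instead of the default Object#toString().
    return "DataTransfer " + block + " to " + target;
  }
}
{code}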



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13603) Warmup NameNode EDEK thread retries continuously if there's an invalid key

2019-08-07 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902248#comment-16902248
 ] 

Wei-Chiu Chuang commented on HDFS-13603:


We see this in our internal tests as well. I might spend some time digging into 
this when I get a chance.

> Warmup NameNode EDEK thread retries continuously if there's an invalid key 
> ---
>
> Key: HDFS-13603
> URL: https://issues.apache.org/jira/browse/HDFS-13603
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, namenode
>Affects Versions: 2.8.0
>Reporter: Antony Jay
>Priority: Major
>
> https://issues.apache.org/jira/browse/HDFS-9405 adds a background thread to 
> pre-warm EDEK cache. 
> However this fails and retries continuously if key retrieval fails for one 
> encryption zone. In our usecase, we have temporarily removed keys for certain 
> encryption zones.  Currently namenode and kms log is filled up with errors 
> related to background thread retrying warmup for ever .
> The pre-warm thread should
>  * Continue to refresh other encryption zones even if it fails for one
>  * Should retry only if it fails for all encryption zones, which will be the 
> case when kms is down.
>  
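
For illustration, a rough sketch of the retry shape described above; warmUpKey is a hypothetical placeholder for whatever provider call performs the warm-up, not the actual NameNode code:

{code}
import java.io.IOException;
import java.util.List;

// Hypothetical sketch: warm up EDEKs key by key, skip keys that fail, and let
// the caller retry only when every key failed (e.g. the KMS itself is down).
class EdekWarmupSketch {
  /** @return true if at least one key was warmed up successfully. */
  boolean warmUpOnce(List<String> keyNames) {
    int failures = 0;
    for (String key : keyNames) {
      try {
        warmUpKey(key);          // placeholder for the real provider call
      } catch (IOException e) {
        failures++;              // log and continue with the other zones
      }
    }
    return failures < keyNames.size();
  }

  private void warmUpKey(String keyName) throws IOException {
    // placeholder: fetch an EDEK for this key into the cache
  }
}
{code}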



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1926) The new caching layer is used for old OM requests but not updated

2019-08-07 Thread Bharat Viswanadham (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham reassigned HDDS-1926:


Assignee: Bharat Viswanadham

> The new caching layer is used for old OM requests but not updated
> -
>
> Key: HDDS-1926
> URL: https://issues.apache.org/jira/browse/HDDS-1926
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: om
>Reporter: Elek, Marton
>Assignee: Bharat Viswanadham
>Priority: Blocker
>
> HDDS-1499 introduced a new caching layer together with a double-buffer based 
> db writer to support OM HA.
> TLDR: I think the caching layer is not updated for new volume creation. And 
> (slightly related to this problem) I suggest to separated the TypedTable and 
> the caching layer.
> ## How to reproduce the problem?
> 1. Start a docker compose cluster
> 2. Create one volume (let's say `/vol1`)
> 3. Restart the om (!)
> 4. Try to create an _other_ volume twice!
> ```
> bash-4.2$ ozone sh volume create /vol2
> 2019-08-07 12:29:47 INFO  RpcClient:288 - Creating Volume: vol2, with hadoop 
> as owner.
> bash-4.2$ ozone sh volume create /vol2
> 2019-08-07 12:29:50 INFO  RpcClient:288 - Creating Volume: vol2, with hadoop 
> as owner.
> ```
> Expected behavior is an error:
> {code}
> bash-4.2$ ozone sh volume create /vol1
> 2019-08-07 09:48:39 INFO  RpcClient:288 - Creating Volume: vol1, with hadoop 
> as owner.
> bash-4.2$ ozone sh volume create /vol1
> 2019-08-07 09:48:42 INFO  RpcClient:288 - Creating Volume: vol1, with hadoop 
> as owner.
> VOLUME_ALREADY_EXISTS 
> {code}
> The problem is that the new cache is used even for the old code path 
> (TypedTable):
> {code}
>  @Override
>   public VALUE get(KEY key) throws IOException {
> // Here the metadata lock will guarantee that cache is not updated for 
> same
> // key during get key.
> CacheResult> cacheResult =
> cache.lookup(new CacheKey<>(key));
> if (cacheResult.getCacheStatus() == EXISTS) {
>   return cacheResult.getValue().getCacheValue();
> } else if (cacheResult.getCacheStatus() == NOT_EXIST) {
>   return null;
> } else {
>   return getFromTable(key);
> }
>   }
> {code}
> For volume table after the FIRST start it always returns with 
> `getFromTable(key)` due to the condition in the `TableCacheImpl.lookup`:
> {code}
>   public CacheResult lookup(CACHEKEY cachekey) {
> if (cache.size() == 0) {
>   return new CacheResult<>(CacheResult.CacheStatus.MAY_EXIST,
>   null);
> }
> {code}
> But after a restart the cache is pre-loaded by the TypedTable.constructor. 
> After the restart, the real caching logic will be used (as cache.size()>0), 
> which cause a problem as the cache is NOT updated from the old code path.
> An additional problem is that the cache is turned on for all the metadata 
> table even if the cache is not required... 
> ## Proposed solution
> As I commented at HDDS-1499 this caching layer is not a "traditional cache". 
> It's not updated during the typedTable.put() call but updated by a separated 
> component during double-buffer flash.
> I would suggest to remove the cache related methods from TypedTable (move to 
> a separated implementation). I think this kind of caching can be independent 
> from the TypedTable implementation. We can continue to use the simple 
> TypedTable everywhere where we don't need to use any kind of caching.
> For caching we can use a separated object. It would make it more visible that 
> the cache should always be updated manually all the time. This separated 
> caching utility may include a reference to the original TypedTable/Table. 
> With this approach we can separate the different responsibilities but provide 
> the same functionality.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1926) The new caching layer is used for old OM requests but not updated

2019-08-07 Thread Bharat Viswanadham (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902251#comment-16902251
 ] 

Bharat Viswanadham commented on HDDS-1926:
--

As discussed offline with [~arp] and [~elek], we shall use ratisEnabled and 
define a cache policy for the bucket and volume tables.

Another idea from that discussion is to have a CachedTypedTable which extends 
Table and overloads put to take a transactionIndex.
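
A very rough sketch of that shape, purely as an illustration; the interfaces and signatures below are assumptions, not the actual Ozone Table API:

{code}
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a cache-aware table whose put() also records the
// transaction index, so the cache entry can be cleaned up once the
// double-buffer has flushed that transaction. Names are assumptions.
class CachedTableSketch<K, V> {
  static final class Entry<V> {
    final V value;
    final long transactionIndex;
    Entry(V value, long transactionIndex) {
      this.value = value;
      this.transactionIndex = transactionIndex;
    }
  }

  private final Map<K, Entry<V>> cache = new ConcurrentHashMap<>();

  /** Overloaded put that records which transaction produced the value. */
  void put(K key, V value, long transactionIndex) throws IOException {
    cache.put(key, new Entry<>(value, transactionIndex));
    // ... a real implementation would also write through to the backing table ...
  }

  /** Consults the cache first; a real implementation would fall back to the table. */
  V get(K key) throws IOException {
    Entry<V> e = cache.get(key);
    return e != null ? e.value : null; // placeholder: read the backing table here
  }
}
{code}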

> The new caching layer is used for old OM requests but not updated
> -
>
> Key: HDDS-1926
> URL: https://issues.apache.org/jira/browse/HDDS-1926
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: om
>Reporter: Elek, Marton
>Assignee: Bharat Viswanadham
>Priority: Blocker
>
> HDDS-1499 introduced a new caching layer together with a double-buffer based 
> db writer to support OM HA.
> TLDR: I think the caching layer is not updated for new volume creation. And 
> (slightly related to this problem) I suggest to separated the TypedTable and 
> the caching layer.
> ## How to reproduce the problem?
> 1. Start a docker compose cluster
> 2. Create one volume (let's say `/vol1`)
> 3. Restart the om (!)
> 4. Try to create an _other_ volume twice!
> ```
> bash-4.2$ ozone sh volume create /vol2
> 2019-08-07 12:29:47 INFO  RpcClient:288 - Creating Volume: vol2, with hadoop 
> as owner.
> bash-4.2$ ozone sh volume create /vol2
> 2019-08-07 12:29:50 INFO  RpcClient:288 - Creating Volume: vol2, with hadoop 
> as owner.
> ```
> Expected behavior is an error:
> {code}
> bash-4.2$ ozone sh volume create /vol1
> 2019-08-07 09:48:39 INFO  RpcClient:288 - Creating Volume: vol1, with hadoop 
> as owner.
> bash-4.2$ ozone sh volume create /vol1
> 2019-08-07 09:48:42 INFO  RpcClient:288 - Creating Volume: vol1, with hadoop 
> as owner.
> VOLUME_ALREADY_EXISTS 
> {code}
> The problem is that the new cache is used even for the old code path 
> (TypedTable):
> {code}
>  @Override
>   public VALUE get(KEY key) throws IOException {
> // Here the metadata lock will guarantee that cache is not updated for 
> same
> // key during get key.
> CacheResult> cacheResult =
> cache.lookup(new CacheKey<>(key));
> if (cacheResult.getCacheStatus() == EXISTS) {
>   return cacheResult.getValue().getCacheValue();
> } else if (cacheResult.getCacheStatus() == NOT_EXIST) {
>   return null;
> } else {
>   return getFromTable(key);
> }
>   }
> {code}
> For volume table after the FIRST start it always returns with 
> `getFromTable(key)` due to the condition in the `TableCacheImpl.lookup`:
> {code}
>   public CacheResult lookup(CACHEKEY cachekey) {
> if (cache.size() == 0) {
>   return new CacheResult<>(CacheResult.CacheStatus.MAY_EXIST,
>   null);
> }
> {code}
> But after a restart the cache is pre-loaded by the TypedTable.constructor. 
> After the restart, the real caching logic will be used (as cache.size()>0), 
> which cause a problem as the cache is NOT updated from the old code path.
> An additional problem is that the cache is turned on for all the metadata 
> table even if the cache is not required... 
> ## Proposed solution
> As I commented at HDDS-1499 this caching layer is not a "traditional cache". 
> It's not updated during the typedTable.put() call but updated by a separated 
> component during double-buffer flash.
> I would suggest to remove the cache related methods from TypedTable (move to 
> a separated implementation). I think this kind of caching can be independent 
> from the TypedTable implementation. We can continue to use the simple 
> TypedTable everywhere where we don't need to use any kind of caching.
> For caching we can use a separated object. It would make it more visible that 
> the cache should always be updated manually all the time. This separated 
> caching utility may include a reference to the original TypedTable/Table. 
> With this approach we can separate the different responsibilities but provide 
> the same functionality.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1925) ozonesecure acceptance test broken by HTTP auth requirement

2019-08-07 Thread Xiaoyu Yao (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902253#comment-16902253
 ] 

Xiaoyu Yao commented on HDDS-1925:
--

Thanks [~adoroszlai] for reporting the issue. Let's comment out the HTTP 
authentication configuration in the original secure docker-compose environment 
from HDDS-1901, and create a separate secure docker-compose environment that 
has HTTP authentication enabled.
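
As a side note, if the check itself needed to keep querying the authenticated JMX endpoint, it could authenticate via SPNEGO instead of plain curl. Below is only a sketch of that idea (assuming a Kerberos ticket is already available in the environment), not the fix proposed above:

{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.security.authentication.client.AuthenticatedURL;

// Hypothetical sketch: fetch the SCMNodeManagerInfo bean over SPNEGO-authenticated
// HTTP and print the raw JSON (the test would still parse it, e.g. with jq).
public class JmxNodeCountCheck {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://localhost:9876/jmx"
        + "?qry=Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo");
    AuthenticatedURL.Token token = new AuthenticatedURL.Token();
    HttpURLConnection conn = new AuthenticatedURL().openConnection(url, token);
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
      in.lines().forEach(System.out::println);
    }
  }
}
{code}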

> ozonesecure acceptance test broken by HTTP auth requirement
> ---
>
> Key: HDDS-1925
> URL: https://issues.apache.org/jira/browse/HDDS-1925
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: docker, test
>Affects Versions: 0.4.1
>Reporter: Doroszlai, Attila
>Assignee: Doroszlai, Attila
>Priority: Critical
>
> Acceptance test is failing at {{ozonesecure}} with the following error from 
> {{jq}}:
> {noformat:title=https://github.com/elek/ozone-ci/blob/325779d34623061e27b80ade3b749210648086d1/byscane/byscane-nightly-ds7lx/acceptance/output.log#L2779}
> parse error: Invalid numeric literal at line 2, column 0
> {noformat}
> Example compose environments wait for datanodes to be up:
> {code:title=https://github.com/apache/hadoop/blob/9cd211ac86bb1124bdee572fddb6f86655b19b73/hadoop-ozone/dist/src/main/compose/testlib.sh#L71-L72}
>   docker-compose -f "$COMPOSE_FILE" up -d --scale datanode="${datanode_count}"
>   wait_for_datanodes "$COMPOSE_FILE" "${datanode_count}"
> {code}
> The number of datanodes up is determined via HTTP query of JMX endpoint:
> {code:title=https://github.com/apache/hadoop/blob/9cd211ac86bb1124bdee572fddb6f86655b19b73/hadoop-ozone/dist/src/main/compose/testlib.sh#L44-L46}
>  #This line checks the number of HEALTHY datanodes registered in scm over 
> the
>  # jmx HTTP servlet
>  datanodes=$(docker-compose -f "${compose_file}" exec -T scm curl -s 
> 'http://localhost:9876/jmx?qry=Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo'
>  | jq -r '.beans[0].NodeCount[] | select(.key=="HEALTHY") | .value')
> {code}
> The problem is that no authentication is performed before or during the 
> request, which is no longer allowed since HDDS-1901:
> {code}
> $ docker-compose exec -T scm curl -s 
> 'http://localhost:9876/jmx?qry=Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo'
> 
> 
> 
> Error 401 Authentication required
> 
> HTTP ERROR 401
> Problem accessing /jmx. Reason:
> Authentication required
> 
> 
> {code}
> {code}
> $ docker-compose exec -T scm curl -s 
> 'http://localhost:9876/jmx?qry=Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo'
>  | jq -r '.beans[0].NodeCount[] | select(.key=="HEALTHY") | .value'
> parse error: Invalid numeric literal at line 2, column 0
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-08-07 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902330#comment-16902330
 ] 

Tsz Wo Nicholas Sze commented on HDFS-13101:


[~shashikant], great work on the patch!  Could you fix the checkstyle warnings 
and see if the unit test failures are related?

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Yongjun Zhang
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13101.001.patch, HDFS-13101.002.patch, 
> HDFS-13101.003.patch, HDFS-13101.corruption_repro.patch, 
> HDFS-13101.corruption_repro_simplified.patch
>
>
> Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is 
> present, so it's likely another case not covered by the fix. We are currently 
> trying to collect good fsimage + editlogs to replay to reproduce it and 
> investigate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1926) The new caching layer is used for old OM requests but not updated

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1926?focusedWorklogId=290617&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290617
 ]

ASF GitHub Bot logged work on HDDS-1926:


Author: ASF GitHub Bot
Created on: 07/Aug/19 17:26
Start Date: 07/Aug/19 17:26
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #1247: 
HDDS-1926. The new caching layer is used for old OM requests but not updated.
URL: https://github.com/apache/hadoop/pull/1247
 
 
   When I try to add a test with an OM restart and then create the same volume, 
I get a NetUtils EOF exception. I will try to see why it is happening; if I am 
able to solve this issue, I will post an integration test for the scenario 
mentioned in the Jira.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290617)
Time Spent: 10m
Remaining Estimate: 0h

> The new caching layer is used for old OM requests but not updated
> -
>
> Key: HDDS-1926
> URL: https://issues.apache.org/jira/browse/HDDS-1926
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: om
>Reporter: Elek, Marton
>Assignee: Bharat Viswanadham
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HDDS-1499 introduced a new caching layer together with a double-buffer based 
> db writer to support OM HA.
> TLDR: I think the caching layer is not updated for new volume creation. And 
> (slightly related to this problem) I suggest to separated the TypedTable and 
> the caching layer.
> ## How to reproduce the problem?
> 1. Start a docker compose cluster
> 2. Create one volume (let's say `/vol1`)
> 3. Restart the om (!)
> 4. Try to create an _other_ volume twice!
> ```
> bash-4.2$ ozone sh volume create /vol2
> 2019-08-07 12:29:47 INFO  RpcClient:288 - Creating Volume: vol2, with hadoop 
> as owner.
> bash-4.2$ ozone sh volume create /vol2
> 2019-08-07 12:29:50 INFO  RpcClient:288 - Creating Volume: vol2, with hadoop 
> as owner.
> ```
> Expected behavior is an error:
> {code}
> bash-4.2$ ozone sh volume create /vol1
> 2019-08-07 09:48:39 INFO  RpcClient:288 - Creating Volume: vol1, with hadoop 
> as owner.
> bash-4.2$ ozone sh volume create /vol1
> 2019-08-07 09:48:42 INFO  RpcClient:288 - Creating Volume: vol1, with hadoop 
> as owner.
> VOLUME_ALREADY_EXISTS 
> {code}
> The problem is that the new cache is used even for the old code path 
> (TypedTable):
> {code}
>  @Override
>   public VALUE get(KEY key) throws IOException {
> // Here the metadata lock will guarantee that cache is not updated for 
> same
> // key during get key.
> CacheResult> cacheResult =
> cache.lookup(new CacheKey<>(key));
> if (cacheResult.getCacheStatus() == EXISTS) {
>   return cacheResult.getValue().getCacheValue();
> } else if (cacheResult.getCacheStatus() == NOT_EXIST) {
>   return null;
> } else {
>   return getFromTable(key);
> }
>   }
> {code}
> For volume table after the FIRST start it always returns with 
> `getFromTable(key)` due to the condition in the `TableCacheImpl.lookup`:
> {code}
>   public CacheResult lookup(CACHEKEY cachekey) {
> if (cache.size() == 0) {
>   return new CacheResult<>(CacheResult.CacheStatus.MAY_EXIST,
>   null);
> }
> {code}
> But after a restart the cache is pre-loaded by the TypedTable.constructor. 
> After the restart, the real caching logic will be used (as cache.size()>0), 
> which cause a problem as the cache is NOT updated from the old code path.
> An additional problem is that the cache is turned on for all the metadata 
> table even if the cache is not required... 
> ## Proposed solution
> As I commented at HDDS-1499 this caching layer is not a "traditional cache". 
> It's not updated during the typedTable.put() call but updated by a separated 
> component during double-buffer flash.
> I would suggest to remove the cache related methods from TypedTable (move to 
> a separated implementation). I think this kind of caching can be independent 
> from the TypedTable implementation. We can continue to use the simple 
> TypedTable everywhere where we don't need to use any kind of caching.
> For caching we can use a separated object. It would make it more visible that 
> the cache should always be updated manually all the time. This separated 
> caching ut

[jira] [Updated] (HDDS-1926) The new caching layer is used for old OM requests but not updated

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-1926:
-
Labels: pull-request-available  (was: )

> The new caching layer is used for old OM requests but not updated
> -
>
> Key: HDDS-1926
> URL: https://issues.apache.org/jira/browse/HDDS-1926
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: om
>Reporter: Elek, Marton
>Assignee: Bharat Viswanadham
>Priority: Blocker
>  Labels: pull-request-available
>
> HDDS-1499 introduced a new caching layer together with a double-buffer based 
> db writer to support OM HA.
> TLDR: I think the caching layer is not updated for new volume creation. And 
> (slightly related to this problem) I suggest to separated the TypedTable and 
> the caching layer.
> ## How to reproduce the problem?
> 1. Start a docker compose cluster
> 2. Create one volume (let's say `/vol1`)
> 3. Restart the om (!)
> 4. Try to create an _other_ volume twice!
> ```
> bash-4.2$ ozone sh volume create /vol2
> 2019-08-07 12:29:47 INFO  RpcClient:288 - Creating Volume: vol2, with hadoop 
> as owner.
> bash-4.2$ ozone sh volume create /vol2
> 2019-08-07 12:29:50 INFO  RpcClient:288 - Creating Volume: vol2, with hadoop 
> as owner.
> ```
> Expected behavior is an error:
> {code}
> bash-4.2$ ozone sh volume create /vol1
> 2019-08-07 09:48:39 INFO  RpcClient:288 - Creating Volume: vol1, with hadoop 
> as owner.
> bash-4.2$ ozone sh volume create /vol1
> 2019-08-07 09:48:42 INFO  RpcClient:288 - Creating Volume: vol1, with hadoop 
> as owner.
> VOLUME_ALREADY_EXISTS 
> {code}
> The problem is that the new cache is used even for the old code path 
> (TypedTable):
> {code}
>  @Override
>   public VALUE get(KEY key) throws IOException {
> // Here the metadata lock will guarantee that cache is not updated for 
> same
> // key during get key.
> CacheResult> cacheResult =
> cache.lookup(new CacheKey<>(key));
> if (cacheResult.getCacheStatus() == EXISTS) {
>   return cacheResult.getValue().getCacheValue();
> } else if (cacheResult.getCacheStatus() == NOT_EXIST) {
>   return null;
> } else {
>   return getFromTable(key);
> }
>   }
> {code}
> For volume table after the FIRST start it always returns with 
> `getFromTable(key)` due to the condition in the `TableCacheImpl.lookup`:
> {code}
>   public CacheResult lookup(CACHEKEY cachekey) {
> if (cache.size() == 0) {
>   return new CacheResult<>(CacheResult.CacheStatus.MAY_EXIST,
>   null);
> }
> {code}
> But after a restart the cache is pre-loaded by the TypedTable.constructor. 
> After the restart, the real caching logic will be used (as cache.size()>0), 
> which cause a problem as the cache is NOT updated from the old code path.
> An additional problem is that the cache is turned on for all the metadata 
> table even if the cache is not required... 
> ## Proposed solution
> As I commented at HDDS-1499 this caching layer is not a "traditional cache". 
> It's not updated during the typedTable.put() call but updated by a separated 
> component during double-buffer flash.
> I would suggest to remove the cache related methods from TypedTable (move to 
> a separated implementation). I think this kind of caching can be independent 
> from the TypedTable implementation. We can continue to use the simple 
> TypedTable everywhere where we don't need to use any kind of caching.
> For caching we can use a separated object. It would make it more visible that 
> the cache should always be updated manually all the time. This separated 
> caching utility may include a reference to the original TypedTable/Table. 
> With this approach we can separate the different responsibilities but provide 
> the same functionality.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1926) The new caching layer is used for old OM requests but not updated

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1926?focusedWorklogId=290627&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290627
 ]

ASF GitHub Bot logged work on HDDS-1926:


Author: ASF GitHub Bot
Created on: 07/Aug/19 17:54
Start Date: 07/Aug/19 17:54
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on issue #1247: HDDS-1926. The 
new caching layer is used for old OM requests but not updated.
URL: https://github.com/apache/hadoop/pull/1247#issuecomment-519203842
 
 
   Added tests with OM restart.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290627)
Time Spent: 20m  (was: 10m)

> The new caching layer is used for old OM requests but not updated
> -
>
> Key: HDDS-1926
> URL: https://issues.apache.org/jira/browse/HDDS-1926
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: om
>Reporter: Elek, Marton
>Assignee: Bharat Viswanadham
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HDDS-1499 introduced a new caching layer together with a double-buffer based 
> db writer to support OM HA.
> TLDR: I think the caching layer is not updated for new volume creation. And 
> (slightly related to this problem) I suggest to separated the TypedTable and 
> the caching layer.
> ## How to reproduce the problem?
> 1. Start a docker compose cluster
> 2. Create one volume (let's say `/vol1`)
> 3. Restart the om (!)
> 4. Try to create an _other_ volume twice!
> ```
> bash-4.2$ ozone sh volume create /vol2
> 2019-08-07 12:29:47 INFO  RpcClient:288 - Creating Volume: vol2, with hadoop 
> as owner.
> bash-4.2$ ozone sh volume create /vol2
> 2019-08-07 12:29:50 INFO  RpcClient:288 - Creating Volume: vol2, with hadoop 
> as owner.
> ```
> Expected behavior is an error:
> {code}
> bash-4.2$ ozone sh volume create /vol1
> 2019-08-07 09:48:39 INFO  RpcClient:288 - Creating Volume: vol1, with hadoop 
> as owner.
> bash-4.2$ ozone sh volume create /vol1
> 2019-08-07 09:48:42 INFO  RpcClient:288 - Creating Volume: vol1, with hadoop 
> as owner.
> VOLUME_ALREADY_EXISTS 
> {code}
> The problem is that the new cache is used even for the old code path 
> (TypedTable):
> {code}
>  @Override
>   public VALUE get(KEY key) throws IOException {
> // Here the metadata lock will guarantee that cache is not updated for 
> same
> // key during get key.
> CacheResult> cacheResult =
> cache.lookup(new CacheKey<>(key));
> if (cacheResult.getCacheStatus() == EXISTS) {
>   return cacheResult.getValue().getCacheValue();
> } else if (cacheResult.getCacheStatus() == NOT_EXIST) {
>   return null;
> } else {
>   return getFromTable(key);
> }
>   }
> {code}
> For volume table after the FIRST start it always returns with 
> `getFromTable(key)` due to the condition in the `TableCacheImpl.lookup`:
> {code}
>   public CacheResult lookup(CACHEKEY cachekey) {
> if (cache.size() == 0) {
>   return new CacheResult<>(CacheResult.CacheStatus.MAY_EXIST,
>   null);
> }
> {code}
> But after a restart the cache is pre-loaded by the TypedTable.constructor. 
> After the restart, the real caching logic will be used (as cache.size()>0), 
> which cause a problem as the cache is NOT updated from the old code path.
> An additional problem is that the cache is turned on for all the metadata 
> table even if the cache is not required... 
> ## Proposed solution
> As I commented at HDDS-1499 this caching layer is not a "traditional cache". 
> It's not updated during the typedTable.put() call but updated by a separated 
> component during double-buffer flash.
> I would suggest to remove the cache related methods from TypedTable (move to 
> a separated implementation). I think this kind of caching can be independent 
> from the TypedTable implementation. We can continue to use the simple 
> TypedTable everywhere where we don't need to use any kind of caching.
> For caching we can use a separated object. It would make it more visible that 
> the cache should always be updated manually all the time. This separated 
> caching utility may include a reference to the original TypedTable/Table. 
> With this approach we can separate the different responsibilities but provide 
> the same functionality.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (HDDS-1926) The new caching layer is used for old OM requests but not updated

2019-08-07 Thread Bharat Viswanadham (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDDS-1926:
-
Status: Patch Available  (was: Open)

> The new caching layer is used for old OM requests but not updated
> -
>
> Key: HDDS-1926
> URL: https://issues.apache.org/jira/browse/HDDS-1926
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: om
>Reporter: Elek, Marton
>Assignee: Bharat Viswanadham
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HDDS-1499 introduced a new caching layer together with a double-buffer based 
> db writer to support OM HA.
> TLDR: I think the caching layer is not updated for new volume creation. And 
> (slightly related to this problem) I suggest to separated the TypedTable and 
> the caching layer.
> ## How to reproduce the problem?
> 1. Start a docker compose cluster
> 2. Create one volume (let's say `/vol1`)
> 3. Restart the om (!)
> 4. Try to create an _other_ volume twice!
> ```
> bash-4.2$ ozone sh volume create /vol2
> 2019-08-07 12:29:47 INFO  RpcClient:288 - Creating Volume: vol2, with hadoop 
> as owner.
> bash-4.2$ ozone sh volume create /vol2
> 2019-08-07 12:29:50 INFO  RpcClient:288 - Creating Volume: vol2, with hadoop 
> as owner.
> ```
> Expected behavior is an error:
> {code}
> bash-4.2$ ozone sh volume create /vol1
> 2019-08-07 09:48:39 INFO  RpcClient:288 - Creating Volume: vol1, with hadoop 
> as owner.
> bash-4.2$ ozone sh volume create /vol1
> 2019-08-07 09:48:42 INFO  RpcClient:288 - Creating Volume: vol1, with hadoop 
> as owner.
> VOLUME_ALREADY_EXISTS 
> {code}
> The problem is that the new cache is used even for the old code path 
> (TypedTable):
> {code}
>  @Override
>   public VALUE get(KEY key) throws IOException {
> // Here the metadata lock will guarantee that cache is not updated for 
> same
> // key during get key.
> CacheResult> cacheResult =
> cache.lookup(new CacheKey<>(key));
> if (cacheResult.getCacheStatus() == EXISTS) {
>   return cacheResult.getValue().getCacheValue();
> } else if (cacheResult.getCacheStatus() == NOT_EXIST) {
>   return null;
> } else {
>   return getFromTable(key);
> }
>   }
> {code}
> For volume table after the FIRST start it always returns with 
> `getFromTable(key)` due to the condition in the `TableCacheImpl.lookup`:
> {code}
>   public CacheResult lookup(CACHEKEY cachekey) {
> if (cache.size() == 0) {
>   return new CacheResult<>(CacheResult.CacheStatus.MAY_EXIST,
>   null);
> }
> {code}
> But after a restart the cache is pre-loaded by the TypedTable.constructor. 
> After the restart, the real caching logic will be used (as cache.size()>0), 
> which cause a problem as the cache is NOT updated from the old code path.
> An additional problem is that the cache is turned on for all the metadata 
> table even if the cache is not required... 
> ## Proposed solution
> As I commented at HDDS-1499 this caching layer is not a "traditional cache". 
> It's not updated during the typedTable.put() call but updated by a separated 
> component during double-buffer flash.
> I would suggest to remove the cache related methods from TypedTable (move to 
> a separated implementation). I think this kind of caching can be independent 
> from the TypedTable implementation. We can continue to use the simple 
> TypedTable everywhere where we don't need to use any kind of caching.
> For caching we can use a separated object. It would make it more visible that 
> the cache should always be updated manually all the time. This separated 
> caching utility may include a reference to the original TypedTable/Table. 
> With this approach we can separate the different responsibilities but provide 
> the same functionality.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-1926) The new caching layer is used for old OM requests but not updated

2019-08-07 Thread Bharat Viswanadham (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902251#comment-16902251
 ] 

Bharat Viswanadham edited comment on HDDS-1926 at 8/7/19 5:56 PM:
--

As discussed offline with [~arp] and [~elek], we shall use ratisEnabled and 
define a cache policy for the bucket and volume tables. As a quick solution, we 
are going with this approach.

Another idea from that discussion is to have a CachedTypedTable which extends 
Table and overloads put to take a transactionIndex.


was (Author: bharatviswa):
As discussed offline with [~arp] and [~elek], we shall use ratisEnabled and 
define a cache policy for the bucket and volume tables.

Another idea from that discussion is to have a CachedTypedTable which extends 
Table and overloads put to take a transactionIndex.

> The new caching layer is used for old OM requests but not updated
> -
>
> Key: HDDS-1926
> URL: https://issues.apache.org/jira/browse/HDDS-1926
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: om
>Reporter: Elek, Marton
>Assignee: Bharat Viswanadham
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HDDS-1499 introduced a new caching layer together with a double-buffer based 
> db writer to support OM HA.
> TLDR: I think the caching layer is not updated for new volume creation. And 
> (slightly related to this problem) I suggest to separated the TypedTable and 
> the caching layer.
> ## How to reproduce the problem?
> 1. Start a docker compose cluster
> 2. Create one volume (let's say `/vol1`)
> 3. Restart the om (!)
> 4. Try to create an _other_ volume twice!
> ```
> bash-4.2$ ozone sh volume create /vol2
> 2019-08-07 12:29:47 INFO  RpcClient:288 - Creating Volume: vol2, with hadoop 
> as owner.
> bash-4.2$ ozone sh volume create /vol2
> 2019-08-07 12:29:50 INFO  RpcClient:288 - Creating Volume: vol2, with hadoop 
> as owner.
> ```
> Expected behavior is an error:
> {code}
> bash-4.2$ ozone sh volume create /vol1
> 2019-08-07 09:48:39 INFO  RpcClient:288 - Creating Volume: vol1, with hadoop 
> as owner.
> bash-4.2$ ozone sh volume create /vol1
> 2019-08-07 09:48:42 INFO  RpcClient:288 - Creating Volume: vol1, with hadoop 
> as owner.
> VOLUME_ALREADY_EXISTS 
> {code}
> The problem is that the new cache is used even for the old code path 
> (TypedTable):
> {code}
>  @Override
>   public VALUE get(KEY key) throws IOException {
> // Here the metadata lock will guarantee that cache is not updated for 
> same
> // key during get key.
> CacheResult> cacheResult =
> cache.lookup(new CacheKey<>(key));
> if (cacheResult.getCacheStatus() == EXISTS) {
>   return cacheResult.getValue().getCacheValue();
> } else if (cacheResult.getCacheStatus() == NOT_EXIST) {
>   return null;
> } else {
>   return getFromTable(key);
> }
>   }
> {code}
> For volume table after the FIRST start it always returns with 
> `getFromTable(key)` due to the condition in the `TableCacheImpl.lookup`:
> {code}
>   public CacheResult lookup(CACHEKEY cachekey) {
> if (cache.size() == 0) {
>   return new CacheResult<>(CacheResult.CacheStatus.MAY_EXIST,
>   null);
> }
> {code}
> But after a restart the cache is pre-loaded by the TypedTable.constructor. 
> After the restart, the real caching logic will be used (as cache.size()>0), 
> which cause a problem as the cache is NOT updated from the old code path.
> An additional problem is that the cache is turned on for all the metadata 
> table even if the cache is not required... 
> ## Proposed solution
> As I commented at HDDS-1499 this caching layer is not a "traditional cache". 
> It's not updated during the typedTable.put() call but updated by a separated 
> component during double-buffer flash.
> I would suggest to remove the cache related methods from TypedTable (move to 
> a separated implementation). I think this kind of caching can be independent 
> from the TypedTable implementation. We can continue to use the simple 
> TypedTable everywhere where we don't need to use any kind of caching.
> For caching we can use a separated object. It would make it more visible that 
> the cache should always be updated manually all the time. This separated 
> caching utility may include a reference to the original TypedTable/Table. 
> With this approach we can separate the different responsibilities but provide 
> the same functionality.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


