[jira] [Comment Edited] (HDFS-14928) UI: unifying the WebUI across different components.

2019-11-05 Thread Xieming Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968124#comment-16968124
 ] 

Xieming Li edited comment on HDFS-14928 at 11/6/19 6:52 AM:


I have tried implementing #1 on web UI of NameNode, Router, and DataNode.

After applying `HDFS-14928.001.patch`, the WebUI will look like the followings:
 !HDFS-14928.jpg|width=600!

During the implementation, I came across two issues that I want to discuss: 
 # I haven't modify the Web UI of DataNode because JMX of DN does not contain 
any information about its own running status.
 We have to either A) expose the DN's running status in JMX metrics or B) use 
ajax to query the JMX of NN. We can also C) skip the changes for now. 
 Which do you think is better amoug A,B or C? 
 # NNs can be in Safemode and Standby at the same time. In current 
implementation, safemode will never be shown on Overview page.
 Should we change it so that when a Standby NN is in Safemode, we show Safemode 
icon.


was (Author: risyomei):
I have tried implementing #1 on web UI of NameNode, Router, and DataNode.

After applying HDFS-14928.001.patch, the WebUI will look like the followings:
 !HDFS-14928.jpg|width=600!

During the implementation, I came across two issues that I want to discuss: 
 # I haven't modify the Web UI of DataNode because JMX of DN does not contain 
any information about its own running status. 
We have to either A) expose the DN's running status in JMX metrics or B) use 
ajax to query the JMX of NN. We can also C) skip the changes for now. 
Which do you think is better amoug A,B or C? 
 # NNs can be in Safemode and Standby at the same time. In current 
implementation, safemode will never be shown on Overview page.
Should we change it so that when a Standby NN is in Safemode, we show Safemode 
icon.

> UI: unifying the WebUI across different components.
> ---
>
> Key: HDFS-14928
> URL: https://issues.apache.org/jira/browse/HDFS-14928
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ui
>Reporter: Xieming Li
>Assignee: Xieming Li
>Priority: Trivial
> Attachments: DN_orig.png, DN_with_legend.png.png, DN_wo_legend.png, 
> HDFS-14928.001.patch, HDFS-14928.jpg, NN_orig.png, NN_with_legend.png, 
> NN_wo_legend.png, RBF_orig.png, RBF_with_legend.png, RBF_wo_legend.png
>
>
> The WebUI of different components could be unified.
> *Router:*
> |Current|  !RBF_orig.png|width=500! | 
> |Proposed 1 (With Icon) |  !RBF_wo_legend.png|width=500! | 
> |Proposed 2 (With Icon and Legend)|!RBF_with_legend.png|width=500!  | 
> *NameNode:*
> |Current| !NN_orig.png|width=500! |
> |Proposed 1 (With Icon) | !NN_wo_legend.png|width=500! |
> |Proposed 2 (With Icon and Legend)| !NN_with_legend.png|width=500! |
> *DataNode:*
> |Current| !DN_orig.png|width=500! |
> |Proposed 1 (With Icon) | !DN_wo_legend.png|width=500! |
> |Proposed 2 (With Icon and Legend)| !DN_with_legend.png.png|width=500! |



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14928) UI: unifying the WebUI across different components.

2019-11-05 Thread Xieming Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968124#comment-16968124
 ] 

Xieming Li commented on HDFS-14928:
---

I have tried implementing #1 on web UI of NameNode, Router, and DataNode.

After applying HDFS-14928.001.patch, the WebUI will look like the followings:
 !HDFS-14928.jpg|width=600!

During the implementation, I came across two issues that I want to discuss: 
 # I haven't modify the Web UI of DataNode because JMX of DN does not contain 
any information about its own running status. 
We have to either A) expose the DN's running status in JMX metrics or B) use 
ajax to query the JMX of NN. We can also C) skip the changes for now. 
Which do you think is better amoug A,B or C? 
 # NNs can be in Safemode and Standby at the same time. In current 
implementation, safemode will never be shown on Overview page.
Should we change it so that when a Standby NN is in Safemode, we show Safemode 
icon.

> UI: unifying the WebUI across different components.
> ---
>
> Key: HDFS-14928
> URL: https://issues.apache.org/jira/browse/HDFS-14928
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ui
>Reporter: Xieming Li
>Assignee: Xieming Li
>Priority: Trivial
> Attachments: DN_orig.png, DN_with_legend.png.png, DN_wo_legend.png, 
> HDFS-14928.001.patch, HDFS-14928.jpg, NN_orig.png, NN_with_legend.png, 
> NN_wo_legend.png, RBF_orig.png, RBF_with_legend.png, RBF_wo_legend.png
>
>
> The WebUI of different components could be unified.
> *Router:*
> |Current|  !RBF_orig.png|width=500! | 
> |Proposed 1 (With Icon) |  !RBF_wo_legend.png|width=500! | 
> |Proposed 2 (With Icon and Legend)|!RBF_with_legend.png|width=500!  | 
> *NameNode:*
> |Current| !NN_orig.png|width=500! |
> |Proposed 1 (With Icon) | !NN_wo_legend.png|width=500! |
> |Proposed 2 (With Icon and Legend)| !NN_with_legend.png|width=500! |
> *DataNode:*
> |Current| !DN_orig.png|width=500! |
> |Proposed 1 (With Icon) | !DN_wo_legend.png|width=500! |
> |Proposed 2 (With Icon and Legend)| !DN_with_legend.png.png|width=500! |



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14928) UI: unifying the WebUI across different components.

2019-11-05 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968122#comment-16968122
 ] 

Íñigo Goiri commented on HDFS-14928:


dfshealth.html shouldn't have dependencies on RBF but the other way around.
The common stuff should go to HDFS.

> UI: unifying the WebUI across different components.
> ---
>
> Key: HDFS-14928
> URL: https://issues.apache.org/jira/browse/HDFS-14928
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ui
>Reporter: Xieming Li
>Assignee: Xieming Li
>Priority: Trivial
> Attachments: DN_orig.png, DN_with_legend.png.png, DN_wo_legend.png, 
> HDFS-14928.001.patch, HDFS-14928.jpg, NN_orig.png, NN_with_legend.png, 
> NN_wo_legend.png, RBF_orig.png, RBF_with_legend.png, RBF_wo_legend.png
>
>
> The WebUI of different components could be unified.
> *Router:*
> |Current|  !RBF_orig.png|width=500! | 
> |Proposed 1 (With Icon) |  !RBF_wo_legend.png|width=500! | 
> |Proposed 2 (With Icon and Legend)|!RBF_with_legend.png|width=500!  | 
> *NameNode:*
> |Current| !NN_orig.png|width=500! |
> |Proposed 1 (With Icon) | !NN_wo_legend.png|width=500! |
> |Proposed 2 (With Icon and Legend)| !NN_with_legend.png|width=500! |
> *DataNode:*
> |Current| !DN_orig.png|width=500! |
> |Proposed 1 (With Icon) | !DN_wo_legend.png|width=500! |
> |Proposed 2 (With Icon and Legend)| !DN_with_legend.png.png|width=500! |



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14928) UI: unifying the WebUI across different components.

2019-11-05 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968121#comment-16968121
 ] 

Hadoop QA commented on HDFS-14928:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  8s{color} 
| {color:red} HDFS-14928 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-14928 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28260/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> UI: unifying the WebUI across different components.
> ---
>
> Key: HDFS-14928
> URL: https://issues.apache.org/jira/browse/HDFS-14928
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ui
>Reporter: Xieming Li
>Assignee: Xieming Li
>Priority: Trivial
> Attachments: DN_orig.png, DN_with_legend.png.png, DN_wo_legend.png, 
> HDFS-14928.001.patch, HDFS-14928.jpg, NN_orig.png, NN_with_legend.png, 
> NN_wo_legend.png, RBF_orig.png, RBF_with_legend.png, RBF_wo_legend.png
>
>
> The WebUI of different components could be unified.
> *Router:*
> |Current|  !RBF_orig.png|width=500! | 
> |Proposed 1 (With Icon) |  !RBF_wo_legend.png|width=500! | 
> |Proposed 2 (With Icon and Legend)|!RBF_with_legend.png|width=500!  | 
> *NameNode:*
> |Current| !NN_orig.png|width=500! |
> |Proposed 1 (With Icon) | !NN_wo_legend.png|width=500! |
> |Proposed 2 (With Icon and Legend)| !NN_with_legend.png|width=500! |
> *DataNode:*
> |Current| !DN_orig.png|width=500! |
> |Proposed 1 (With Icon) | !DN_wo_legend.png|width=500! |
> |Proposed 2 (With Icon and Legend)| !DN_with_legend.png.png|width=500! |



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14928) UI: unifying the WebUI across different components.

2019-11-05 Thread Xieming Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xieming Li updated HDFS-14928:
--
Attachment: HDFS-14928.001.patch
Status: Patch Available  (was: Open)

> UI: unifying the WebUI across different components.
> ---
>
> Key: HDFS-14928
> URL: https://issues.apache.org/jira/browse/HDFS-14928
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ui
>Reporter: Xieming Li
>Assignee: Xieming Li
>Priority: Trivial
> Attachments: DN_orig.png, DN_with_legend.png.png, DN_wo_legend.png, 
> HDFS-14928.001.patch, HDFS-14928.jpg, NN_orig.png, NN_with_legend.png, 
> NN_wo_legend.png, RBF_orig.png, RBF_with_legend.png, RBF_wo_legend.png
>
>
> The WebUI of different components could be unified.
> *Router:*
> |Current|  !RBF_orig.png|width=500! | 
> |Proposed 1 (With Icon) |  !RBF_wo_legend.png|width=500! | 
> |Proposed 2 (With Icon and Legend)|!RBF_with_legend.png|width=500!  | 
> *NameNode:*
> |Current| !NN_orig.png|width=500! |
> |Proposed 1 (With Icon) | !NN_wo_legend.png|width=500! |
> |Proposed 2 (With Icon and Legend)| !NN_with_legend.png|width=500! |
> *DataNode:*
> |Current| !DN_orig.png|width=500! |
> |Proposed 1 (With Icon) | !DN_wo_legend.png|width=500! |
> |Proposed 2 (With Icon and Legend)| !DN_with_legend.png.png|width=500! |



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14928) UI: unifying the WebUI across different components.

2019-11-05 Thread Xieming Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xieming Li updated HDFS-14928:
--
Attachment: HDFS-14928.jpg

> UI: unifying the WebUI across different components.
> ---
>
> Key: HDFS-14928
> URL: https://issues.apache.org/jira/browse/HDFS-14928
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ui
>Reporter: Xieming Li
>Assignee: Xieming Li
>Priority: Trivial
> Attachments: DN_orig.png, DN_with_legend.png.png, DN_wo_legend.png, 
> HDFS-14928.001.patch, HDFS-14928.jpg, NN_orig.png, NN_with_legend.png, 
> NN_wo_legend.png, RBF_orig.png, RBF_with_legend.png, RBF_wo_legend.png
>
>
> The WebUI of different components could be unified.
> *Router:*
> |Current|  !RBF_orig.png|width=500! | 
> |Proposed 1 (With Icon) |  !RBF_wo_legend.png|width=500! | 
> |Proposed 2 (With Icon and Legend)|!RBF_with_legend.png|width=500!  | 
> *NameNode:*
> |Current| !NN_orig.png|width=500! |
> |Proposed 1 (With Icon) | !NN_wo_legend.png|width=500! |
> |Proposed 2 (With Icon and Legend)| !NN_with_legend.png|width=500! |
> *DataNode:*
> |Current| !DN_orig.png|width=500! |
> |Proposed 1 (With Icon) | !DN_wo_legend.png|width=500! |
> |Proposed 2 (With Icon and Legend)| !DN_with_legend.png.png|width=500! |



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2407) Reduce log level of per-node failure in XceiverClientGrpc

2019-11-05 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-2407:
---
Status: Patch Available  (was: Open)

> Reduce log level of per-node failure in XceiverClientGrpc
> -
>
> Key: HDDS-2407
> URL: https://issues.apache.org/jira/browse/HDDS-2407
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: Ozone Client
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When reading from a pipeline, client should not care if some datanode could 
> not service the request, as long as the pipeline as a whole is OK.  The [log 
> message|https://github.com/apache/hadoop-ozone/blob/2529cee1a7dd27c51cb9aed0dc57af283ff24e26/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientGrpc.java#L303-L304]
>  indicating node failure was [increased to error 
> level|https://github.com/apache/hadoop-ozone/commit/a79dc4609a975d46a3e051ad6904fb1eb40705ee#diff-b9b6f3ccb12829d90886e041d11395b1R288]
>  in HDDS-1780.  This task proposes to change it back to debug.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14953) [Dynamometer] Missing blocks gradually increase after NN starts

2019-11-05 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968113#comment-16968113
 ] 

Takanobu Asanuma commented on HDFS-14953:
-

The case is very similar to [this 
issue|https://github.com/linkedin/dynamometer/issues/64]. (Thanks for reporting 
it, [~weichiu].)

{quote}
After HDFS-9260, NN expects block replicas to be reported in ascending order of 
block id. If a block id is not in order, NN discards it silently. Because 
simulated DataNode in Dynamometer uses hash map to store block replicas, the 
replicas are not reported in order. The Dynamometer cluster would then see 
missing blocks gradually increase several minutes after NN starts.
{quote}

> [Dynamometer] Missing blocks gradually increase after NN starts
> ---
>
> Key: HDFS-14953
> URL: https://issues.apache.org/jira/browse/HDFS-14953
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: tools
>Reporter: Takanobu Asanuma
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14953) [Dynamometer] Missing blocks gradually increase after NN starts

2019-11-05 Thread Takanobu Asanuma (Jira)
Takanobu Asanuma created HDFS-14953:
---

 Summary: [Dynamometer] Missing blocks gradually increase after NN 
starts
 Key: HDFS-14953
 URL: https://issues.apache.org/jira/browse/HDFS-14953
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: tools
Reporter: Takanobu Asanuma






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14384) When lastLocatedBlock token expire, it will take 1~3s second to refetch it.

2019-11-05 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968105#comment-16968105
 ] 

Surendra Singh Lilhore commented on HDFS-14384:
---

Fixed check-style warnings.

> When lastLocatedBlock token expire, it will take 1~3s second to refetch it.
> ---
>
> Key: HDFS-14384
> URL: https://issues.apache.org/jira/browse/HDFS-14384
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.7.2
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Attachments: HDFS-14384.001.patch, HDFS-14384.002.patch, 
> HDFS-14384.003.patch
>
>
> Scenario :
>  1. Write file with one block which is in-progress.
>   2. Open input stream and close the output stream.
>   3. Wait for block token expiration and read the data.
>   4. Last block read take 1~3 sec to read it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14384) When lastLocatedBlock token expire, it will take 1~3s second to refetch it.

2019-11-05 Thread Surendra Singh Lilhore (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surendra Singh Lilhore updated HDFS-14384:
--
Attachment: HDFS-14384.003.patch

> When lastLocatedBlock token expire, it will take 1~3s second to refetch it.
> ---
>
> Key: HDFS-14384
> URL: https://issues.apache.org/jira/browse/HDFS-14384
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.7.2
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Attachments: HDFS-14384.001.patch, HDFS-14384.002.patch, 
> HDFS-14384.003.patch
>
>
> Scenario :
>  1. Write file with one block which is in-progress.
>   2. Open input stream and close the output stream.
>   3. Wait for block token expiration and read the data.
>   4. Last block read take 1~3 sec to read it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2393) HDDS-1847 broke some unit tests

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2393:
-
Labels: pull-request-available  (was: )

> HDDS-1847 broke some unit tests
> ---
>
> Key: HDDS-2393
> URL: https://issues.apache.org/jira/browse/HDDS-2393
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Chris Teoh
>Assignee: Chris Teoh
>Priority: Major
>  Labels: pull-request-available
>
> Siyao Meng commented on HDDS-1847:
> --
> Looks like this commit breaks {{TestKeyManagerImpl}} in {{setUp()}} and 
> {{cleanup()}}. Run {{TestKeyManagerImpl#testListStatus()}} to steadily repro. 
> I believe there could be other tests that are broken by this.
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManagerHttpServer.getSpnegoPrincipal(StorageContainerManagerHttpServer.java:74)
> at 
> org.apache.hadoop.hdds.server.BaseHttpServer.(BaseHttpServer.java:81)
> at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManagerHttpServer.(StorageContainerManagerHttpServer.java:36)
> at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManager.(StorageContainerManager.java:330)
> at org.apache.hadoop.hdds.scm.TestUtils.getScm(TestUtils.java:544)
> at 
> org.apache.hadoop.ozone.om.TestKeyManagerImpl.setUp(TestKeyManagerImpl.java:150)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
> at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
> at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
> at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
> at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.ozone.om.TestKeyManagerImpl.cleanup(TestKeyManagerImpl.java:176)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
> at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
> at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
> at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
> at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2393) HDDS-1847 broke some unit tests

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2393?focusedWorklogId=339169=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339169
 ]

ASF GitHub Bot logged work on HDDS-2393:


Author: ASF GitHub Bot
Created on: 06/Nov/19 05:55
Start Date: 06/Nov/19 05:55
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #111: 
HDDS-2393: Fixing NPE in unit test from HDDS-1847
URL: https://github.com/apache/hadoop-ozone/pull/111
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339169)
Remaining Estimate: 0h
Time Spent: 10m

> HDDS-1847 broke some unit tests
> ---
>
> Key: HDDS-2393
> URL: https://issues.apache.org/jira/browse/HDDS-2393
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Chris Teoh
>Assignee: Chris Teoh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Siyao Meng commented on HDDS-1847:
> --
> Looks like this commit breaks {{TestKeyManagerImpl}} in {{setUp()}} and 
> {{cleanup()}}. Run {{TestKeyManagerImpl#testListStatus()}} to steadily repro. 
> I believe there could be other tests that are broken by this.
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManagerHttpServer.getSpnegoPrincipal(StorageContainerManagerHttpServer.java:74)
> at 
> org.apache.hadoop.hdds.server.BaseHttpServer.(BaseHttpServer.java:81)
> at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManagerHttpServer.(StorageContainerManagerHttpServer.java:36)
> at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManager.(StorageContainerManager.java:330)
> at org.apache.hadoop.hdds.scm.TestUtils.getScm(TestUtils.java:544)
> at 
> org.apache.hadoop.ozone.om.TestKeyManagerImpl.setUp(TestKeyManagerImpl.java:150)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
> at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
> at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
> at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
> at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.ozone.om.TestKeyManagerImpl.cleanup(TestKeyManagerImpl.java:176)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
> at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
> at 
> 

[jira] [Updated] (HDDS-2407) Reduce log level of per-node failure in XceiverClientGrpc

2019-11-05 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-2407:
---
Description: When reading from a pipeline, client should not care if some 
datanode could not service the request, as long as the pipeline as a whole is 
OK.  The [log 
message|https://github.com/apache/hadoop-ozone/blob/2529cee1a7dd27c51cb9aed0dc57af283ff24e26/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientGrpc.java#L303-L304]
 indicating node failure was [increased to error 
level|https://github.com/apache/hadoop-ozone/commit/a79dc4609a975d46a3e051ad6904fb1eb40705ee#diff-b9b6f3ccb12829d90886e041d11395b1R288]
 in HDDS-1780.  This task proposes to change it back to debug.  (was: When 
reading from a pipeline, client should not care if some datanode could not 
service the request, as long as the pipeline as a whole is OK.  The [log 
message|https://github.com/bshashikant/hadoop-ozone/blob/2529cee1a7dd27c51cb9aed0dc57af283ff24e26/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientGrpc.java#L303-L304]
 indicating node failure was [increased to error 
level|https://github.com/bshashikant/hadoop-ozone/commit/a79dc4609a975d46a3e051ad6904fb1eb40705ee#diff-b9b6f3ccb12829d90886e041d11395b1R288]
 in HDDS-1780.  This task proposes to change it back to debug.)

> Reduce log level of per-node failure in XceiverClientGrpc
> -
>
> Key: HDDS-2407
> URL: https://issues.apache.org/jira/browse/HDDS-2407
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: Ozone Client
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When reading from a pipeline, client should not care if some datanode could 
> not service the request, as long as the pipeline as a whole is OK.  The [log 
> message|https://github.com/apache/hadoop-ozone/blob/2529cee1a7dd27c51cb9aed0dc57af283ff24e26/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientGrpc.java#L303-L304]
>  indicating node failure was [increased to error 
> level|https://github.com/apache/hadoop-ozone/commit/a79dc4609a975d46a3e051ad6904fb1eb40705ee#diff-b9b6f3ccb12829d90886e041d11395b1R288]
>  in HDDS-1780.  This task proposes to change it back to debug.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2407) Reduce log level of per-node failure in XceiverClientGrpc

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2407:
-
Labels: pull-request-available  (was: )

> Reduce log level of per-node failure in XceiverClientGrpc
> -
>
> Key: HDDS-2407
> URL: https://issues.apache.org/jira/browse/HDDS-2407
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: Ozone Client
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: pull-request-available
>
> When reading from a pipeline, client should not care if some datanode could 
> not service the request, as long as the pipeline as a whole is OK.  The [log 
> message|https://github.com/bshashikant/hadoop-ozone/blob/2529cee1a7dd27c51cb9aed0dc57af283ff24e26/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientGrpc.java#L303-L304]
>  indicating node failure was [increased to error 
> level|https://github.com/bshashikant/hadoop-ozone/commit/a79dc4609a975d46a3e051ad6904fb1eb40705ee#diff-b9b6f3ccb12829d90886e041d11395b1R288]
>  in HDDS-1780.  This task proposes to change it back to debug.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2407) Reduce log level of per-node failure in XceiverClientGrpc

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2407?focusedWorklogId=339167=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339167
 ]

ASF GitHub Bot logged work on HDDS-2407:


Author: ASF GitHub Bot
Created on: 06/Nov/19 05:54
Start Date: 06/Nov/19 05:54
Worklog Time Spent: 10m 
  Work Description: adoroszlai commented on pull request #120: HDDS-2407. 
Reduce log level of per-node failure in XceiverClientGrpc
URL: https://github.com/apache/hadoop-ozone/pull/120
 
 
   ## What changes were proposed in this pull request?
   
   When reading from a pipeline, client should not care if some datanode could 
not service the request, as long as the pipeline as a whole is OK.  The [log 
message](https://github.com/apache/hadoop-ozone/blob/2529cee1a7dd27c51cb9aed0dc57af283ff24e26/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientGrpc.java#L303-L304)
 indicating node failure was [increased to error 
level](https://github.com/apache/hadoop-ozone/commit/a79dc4609a975d46a3e051ad6904fb1eb40705ee#diff-b9b6f3ccb12829d90886e041d11395b1R288)
 in [HDDS-1780](https://issues.apache.org/jira/browse/HDDS-1780).  This PR 
proposes to change it back to debug.  Pipeline-level failure is still logged as 
error.
   
   https://issues.apache.org/jira/browse/HDDS-2407
   
   ## How was this patch tested?
   
   Tested locally on docker-compose cluster with 0/1/2/3 datanodes down.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339167)
Remaining Estimate: 0h
Time Spent: 10m

> Reduce log level of per-node failure in XceiverClientGrpc
> -
>
> Key: HDDS-2407
> URL: https://issues.apache.org/jira/browse/HDDS-2407
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: Ozone Client
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When reading from a pipeline, client should not care if some datanode could 
> not service the request, as long as the pipeline as a whole is OK.  The [log 
> message|https://github.com/bshashikant/hadoop-ozone/blob/2529cee1a7dd27c51cb9aed0dc57af283ff24e26/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientGrpc.java#L303-L304]
>  indicating node failure was [increased to error 
> level|https://github.com/bshashikant/hadoop-ozone/commit/a79dc4609a975d46a3e051ad6904fb1eb40705ee#diff-b9b6f3ccb12829d90886e041d11395b1R288]
>  in HDDS-1780.  This task proposes to change it back to debug.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2404) Add support for Registered id as service identifier for CSR.

2019-11-05 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968096#comment-16968096
 ] 

Bharat Viswanadham commented on HDDS-2404:
--

Can we move this task under HDDS-505, as it is related to OM HA work.?

> Add support for Registered id as service identifier for CSR.
> 
>
> Key: HDDS-2404
> URL: https://issues.apache.org/jira/browse/HDDS-2404
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Anu Engineer
>Assignee: Abhishek Purohit
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The SCM HA needs the ability to represent a group as a single entity. So that 
> Tokens for each of the OM which is part of an HA group can be honored by the 
> data nodes. 
> This patch adds the notion of a service group ID to the Certificate 
> Infrastructure. In the next JIRAs, we will use this capability when issuing 
> certificates to OM -- especially when they are in HA mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1643) Send hostName also part of OMRequest

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1643?focusedWorklogId=339164=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339164
 ]

ASF GitHub Bot logged work on HDDS-1643:


Author: ASF GitHub Bot
Created on: 06/Nov/19 05:49
Start Date: 06/Nov/19 05:49
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #70: 
HDDS-1643. Send hostName also part of OMRequest.
URL: https://github.com/apache/hadoop-ozone/pull/70
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339164)
Time Spent: 20m  (was: 10m)

> Send hostName also part of OMRequest
> 
>
> Key: HDDS-1643
> URL: https://issues.apache.org/jira/browse/HDDS-1643
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: YiSheng Lien
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This Jira is created based on the comment from [~eyang] on HDDS-1600 jira.
> [~bharatviswa] can hostname be used as part of OM request? For running in 
> docker container, virtual private network address may not be routable or 
> exposed to outside world. Using IP to identify the source client location may 
> not be enough. It would be nice to have ability support hostname based 
> request too.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1643) Send hostName also part of OMRequest

2019-11-05 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham resolved HDDS-1643.
--
Fix Version/s: 0.5.0
   Resolution: Fixed

> Send hostName also part of OMRequest
> 
>
> Key: HDDS-1643
> URL: https://issues.apache.org/jira/browse/HDDS-1643
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: YiSheng Lien
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This Jira is created based on the comment from [~eyang] on HDDS-1600 jira.
> [~bharatviswa] can hostname be used as part of OM request? For running in 
> docker container, virtual private network address may not be routable or 
> exposed to outside world. Using IP to identify the source client location may 
> not be enough. It would be nice to have ability support hostname based 
> request too.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2407) Reduce log level of per-node failure in XceiverClientGrpc

2019-11-05 Thread Attila Doroszlai (Jira)
Attila Doroszlai created HDDS-2407:
--

 Summary: Reduce log level of per-node failure in XceiverClientGrpc
 Key: HDDS-2407
 URL: https://issues.apache.org/jira/browse/HDDS-2407
 Project: Hadoop Distributed Data Store
  Issue Type: Task
  Components: Ozone Client
Reporter: Attila Doroszlai
Assignee: Attila Doroszlai


When reading from a pipeline, client should not care if some datanode could not 
service the request, as long as the pipeline as a whole is OK.  The [log 
message|https://github.com/bshashikant/hadoop-ozone/blob/2529cee1a7dd27c51cb9aed0dc57af283ff24e26/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientGrpc.java#L303-L304]
 indicating node failure was [increased to error 
level|https://github.com/bshashikant/hadoop-ozone/commit/a79dc4609a975d46a3e051ad6904fb1eb40705ee#diff-b9b6f3ccb12829d90886e041d11395b1R288]
 in HDDS-1780.  This task proposes to change it back to debug.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-2064) Add tests for incorrect OM HA config when node ID or RPC address is not configured

2019-11-05 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham resolved HDDS-2064.
--
Fix Version/s: 0.5.0
   Resolution: Fixed

> Add tests for incorrect OM HA config when node ID or RPC address is not 
> configured
> --
>
> Key: HDDS-2064
> URL: https://issues.apache.org/jira/browse/HDDS-2064
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> -OM will NPE and crash when `ozone.om.service.ids=id1,id2` is configured but 
> `ozone.om.nodes.id1` doesn't exist; or `ozone.om.address.id1.omX` doesn't 
> exist.-
> -Root cause:-
> -`OzoneManager#loadOMHAConfigs()` didn't check the case where `found == 0`. 
> This happens when local OM doesn't match any `ozone.om.address.idX.omX` in 
> the config.-
> Due to the refactoring done in HDDS-2162. This fix has been included in that 
> commit. I will repurpose the jira to add some tests for the HA config.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2064) Add tests for incorrect OM HA config when node ID or RPC address is not configured

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2064?focusedWorklogId=339157=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339157
 ]

ASF GitHub Bot logged work on HDDS-2064:


Author: ASF GitHub Bot
Created on: 06/Nov/19 05:29
Start Date: 06/Nov/19 05:29
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #119: 
HDDS-2064. Add tests for incorrect OM HA config when node ID or RPC address is 
not configured
URL: https://github.com/apache/hadoop-ozone/pull/119
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339157)
Time Spent: 2h 10m  (was: 2h)

> Add tests for incorrect OM HA config when node ID or RPC address is not 
> configured
> --
>
> Key: HDDS-2064
> URL: https://issues.apache.org/jira/browse/HDDS-2064
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> -OM will NPE and crash when `ozone.om.service.ids=id1,id2` is configured but 
> `ozone.om.nodes.id1` doesn't exist; or `ozone.om.address.id1.omX` doesn't 
> exist.-
> -Root cause:-
> -`OzoneManager#loadOMHAConfigs()` didn't check the case where `found == 0`. 
> This happens when local OM doesn't match any `ozone.om.address.idX.omX` in 
> the config.-
> Due to the refactoring done in HDDS-2162. This fix has been included in that 
> commit. I will repurpose the jira to add some tests for the HA config.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2359) Seeking randomly in a key with more than 2 blocks of data leads to inconsistent reads

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2359?focusedWorklogId=339153=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339153
 ]

ASF GitHub Bot logged work on HDDS-2359:


Author: ASF GitHub Bot
Created on: 06/Nov/19 05:26
Start Date: 06/Nov/19 05:26
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #82: 
HDDS-2359. Seeking randomly in a key with more than 2 blocks of data leads to 
inconsistent reads
URL: https://github.com/apache/hadoop-ozone/pull/82
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339153)
Time Spent: 20m  (was: 10m)

> Seeking randomly in a key with more than 2 blocks of data leads to 
> inconsistent reads
> -
>
> Key: HDDS-2359
> URL: https://issues.apache.org/jira/browse/HDDS-2359
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Istvan Fajth
>Assignee: Shashikant Banerjee
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> During Hive testing we found the following exception:
> {code}
> TaskAttempt 3 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1569246922012_0214_1_03_00_3:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.io.IOException: error iterating
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
> at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.io.IOException: error iterating
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:80)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:426)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
> ... 16 more
> Caused by: java.io.IOException: java.io.IOException: error iterating
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:366)
> at 
> org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
> at 
> org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151)
> at 
> 

[jira] [Resolved] (HDDS-2359) Seeking randomly in a key with more than 2 blocks of data leads to inconsistent reads

2019-11-05 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham resolved HDDS-2359.
--
Fix Version/s: 0.5.0
   Resolution: Fixed

> Seeking randomly in a key with more than 2 blocks of data leads to 
> inconsistent reads
> -
>
> Key: HDDS-2359
> URL: https://issues.apache.org/jira/browse/HDDS-2359
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Istvan Fajth
>Assignee: Shashikant Banerjee
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> During Hive testing we found the following exception:
> {code}
> TaskAttempt 3 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1569246922012_0214_1_03_00_3:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.io.IOException: error iterating
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
> at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.io.IOException: error iterating
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:80)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:426)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
> ... 16 more
> Caused by: java.io.IOException: java.io.IOException: error iterating
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:366)
> at 
> org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
> at 
> org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
> ... 18 more
> Caused by: java.io.IOException: error iterating
> at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.next(VectorizedOrcAcidRowBatchReader.java:835)
> at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.next(VectorizedOrcAcidRowBatchReader.java:74)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:361)
> ... 24 more
> Caused by: java.io.IOException: Error reading file: 
> o3fs://hive.warehouse.vc0136.halxg.cloudera.com:9862/data/inventory/delta_001_001_/bucket_0
> at 
> 

[jira] [Resolved] (HDDS-2380) Use the Table.isExist API instead of get() call while checking for presence of key.

2019-11-05 Thread Jitendra Nath Pandey (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey resolved HDDS-2380.

Resolution: Fixed

> Use the Table.isExist API instead of get() call while checking for presence 
> of key.
> ---
>
> Key: HDDS-2380
> URL: https://issues.apache.org/jira/browse/HDDS-2380
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Aravindan Vijayan
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, when OM creates a file/directory, it checks the absence of all 
> prefix paths of the key in its RocksDB. Since we don't care about the 
> deserialization of the actual value, we should use the isExist API added in 
> org.apache.hadoop.hdds.utils.db.Table which internally uses the more 
> performant keyMayExist API of RocksDB.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14941) Potential editlog race condition can cause corrupted file

2019-11-05 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968063#comment-16968063
 ] 

Hadoop QA commented on HDFS-14941:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
58s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 21m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
22m  9s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
14s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
36s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 21m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 21m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
25s{color} | {color:green} root: The patch generated 0 new + 705 unchanged - 1 
fixed = 705 total (was 706) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 50s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
45s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  8m 34s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 97m 44s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
48s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}238m 37s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.security.TestFixKerberosTicketOrder |
|   | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy |
|   | hadoop.hdfs.server.namenode.ha.TestBootstrapAliasmap |
|   | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14941 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12985005/HDFS-14941.006.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 7cdf1ae334d4 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (HDFS-14941) Potential editlog race condition can cause corrupted file

2019-11-05 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968013#comment-16968013
 ] 

Hadoop QA commented on HDFS-14941:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
32s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m  0s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
40s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
45s{color} | {color:green} root: The patch generated 0 new + 705 unchanged - 1 
fixed = 705 total (was 706) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 51s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
40s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  8m 31s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}109m 41s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
49s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}230m 10s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.security.TestFixKerberosTicketOrder |
|   | hadoop.conf.TestCommonConfigurationFields |
|   | hadoop.hdfs.server.namenode.TestNameNodeMXBean |
|   | hadoop.hdfs.TestMultipleNNPortQOP |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14941 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12985000/HDFS-14941.005.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 7a96a3dee053 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | 

[jira] [Commented] (HDFS-14922) On StartUp , Snapshot modification time got changed

2019-11-05 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968005#comment-16968005
 ] 

Íñigo Goiri commented on HDFS-14922:


I meant the full javadoc:
{code}
/**
 * Log that a snapshot is created.
 * @param snapRoot Root of the snapshot.
 * @param snapName Name of the snapshot.
 * @param toLogRpcIds If it is logging RPC ids.
 * @param mtime The snapshot creation time set by Time.now().
 */
void logCreateSnapshot(String snapRoot, String snapName, boolean toLogRpcIds,
long mtime) {
{code}

It doesn't hurt to improve the readability of the existing code.
BTW, even though {{setSnapshotMTime()}} is package protected, we should also 
add the javadoc there.

> On StartUp , Snapshot modification time got changed
> ---
>
> Key: HDFS-14922
> URL: https://issues.apache.org/jira/browse/HDFS-14922
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14922.001.patch, HDFS-14922.002.patch, 
> HDFS-14922.003.patch, HDFS-14922.004.patch
>
>
> Snapshot modification time got changed on namenode restart



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14949) HttpFS does not support getServerDefaults()

2019-11-05 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968000#comment-16968000
 ] 

Íñigo Goiri commented on HDFS-14949:


Let's do the same then, implement both and refer one to the other.
The deprecation warning would be fine in this case.

> HttpFS does not support getServerDefaults()
> ---
>
> Key: HDFS-14949
> URL: https://issues.apache.org/jira/browse/HDFS-14949
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Kihwal Lee
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14949.001.patch, HDFS-14949.002.patch, 
> HDFS-14949.003.patch
>
>
> For HttpFS server to function as a fully webhdfs-compatible service, 
> getServerDefaults() support is needed.  It is increasingly used in new 
> features and improvements.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-14928) UI: unifying the WebUI across different components.

2019-11-05 Thread Xieming Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xieming Li reassigned HDFS-14928:
-

Assignee: Xieming Li

> UI: unifying the WebUI across different components.
> ---
>
> Key: HDFS-14928
> URL: https://issues.apache.org/jira/browse/HDFS-14928
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ui
>Reporter: Xieming Li
>Assignee: Xieming Li
>Priority: Trivial
> Attachments: DN_orig.png, DN_with_legend.png.png, DN_wo_legend.png, 
> NN_orig.png, NN_with_legend.png, NN_wo_legend.png, RBF_orig.png, 
> RBF_with_legend.png, RBF_wo_legend.png
>
>
> The WebUI of different components could be unified.
> *Router:*
> |Current|  !RBF_orig.png|width=500! | 
> |Proposed 1 (With Icon) |  !RBF_wo_legend.png|width=500! | 
> |Proposed 2 (With Icon and Legend)|!RBF_with_legend.png|width=500!  | 
> *NameNode:*
> |Current| !NN_orig.png|width=500! |
> |Proposed 1 (With Icon) | !NN_wo_legend.png|width=500! |
> |Proposed 2 (With Icon and Legend)| !NN_with_legend.png|width=500! |
> *DataNode:*
> |Current| !DN_orig.png|width=500! |
> |Proposed 1 (With Icon) | !DN_wo_legend.png|width=500! |
> |Proposed 2 (With Icon and Legend)| !DN_with_legend.png.png|width=500! |



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14806) Bootstrap standby may fail if used in-progress tailing

2019-11-05 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967978#comment-16967978
 ] 

Konstantin Shvachko commented on HDFS-14806:


+1 on v4 from me as well.

> Bootstrap standby may fail if used in-progress tailing
> --
>
> Key: HDFS-14806
> URL: https://issues.apache.org/jira/browse/HDFS-14806
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-14806.001.patch, HDFS-14806.002.patch, 
> HDFS-14806.003.patch, HDFS-14806.004.patch
>
>
> One issue we went across was that if in-progress tailing is enabled, 
> bootstrap standby could fail.
> When in-progress tailing is enabled, Bootstrap uses the RPC mechanism to get 
> edits. There is a config {{dfs.ha.tail-edits.qjm.rpc.max-txns}} that sets an 
> upper bound on how many txnid can be included in one RPC call. The default is 
> 5000. Meaning bootstraping NN (say NN1) can only pull at most 5000 edits from 
> JN. However, as part of bootstrap, NN1 queries another NN (say NN2) for NN2's 
> current transactionID, NN2 may return a state that is > 5000 txnid from NN1's 
> current image. But NN1 can only see 5000 more txnid from JNs. At this point 
> NN1 goes panic, because txnid retuned by JNs is behind NN2's returned state, 
> bootstrap then fail.
> Essentially, bootstrap standby can fail if both of two following conditions 
> are met:
>  # in-progress tailing is enabled AND
>  # the boostraping NN is too far (>5000 txid)  behind 
> Increasing the value of {{dfs.ha.tail-edits.qjm.rpc.max-txns}} to some super 
> large value allowed bootstrap to continue. But this is hardly the ideal 
> solution.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14941) Potential editlog race condition can cause corrupted file

2019-11-05 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967969#comment-16967969
 ] 

Konstantin Shvachko commented on HDFS-14941:


+1 for v6 patch. If anybody wants to review please do.

> Potential editlog race condition can cause corrupted file
> -
>
> Key: HDFS-14941
> URL: https://issues.apache.org/jira/browse/HDFS-14941
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
>  Labels: ha
> Attachments: HDFS-14941.001.patch, HDFS-14941.002.patch, 
> HDFS-14941.003.patch, HDFS-14941.004.patch, HDFS-14941.005.patch, 
> HDFS-14941.006.patch
>
>
> Recently we encountered an issue that, after a failover, NameNode complains 
> corrupted file/missing blocks. The blocks did recover after full block 
> reports, so the blocks are not actually missing. After further investigation, 
> we believe this is what happened:
> First of all, on SbN, it is possible that it receives block reports before 
> corresponding edit tailing happened. In which case SbN postpones processing 
> the DN block report, handled by the guarding logic below:
> {code:java}
>   if (shouldPostponeBlocksFromFuture &&
>   namesystem.isGenStampInFuture(iblk)) {
> queueReportedBlock(storageInfo, iblk, reportedState,
> QUEUE_REASON_FUTURE_GENSTAMP);
> continue;
>   }
> {code}
> Basically if reported block has a future generation stamp, the DN report gets 
> requeued.
> However, in {{FSNamesystem#storeAllocatedBlock}}, we have the following code:
> {code:java}
>   // allocate new block, record block locations in INode.
>   newBlock = createNewBlock();
>   INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile);
>   saveAllocatedBlock(src, inodesInPath, newBlock, targets);
>   persistNewBlock(src, pendingFile);
>   offset = pendingFile.computeFileSize();
> {code}
> The line
>  {{newBlock = createNewBlock();}}
>  Would log an edit entry {{OP_SET_GENSTAMP_V2}} to bump generation stamp on 
> Standby
>  while the following line
>  {{persistNewBlock(src, pendingFile);}}
>  would log another edit entry {{OP_ADD_BLOCK}} to actually add the block on 
> Standby.
> Then the race condition is that, imagine Standby has just processed 
> {{OP_SET_GENSTAMP_V2}}, but not yet {{OP_ADD_BLOCK}} (if they just happen to 
> be in different setment). Now a block report with new generation stamp comes 
> in.
> Since the genstamp bump has already been processed, the reported block may 
> not be considered as future block. So the guarding logic passes. But 
> actually, the block hasn't been added to blockmap, because the second edit is 
> yet to be tailed. So, the block then gets added to invalidate block list and 
> we saw messages like:
> {code:java}
> BLOCK* addBlock: block XXX on node XXX size XXX does not belong to any file
> {code}
> Even worse, since this IBR is effectively lost, the NameNode has no 
> information about this block, until the next full block report. So after a 
> failover, the NN marks it as corrupt.
> This issue won't happen though, if both of the edit entries get tailed all 
> together, so no IBR processing can happen in between. But in our case, we set 
> edit tailing interval to super low (to allow Standby read), so when under 
> high workload, there is a much much higher chance that the two entries are 
> tailed separately, causing the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14941) Potential editlog race condition can cause corrupted file

2019-11-05 Thread Chen Liang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967962#comment-16967962
 ] 

Chen Liang commented on HDFS-14941:
---

Post v006 patch after offline discussion with [~shv]. The diff is changing unit 
test to check for correctness after failover.

> Potential editlog race condition can cause corrupted file
> -
>
> Key: HDFS-14941
> URL: https://issues.apache.org/jira/browse/HDFS-14941
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
>  Labels: ha
> Attachments: HDFS-14941.001.patch, HDFS-14941.002.patch, 
> HDFS-14941.003.patch, HDFS-14941.004.patch, HDFS-14941.005.patch, 
> HDFS-14941.006.patch
>
>
> Recently we encountered an issue that, after a failover, NameNode complains 
> corrupted file/missing blocks. The blocks did recover after full block 
> reports, so the blocks are not actually missing. After further investigation, 
> we believe this is what happened:
> First of all, on SbN, it is possible that it receives block reports before 
> corresponding edit tailing happened. In which case SbN postpones processing 
> the DN block report, handled by the guarding logic below:
> {code:java}
>   if (shouldPostponeBlocksFromFuture &&
>   namesystem.isGenStampInFuture(iblk)) {
> queueReportedBlock(storageInfo, iblk, reportedState,
> QUEUE_REASON_FUTURE_GENSTAMP);
> continue;
>   }
> {code}
> Basically if reported block has a future generation stamp, the DN report gets 
> requeued.
> However, in {{FSNamesystem#storeAllocatedBlock}}, we have the following code:
> {code:java}
>   // allocate new block, record block locations in INode.
>   newBlock = createNewBlock();
>   INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile);
>   saveAllocatedBlock(src, inodesInPath, newBlock, targets);
>   persistNewBlock(src, pendingFile);
>   offset = pendingFile.computeFileSize();
> {code}
> The line
>  {{newBlock = createNewBlock();}}
>  Would log an edit entry {{OP_SET_GENSTAMP_V2}} to bump generation stamp on 
> Standby
>  while the following line
>  {{persistNewBlock(src, pendingFile);}}
>  would log another edit entry {{OP_ADD_BLOCK}} to actually add the block on 
> Standby.
> Then the race condition is that, imagine Standby has just processed 
> {{OP_SET_GENSTAMP_V2}}, but not yet {{OP_ADD_BLOCK}} (if they just happen to 
> be in different setment). Now a block report with new generation stamp comes 
> in.
> Since the genstamp bump has already been processed, the reported block may 
> not be considered as future block. So the guarding logic passes. But 
> actually, the block hasn't been added to blockmap, because the second edit is 
> yet to be tailed. So, the block then gets added to invalidate block list and 
> we saw messages like:
> {code:java}
> BLOCK* addBlock: block XXX on node XXX size XXX does not belong to any file
> {code}
> Even worse, since this IBR is effectively lost, the NameNode has no 
> information about this block, until the next full block report. So after a 
> failover, the NN marks it as corrupt.
> This issue won't happen though, if both of the edit entries get tailed all 
> together, so no IBR processing can happen in between. But in our case, we set 
> edit tailing interval to super low (to allow Standby read), so when under 
> high workload, there is a much much higher chance that the two entries are 
> tailed separately, causing the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14941) Potential editlog race condition can cause corrupted file

2019-11-05 Thread Chen Liang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-14941:
--
Attachment: HDFS-14941.006.patch

> Potential editlog race condition can cause corrupted file
> -
>
> Key: HDFS-14941
> URL: https://issues.apache.org/jira/browse/HDFS-14941
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
>  Labels: ha
> Attachments: HDFS-14941.001.patch, HDFS-14941.002.patch, 
> HDFS-14941.003.patch, HDFS-14941.004.patch, HDFS-14941.005.patch, 
> HDFS-14941.006.patch
>
>
> Recently we encountered an issue that, after a failover, NameNode complains 
> corrupted file/missing blocks. The blocks did recover after full block 
> reports, so the blocks are not actually missing. After further investigation, 
> we believe this is what happened:
> First of all, on SbN, it is possible that it receives block reports before 
> corresponding edit tailing happened. In which case SbN postpones processing 
> the DN block report, handled by the guarding logic below:
> {code:java}
>   if (shouldPostponeBlocksFromFuture &&
>   namesystem.isGenStampInFuture(iblk)) {
> queueReportedBlock(storageInfo, iblk, reportedState,
> QUEUE_REASON_FUTURE_GENSTAMP);
> continue;
>   }
> {code}
> Basically if reported block has a future generation stamp, the DN report gets 
> requeued.
> However, in {{FSNamesystem#storeAllocatedBlock}}, we have the following code:
> {code:java}
>   // allocate new block, record block locations in INode.
>   newBlock = createNewBlock();
>   INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile);
>   saveAllocatedBlock(src, inodesInPath, newBlock, targets);
>   persistNewBlock(src, pendingFile);
>   offset = pendingFile.computeFileSize();
> {code}
> The line
>  {{newBlock = createNewBlock();}}
>  Would log an edit entry {{OP_SET_GENSTAMP_V2}} to bump generation stamp on 
> Standby
>  while the following line
>  {{persistNewBlock(src, pendingFile);}}
>  would log another edit entry {{OP_ADD_BLOCK}} to actually add the block on 
> Standby.
> Then the race condition is that, imagine Standby has just processed 
> {{OP_SET_GENSTAMP_V2}}, but not yet {{OP_ADD_BLOCK}} (if they just happen to 
> be in different setment). Now a block report with new generation stamp comes 
> in.
> Since the genstamp bump has already been processed, the reported block may 
> not be considered as future block. So the guarding logic passes. But 
> actually, the block hasn't been added to blockmap, because the second edit is 
> yet to be tailed. So, the block then gets added to invalidate block list and 
> we saw messages like:
> {code:java}
> BLOCK* addBlock: block XXX on node XXX size XXX does not belong to any file
> {code}
> Even worse, since this IBR is effectively lost, the NameNode has no 
> information about this block, until the next full block report. So after a 
> failover, the NN marks it as corrupt.
> This issue won't happen though, if both of the edit entries get tailed all 
> together, so no IBR processing can happen in between. But in our case, we set 
> edit tailing interval to super low (to allow Standby read), so when under 
> high workload, there is a much much higher chance that the two entries are 
> tailed separately, causing the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2384) Large chunks during write can have memory pressure on DN with multiple clients

2019-11-05 Thread Anu Engineer (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967931#comment-16967931
 ] 

Anu Engineer commented on HDDS-2384:


Nope, we can encode the data in a set of small packets or read them as a 
sequence of small packets. Let us say, 8KB/64KB buffer and we read and write 
the data continually to the underlying disk.



> Large chunks during write can have memory pressure on DN with multiple clients
> --
>
> Key: HDDS-2384
> URL: https://issues.apache.org/jira/browse/HDDS-2384
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Anu Engineer
>Priority: Major
>  Labels: performance
>
> During large file writes, it ends up writing {{16 MB}} chunks.  
> https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java#L691
> In large clusters, 100s of clients may connect to DN. In such cases, 
> depending on the incoming write workload mem load on DN can increase 
> significantly. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2384) Large chunks during write can have memory pressure on DN with multiple clients

2019-11-05 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967923#comment-16967923
 ] 

Rajesh Balamohan commented on HDDS-2384:


Thanks [~aengineer] for sharing the details. Wouldn't this still need 16 MB mem 
when constructing ChunkInfo from protobuf? 

> Large chunks during write can have memory pressure on DN with multiple clients
> --
>
> Key: HDDS-2384
> URL: https://issues.apache.org/jira/browse/HDDS-2384
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Anu Engineer
>Priority: Major
>  Labels: performance
>
> During large file writes, it ends up writing {{16 MB}} chunks.  
> https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java#L691
> In large clusters, 100s of clients may connect to DN. In such cases, 
> depending on the incoming write workload mem load on DN can increase 
> significantly. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14941) Potential editlog race condition can cause corrupted file

2019-11-05 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967912#comment-16967912
 ] 

Hadoop QA commented on HDFS-14941:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
50s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
15s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 19m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
22m 18s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
13s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
28s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 20m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 20m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
25s{color} | {color:green} root: The patch generated 0 new + 705 unchanged - 1 
fixed = 705 total (was 706) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 15s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  9m 48s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}101m 
22s{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
52s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}241m 44s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.security.TestFixKerberosTicketOrder |
|   | hadoop.conf.TestCommonConfigurationFields |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14941 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12984985/HDFS-14941.004.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a61f79ae4988 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bfb8f28 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| 

[jira] [Updated] (HDFS-14941) Potential editlog race condition can cause corrupted file

2019-11-05 Thread Chen Liang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-14941:
--
Attachment: HDFS-14941.005.patch

> Potential editlog race condition can cause corrupted file
> -
>
> Key: HDFS-14941
> URL: https://issues.apache.org/jira/browse/HDFS-14941
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
>  Labels: ha
> Attachments: HDFS-14941.001.patch, HDFS-14941.002.patch, 
> HDFS-14941.003.patch, HDFS-14941.004.patch, HDFS-14941.005.patch
>
>
> Recently we encountered an issue that, after a failover, NameNode complains 
> corrupted file/missing blocks. The blocks did recover after full block 
> reports, so the blocks are not actually missing. After further investigation, 
> we believe this is what happened:
> First of all, on SbN, it is possible that it receives block reports before 
> corresponding edit tailing happened. In which case SbN postpones processing 
> the DN block report, handled by the guarding logic below:
> {code:java}
>   if (shouldPostponeBlocksFromFuture &&
>   namesystem.isGenStampInFuture(iblk)) {
> queueReportedBlock(storageInfo, iblk, reportedState,
> QUEUE_REASON_FUTURE_GENSTAMP);
> continue;
>   }
> {code}
> Basically if reported block has a future generation stamp, the DN report gets 
> requeued.
> However, in {{FSNamesystem#storeAllocatedBlock}}, we have the following code:
> {code:java}
>   // allocate new block, record block locations in INode.
>   newBlock = createNewBlock();
>   INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile);
>   saveAllocatedBlock(src, inodesInPath, newBlock, targets);
>   persistNewBlock(src, pendingFile);
>   offset = pendingFile.computeFileSize();
> {code}
> The line
>  {{newBlock = createNewBlock();}}
>  Would log an edit entry {{OP_SET_GENSTAMP_V2}} to bump generation stamp on 
> Standby
>  while the following line
>  {{persistNewBlock(src, pendingFile);}}
>  would log another edit entry {{OP_ADD_BLOCK}} to actually add the block on 
> Standby.
> Then the race condition is that, imagine Standby has just processed 
> {{OP_SET_GENSTAMP_V2}}, but not yet {{OP_ADD_BLOCK}} (if they just happen to 
> be in different setment). Now a block report with new generation stamp comes 
> in.
> Since the genstamp bump has already been processed, the reported block may 
> not be considered as future block. So the guarding logic passes. But 
> actually, the block hasn't been added to blockmap, because the second edit is 
> yet to be tailed. So, the block then gets added to invalidate block list and 
> we saw messages like:
> {code:java}
> BLOCK* addBlock: block XXX on node XXX size XXX does not belong to any file
> {code}
> Even worse, since this IBR is effectively lost, the NameNode has no 
> information about this block, until the next full block report. So after a 
> failover, the NN marks it as corrupt.
> This issue won't happen though, if both of the edit entries get tailed all 
> together, so no IBR processing can happen in between. But in our case, we set 
> edit tailing interval to super low (to allow Standby read), so when under 
> high workload, there is a much much higher chance that the two entries are 
> tailed separately, causing the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14941) Potential editlog race condition can cause corrupted file

2019-11-05 Thread Chen Liang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967905#comment-16967905
 ] 

Chen Liang commented on HDFS-14941:
---

Thanks for the catch [~shv], uploaded v05 patch.

> Potential editlog race condition can cause corrupted file
> -
>
> Key: HDFS-14941
> URL: https://issues.apache.org/jira/browse/HDFS-14941
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
>  Labels: ha
> Attachments: HDFS-14941.001.patch, HDFS-14941.002.patch, 
> HDFS-14941.003.patch, HDFS-14941.004.patch, HDFS-14941.005.patch
>
>
> Recently we encountered an issue that, after a failover, NameNode complains 
> corrupted file/missing blocks. The blocks did recover after full block 
> reports, so the blocks are not actually missing. After further investigation, 
> we believe this is what happened:
> First of all, on SbN, it is possible that it receives block reports before 
> corresponding edit tailing happened. In which case SbN postpones processing 
> the DN block report, handled by the guarding logic below:
> {code:java}
>   if (shouldPostponeBlocksFromFuture &&
>   namesystem.isGenStampInFuture(iblk)) {
> queueReportedBlock(storageInfo, iblk, reportedState,
> QUEUE_REASON_FUTURE_GENSTAMP);
> continue;
>   }
> {code}
> Basically if reported block has a future generation stamp, the DN report gets 
> requeued.
> However, in {{FSNamesystem#storeAllocatedBlock}}, we have the following code:
> {code:java}
>   // allocate new block, record block locations in INode.
>   newBlock = createNewBlock();
>   INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile);
>   saveAllocatedBlock(src, inodesInPath, newBlock, targets);
>   persistNewBlock(src, pendingFile);
>   offset = pendingFile.computeFileSize();
> {code}
> The line
>  {{newBlock = createNewBlock();}}
>  Would log an edit entry {{OP_SET_GENSTAMP_V2}} to bump generation stamp on 
> Standby
>  while the following line
>  {{persistNewBlock(src, pendingFile);}}
>  would log another edit entry {{OP_ADD_BLOCK}} to actually add the block on 
> Standby.
> Then the race condition is that, imagine Standby has just processed 
> {{OP_SET_GENSTAMP_V2}}, but not yet {{OP_ADD_BLOCK}} (if they just happen to 
> be in different setment). Now a block report with new generation stamp comes 
> in.
> Since the genstamp bump has already been processed, the reported block may 
> not be considered as future block. So the guarding logic passes. But 
> actually, the block hasn't been added to blockmap, because the second edit is 
> yet to be tailed. So, the block then gets added to invalidate block list and 
> we saw messages like:
> {code:java}
> BLOCK* addBlock: block XXX on node XXX size XXX does not belong to any file
> {code}
> Even worse, since this IBR is effectively lost, the NameNode has no 
> information about this block, until the next full block report. So after a 
> failover, the NN marks it as corrupt.
> This issue won't happen though, if both of the edit entries get tailed all 
> together, so no IBR processing can happen in between. But in our case, we set 
> edit tailing interval to super low (to allow Standby read), so when under 
> high workload, there is a much much higher chance that the two entries are 
> tailed separately, causing the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14884) Add sanity check that zone key equals feinfo key while setting Xattrs

2019-11-05 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967903#comment-16967903
 ] 

Erik Krogen commented on HDFS-14884:


[~weichiu] any chance you can review the branch-2 port? I am hoping to save 
myself the time of understanding what's going on here :) Let me know if you 
don't have the time.

> Add sanity check that zone key equals feinfo key while setting Xattrs
> -
>
> Key: HDFS-14884
> URL: https://issues.apache.org/jira/browse/HDFS-14884
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, hdfs
>Affects Versions: 2.11.0
>Reporter: Mukul Kumar Singh
>Assignee: Yuval Degani
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2, 2.11.0
>
> Attachments: HDFS-14884-branch-2.001.patch, HDFS-14884.001.patch, 
> HDFS-14884.002.patch, HDFS-14884.003.patch, hdfs_distcp.patch
>
>
> Currently, it is possible to set an external attribute where the  zone key is 
> not the same as  feinfo key. This jira will add a precondition before setting 
> this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14941) Potential editlog race condition can cause corrupted file

2019-11-05 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967888#comment-16967888
 ] 

Konstantin Shvachko edited comment on HDFS-14941 at 11/5/19 9:40 PM:
-

Small thing. Still one "cached" genstamp remaining, should be "impending".
{code}
   * Set the current genstamp to the impending genstamp.
{code}
Don't need Jenkins build for that, since it is a comment change only.


was (Author: shv):
Small thing. Still one "cached" genstamp remaining, should be "impending".
{code}
   * Set the current genstamp to the impending genstamp.
{code}

> Potential editlog race condition can cause corrupted file
> -
>
> Key: HDFS-14941
> URL: https://issues.apache.org/jira/browse/HDFS-14941
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
>  Labels: ha
> Attachments: HDFS-14941.001.patch, HDFS-14941.002.patch, 
> HDFS-14941.003.patch, HDFS-14941.004.patch
>
>
> Recently we encountered an issue that, after a failover, NameNode complains 
> corrupted file/missing blocks. The blocks did recover after full block 
> reports, so the blocks are not actually missing. After further investigation, 
> we believe this is what happened:
> First of all, on SbN, it is possible that it receives block reports before 
> corresponding edit tailing happened. In which case SbN postpones processing 
> the DN block report, handled by the guarding logic below:
> {code:java}
>   if (shouldPostponeBlocksFromFuture &&
>   namesystem.isGenStampInFuture(iblk)) {
> queueReportedBlock(storageInfo, iblk, reportedState,
> QUEUE_REASON_FUTURE_GENSTAMP);
> continue;
>   }
> {code}
> Basically if reported block has a future generation stamp, the DN report gets 
> requeued.
> However, in {{FSNamesystem#storeAllocatedBlock}}, we have the following code:
> {code:java}
>   // allocate new block, record block locations in INode.
>   newBlock = createNewBlock();
>   INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile);
>   saveAllocatedBlock(src, inodesInPath, newBlock, targets);
>   persistNewBlock(src, pendingFile);
>   offset = pendingFile.computeFileSize();
> {code}
> The line
>  {{newBlock = createNewBlock();}}
>  Would log an edit entry {{OP_SET_GENSTAMP_V2}} to bump generation stamp on 
> Standby
>  while the following line
>  {{persistNewBlock(src, pendingFile);}}
>  would log another edit entry {{OP_ADD_BLOCK}} to actually add the block on 
> Standby.
> Then the race condition is that, imagine Standby has just processed 
> {{OP_SET_GENSTAMP_V2}}, but not yet {{OP_ADD_BLOCK}} (if they just happen to 
> be in different setment). Now a block report with new generation stamp comes 
> in.
> Since the genstamp bump has already been processed, the reported block may 
> not be considered as future block. So the guarding logic passes. But 
> actually, the block hasn't been added to blockmap, because the second edit is 
> yet to be tailed. So, the block then gets added to invalidate block list and 
> we saw messages like:
> {code:java}
> BLOCK* addBlock: block XXX on node XXX size XXX does not belong to any file
> {code}
> Even worse, since this IBR is effectively lost, the NameNode has no 
> information about this block, until the next full block report. So after a 
> failover, the NN marks it as corrupt.
> This issue won't happen though, if both of the edit entries get tailed all 
> together, so no IBR processing can happen in between. But in our case, we set 
> edit tailing interval to super low (to allow Standby read), so when under 
> high workload, there is a much much higher chance that the two entries are 
> tailed separately, causing the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14941) Potential editlog race condition can cause corrupted file

2019-11-05 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967888#comment-16967888
 ] 

Konstantin Shvachko commented on HDFS-14941:


Small thing. Still one "cached" genstamp remaining, should be "impending".
{code}
   * Set the current genstamp to the impending genstamp.
{code}

> Potential editlog race condition can cause corrupted file
> -
>
> Key: HDFS-14941
> URL: https://issues.apache.org/jira/browse/HDFS-14941
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
>  Labels: ha
> Attachments: HDFS-14941.001.patch, HDFS-14941.002.patch, 
> HDFS-14941.003.patch, HDFS-14941.004.patch
>
>
> Recently we encountered an issue that, after a failover, NameNode complains 
> corrupted file/missing blocks. The blocks did recover after full block 
> reports, so the blocks are not actually missing. After further investigation, 
> we believe this is what happened:
> First of all, on SbN, it is possible that it receives block reports before 
> corresponding edit tailing happened. In which case SbN postpones processing 
> the DN block report, handled by the guarding logic below:
> {code:java}
>   if (shouldPostponeBlocksFromFuture &&
>   namesystem.isGenStampInFuture(iblk)) {
> queueReportedBlock(storageInfo, iblk, reportedState,
> QUEUE_REASON_FUTURE_GENSTAMP);
> continue;
>   }
> {code}
> Basically if reported block has a future generation stamp, the DN report gets 
> requeued.
> However, in {{FSNamesystem#storeAllocatedBlock}}, we have the following code:
> {code:java}
>   // allocate new block, record block locations in INode.
>   newBlock = createNewBlock();
>   INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile);
>   saveAllocatedBlock(src, inodesInPath, newBlock, targets);
>   persistNewBlock(src, pendingFile);
>   offset = pendingFile.computeFileSize();
> {code}
> The line
>  {{newBlock = createNewBlock();}}
>  Would log an edit entry {{OP_SET_GENSTAMP_V2}} to bump generation stamp on 
> Standby
>  while the following line
>  {{persistNewBlock(src, pendingFile);}}
>  would log another edit entry {{OP_ADD_BLOCK}} to actually add the block on 
> Standby.
> Then the race condition is that, imagine Standby has just processed 
> {{OP_SET_GENSTAMP_V2}}, but not yet {{OP_ADD_BLOCK}} (if they just happen to 
> be in different setment). Now a block report with new generation stamp comes 
> in.
> Since the genstamp bump has already been processed, the reported block may 
> not be considered as future block. So the guarding logic passes. But 
> actually, the block hasn't been added to blockmap, because the second edit is 
> yet to be tailed. So, the block then gets added to invalidate block list and 
> we saw messages like:
> {code:java}
> BLOCK* addBlock: block XXX on node XXX size XXX does not belong to any file
> {code}
> Even worse, since this IBR is effectively lost, the NameNode has no 
> information about this block, until the next full block report. So after a 
> failover, the NN marks it as corrupt.
> This issue won't happen though, if both of the edit entries get tailed all 
> together, so no IBR processing can happen in between. But in our case, we set 
> edit tailing interval to super low (to allow Standby read), so when under 
> high workload, there is a much much higher chance that the two entries are 
> tailed separately, causing the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14940) HDFS Balancer : getBalancerBandwidth displaying wrong values for the maximum network bandwidth used by the datanode while network bandwidth set with values as 104857600

2019-11-05 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967877#comment-16967877
 ] 

Hadoop QA commented on HDFS-14940:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 24s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 59s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 84m 13s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}146m 34s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestInitializeSharedEdits 
|
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14940 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12984989/HDFS-14940.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 71e197b546db 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bfb8f28 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28256/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28256/testReport/ |
| Max. process+thread count | 3396 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28256/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |



[jira] [Commented] (HDDS-2372) Datanode pipeline is failing with NoSuchFileException

2019-11-05 Thread Tsz-wo Sze (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967873#comment-16967873
 ] 

Tsz-wo Sze commented on HDDS-2372:
--

It makes sense to check the chunk file again after temporary chunk file failure 
to avoid the problem here.  This solution is simple and no synchronization is 
need.

> Datanode pipeline is failing with NoSuchFileException
> -
>
> Key: HDDS-2372
> URL: https://issues.apache.org/jira/browse/HDDS-2372
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Marton Elek
>Assignee: Shashikant Banerjee
>Priority: Critical
>
> Found it on a k8s based test cluster using a simple 3 node cluster and 
> HDDS-2327 freon test. After a while the StateMachine become unhealthy after 
> this error:
> {code:java}
> datanode-0 datanode java.util.concurrent.ExecutionException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  java.nio.file.NoSuchFileException: 
> /data/storage/hdds/2a77fab9-9dc5-4f73-9501-b5347ac6145c/current/containerDir0/1/chunks/gGYYgiTTeg_testdata_chunk_13931.tmp.2.20830
>  {code}
> Can be reproduced.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14384) When lastLocatedBlock token expire, it will take 1~3s second to refetch it.

2019-11-05 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967865#comment-16967865
 ] 

Hadoop QA commented on HDFS-14384:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
36s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 37s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
52s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  3m 
11s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 46s{color} | {color:orange} hadoop-hdfs-project: The patch generated 2 new + 
54 unchanged - 0 fixed = 56 total (was 54) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 10s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
57s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 93m 38s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
39s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}166m 18s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
|   | hadoop.hdfs.protocol.datatransfer.sasl.TestSaslDataTransfer |
|   | hadoop.hdfs.TestSafeModeWithStripedFileWithRandomECPolicy |
|   | hadoop.hdfs.TestFileChecksumCompositeCrc |
|   | hadoop.hdfs.TestErasureCodingExerciseAPIs |
|   | hadoop.hdfs.server.datanode.TestDataNodeLifeline |
|   | hadoop.hdfs.TestEncryptedTransfer |
|   | hadoop.hdfs.TestReconstructStripedFile |
|   | hadoop.hdfs.TestDFSPermission |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14384 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12982833/HDFS-14384.002.patch |
| Optional Tests |  dupname  asflicense  compile  

[jira] [Commented] (HDFS-14922) On StartUp , Snapshot modification time got changed

2019-11-05 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967843#comment-16967843
 ] 

Hadoop QA commented on HDFS-14922:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
48s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 56s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 49s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 1 new + 623 unchanged - 1 fixed = 624 total (was 624) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 55s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}108m 23s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
42s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}173m 35s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer |
|   | hadoop.hdfs.server.namenode.TestDiskspaceQuotaUpdate |
|   | hadoop.hdfs.tools.TestDFSZKFailoverController |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14922 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12984981/HDFS-14922.004.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 35f9990087a3 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bfb8f28 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28253/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28253/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 

[jira] [Updated] (HDDS-2195) Apply spotbugs check to test code

2019-11-05 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-2195:
---
Labels: newbie  (was: )

> Apply spotbugs check to test code
> -
>
> Key: HDDS-2195
> URL: https://issues.apache.org/jira/browse/HDDS-2195
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: test
>Reporter: Attila Doroszlai
>Priority: Major
>  Labels: newbie
>
> The goal of this task is to [enable Spotbugs to run on test 
> code|https://spotbugs.github.io/spotbugs-maven-plugin/spotbugs-mojo.html#includeTests],
>  and fix all issues it reports (both to improve code and to avoid breaking 
> CI).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2195) Apply spotbugs check to test code

2019-11-05 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai reassigned HDDS-2195:
--

Assignee: (was: Attila Doroszlai)

> Apply spotbugs check to test code
> -
>
> Key: HDDS-2195
> URL: https://issues.apache.org/jira/browse/HDDS-2195
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: test
>Reporter: Attila Doroszlai
>Priority: Major
>
> The goal of this task is to [enable Spotbugs to run on test 
> code|https://spotbugs.github.io/spotbugs-maven-plugin/spotbugs-mojo.html#includeTests],
>  and fix all issues it reports (both to improve code and to avoid breaking 
> CI).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2405) int2ByteString unnecessary byte array allocation

2019-11-05 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-2405:
---
Description: 
{{int2ByteString}} implementations (currently duplicated in 
[RatisHelper|https://github.com/apache/hadoop-ozone/blob/6b2cda125b3647870ef5b01cf64e3b3e4cdc55db/hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/ratis/RatisHelper.java#L280-L289]
 and 
[Checksum|https://github.com/apache/hadoop-ozone/blob/6b2cda125b3647870ef5b01cf64e3b3e4cdc55db/hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/common/Checksum.java#L64-L73],
 but the first one is being removed in HDDS-2375) result in unnecessary byte 
array allocations:

# {{ByteString.Output}} creates 128-byte buffer by default, which is too large 
for writing a single int
# {{DataOutputStream}} allocates an [extra 8-byte 
array|https://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/io/DataOutputStream.java#l204],
 used only for writing longs
# {{ByteString.Output}} also creates 10-element array for {{flushedBuffers}}

  was:
{{int2ByteString}} implementations (currently duplicated in 
[RatisHelper|https://github.com/apache/hadoop-ozone/blob/6b2cda125b3647870ef5b01cf64e3b3e4cdc55db/hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/ratis/RatisHelper.java#L280-L289]
 and 
[Checksum|https://github.com/apache/hadoop-ozone/blob/6b2cda125b3647870ef5b01cf64e3b3e4cdc55db/hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/common/Checksum.java#L64-L73],
 but the first one is being removed in HDDS-2375) result in unnecessary byte 
array allocations:

# {{ByteString.Output}} creates 128-byte buffer by default, which is too large 
for writing a single int
# {{DataOutputStream}} allocates an [extra 8-byte 
array|https://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/io/DataOutputStream.java#l204],
 used only for writing longs


> int2ByteString unnecessary byte array allocation
> 
>
> Key: HDDS-2405
> URL: https://issues.apache.org/jira/browse/HDDS-2405
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>
> {{int2ByteString}} implementations (currently duplicated in 
> [RatisHelper|https://github.com/apache/hadoop-ozone/blob/6b2cda125b3647870ef5b01cf64e3b3e4cdc55db/hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/ratis/RatisHelper.java#L280-L289]
>  and 
> [Checksum|https://github.com/apache/hadoop-ozone/blob/6b2cda125b3647870ef5b01cf64e3b3e4cdc55db/hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/common/Checksum.java#L64-L73],
>  but the first one is being removed in HDDS-2375) result in unnecessary byte 
> array allocations:
> # {{ByteString.Output}} creates 128-byte buffer by default, which is too 
> large for writing a single int
> # {{DataOutputStream}} allocates an [extra 8-byte 
> array|https://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/io/DataOutputStream.java#l204],
>  used only for writing longs
> # {{ByteString.Output}} also creates 10-element array for {{flushedBuffers}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14929) Hadoop 2.9.1 rename functionality infrequent breaking

2019-11-05 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967797#comment-16967797
 ] 

Wei-Chiu Chuang commented on HDFS-14929:


is this a duplicate of HDFS-14947?

> Hadoop 2.9.1 rename functionality infrequent breaking 
> --
>
> Key: HDFS-14929
> URL: https://issues.apache.org/jira/browse/HDFS-14929
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.9.1
>Reporter: abhishek sahani
>Priority: Major
>
> We are infrequently seeing rename functionality not working properly , in 
> logs it seems file is rename is success but in ui and when listing the files 
> using ./hdfs dfs -ls-R file is not present .
>  DEBUG hdfs.StateChange: *DIR* NameNode.rename: 
> /topics/+tmp/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_15/year=2019/month=10/day=08/hour=19/894901ae-5913-4ad9-8d65-7071655c2db0_tmp.parquet
>  to 
> /topics/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_15/year=2019/month=10/day=08/hour=19/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic+10+00+99.parquet19/10/23
> 19:06:41 DEBUG hdfs.StateChange: *DIR* NameNode.rename: 
> /topics/+tmp/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_15/year=2019/month=10/day=08/hour=19/894901ae-5913-4ad9-8d65-7071655c2db0_tmp.parquet
>  to 
> /topics/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_15/year=2019/month=10/day=08/hour=19/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic+10+00+99.parquet
> 19/10/23 19:06:41DEBUG hdfs.StateChange: DIR* NameSystem.renameTo: 
> /topics/+tmp/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_15/year=2019/month=10/day=08/hour=19/894901ae-5913-4ad9-8d65-7071655c2db0_tmp.parquet
>  to 
> /topics/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_15/year=2019/month=10/day=08/hour=19/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic+10+00+99.parquet19/10/23
>  
> 19:06:41DEBUG hdfs.StateChange: DIR* FSDirectory.renameTo: 
> /topics/+tmp/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_15/year=2019/month=10/day=08/hour=19/894901ae-5913-4ad9-8d65-7071655c2db0_tmp.parquet
>  to 
> /topics/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_15/year=2019/month=10/day=08/hour=19/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic+10+00+99.parquet19/10/23
>  
> 19:06:41 DEBUG hdfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: 
> /topics/+tmp/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_15/year=2019/month=10/day=08/hour=19/894901ae-5913-4ad9-8d65-7071655c2db0_tmp.parquet
>  is renamed to 
> /topics/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_15/year=2019/month=10/day=08/hour=19/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic+10+00+99.parquet19/10/23
>  
> 19:06:41 DEBUG namenode.FSEditLog: logEdit [RpcEdit op:RenameOldOp [length=0, 
> src=/topics/+tmp/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_15/year=2019/month=10/day=08/hour=19/894901ae-5913-4ad9-8d65-7071655c2db0_tmp.parquet,
>  
> 

[jira] [Commented] (HDFS-14947) infrequent data loss due to rename functionality breaking

2019-11-05 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967794#comment-16967794
 ] 

Wei-Chiu Chuang commented on HDFS-14947:


Hey [~abhishek.sahani] thanks for the details.

bq. Firstly the connector task creates a temporary file for partition assigned 
to it  in hdfs inmemory file system and later after certain rotation time 
temporary file is closed and persisted to filesystem and later the temp file is 
also renamed in hdfs.
In other words, is RAM_DISK / LAZY_PERSIST used 
(https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/MemoryStorage.html)?
That can be an issue. Both Cloudera and Hortonworks don't support this feature 
officially and I know it's not as robust as it should be. Still, a file goes 
missing without a reason doesn't make sense to me.

> infrequent data loss due to rename functionality breaking
> -
>
> Key: HDFS-14947
> URL: https://issues.apache.org/jira/browse/HDFS-14947
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.7.3
>Reporter: abhishek sahani
>Priority: Critical
>
> We are facing an issue where data is getting lost from hdfs during rename , 
> in namenode logs we check file is renamed successfully but in hdfs after 
> rename file is not present at destination location and thus we are loosing 
> the data.
>  
> namenode logs:
> 19/10/31 16:54:09 DEBUG top.TopAuditLogger: --- logged event 
> for top service: allowed=true ugi=root (auth:SIMPLE) ip=/*.*.*.* cmd=rename 
> src=/topics/+tmp/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/351bffa9-15e3-427b-9e02-c9e8823d68d6_tmp.parquet
>  
> dst=/topics/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic+9+00+99.parquet
>  perm=root:supergroup:rw-r--r--
>  
> 19/10/31 16:54:09 DEBUG hdfs.StateChange: DIR* NameSystem.renameTo: 
> /topics/+tmp/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/351bffa9-15e3-427b-9e02-c9e8823d68d6_tmp.parquet
>  to 
> /topics/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic+9+00+99.parquet
>  19/10/31 16:54:09 DEBUG ipc.Server: IPC Server handler 8 on 9000: responding 
> to org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 
> *.*.*.*:39854 Call#48333 Retry#0
>  19/10/31 16:54:09 DEBUG hdfs.StateChange: DIR* FSDirectory.renameTo: 
> /topics/+tmp/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/351bffa9-15e3-427b-9e02-c9e8823d68d6_tmp.parquet
>  to 
> /topics/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic+9+00+99.parquet
>  19/10/31 16:54:09 DEBUG ipc.Server: IPC Server handler 6 on 9000: 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from *.*.*.*:39854 
> Call#48337 Retry#0 for RpcKind RPC_PROTOCOL_BUFFER
>  19/10/31 16:54:09 DEBUG hdfs.StateChange: DIR* 
> FSDirectory.unprotectedRenameTo: 
> /topics/+tmp/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/351bffa9-15e3-427b-9e02-c9e8823d68d6_tmp.parquet
>  is renamed to 
> 

[jira] [Updated] (HDFS-14940) HDFS Balancer : getBalancerBandwidth displaying wrong values for the maximum network bandwidth used by the datanode while network bandwidth set with values as 1048576000g

2019-11-05 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14940:
-
Attachment: HDFS-14940.001.patch
Status: Patch Available  (was: Open)

> HDFS Balancer : getBalancerBandwidth displaying wrong values for the maximum 
> network bandwidth used by the datanode while network bandwidth set with 
> values as 1048576000g/1048p/1e
> ---
>
> Key: HDFS-14940
> URL: https://issues.apache.org/jira/browse/HDFS-14940
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover
>Affects Versions: 3.1.1
> Environment: 3 Node HA Setup
>Reporter: Souryakanta Dwivedy
>Priority: Minor
> Attachments: BalancerBW.PNG, HDFS-14940.001.patch
>
>
> HDFS Balancer : getBalancerBandwidth displaying wrong values for the maximum 
> network bandwidth used by the datanode
>  while network bandwidth set with values as 1048576000g/1048p/1e
> Steps :-        
>  * Set balancer bandwith with command setBalancerBandwidth and vlaues as 
> [1048576000g/1048p/1e]
>  * - Check bandwidth used by the datanode during HDFS block balancing with 
> command :hdfs dfsadmin -getBalancerBandwidth "    check it will display some 
> different values not the same value as set



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14940) HDFS Balancer : getBalancerBandwidth displaying wrong values for the maximum network bandwidth used by the datanode while network bandwidth set with values as 104857600

2019-11-05 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967772#comment-16967772
 ] 

hemanthboyina commented on HDFS-14940:
--

thanks for the suggestion [~kihwal]

have added a constant for maximum bandwidth per datanode as 1TB/sec and made an 
upper bound check for bandwidth .

attached patch , please review .

> HDFS Balancer : getBalancerBandwidth displaying wrong values for the maximum 
> network bandwidth used by the datanode while network bandwidth set with 
> values as 1048576000g/1048p/1e
> ---
>
> Key: HDFS-14940
> URL: https://issues.apache.org/jira/browse/HDFS-14940
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover
>Affects Versions: 3.1.1
> Environment: 3 Node HA Setup
>Reporter: Souryakanta Dwivedy
>Priority: Minor
> Attachments: BalancerBW.PNG
>
>
> HDFS Balancer : getBalancerBandwidth displaying wrong values for the maximum 
> network bandwidth used by the datanode
>  while network bandwidth set with values as 1048576000g/1048p/1e
> Steps :-        
>  * Set balancer bandwith with command setBalancerBandwidth and vlaues as 
> [1048576000g/1048p/1e]
>  * - Check bandwidth used by the datanode during HDFS block balancing with 
> command :hdfs dfsadmin -getBalancerBandwidth "    check it will display some 
> different values not the same value as set



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14947) infrequent data loss due to rename functionality breaking

2019-11-05 Thread abhishek sahani (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

abhishek sahani updated HDFS-14947:
---
Priority: Blocker  (was: Critical)

> infrequent data loss due to rename functionality breaking
> -
>
> Key: HDFS-14947
> URL: https://issues.apache.org/jira/browse/HDFS-14947
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.7.3
>Reporter: abhishek sahani
>Priority: Blocker
>
> We are facing an issue where data is getting lost from hdfs during rename , 
> in namenode logs we check file is renamed successfully but in hdfs after 
> rename file is not present at destination location and thus we are loosing 
> the data.
>  
> namenode logs:
> 19/10/31 16:54:09 DEBUG top.TopAuditLogger: --- logged event 
> for top service: allowed=true ugi=root (auth:SIMPLE) ip=/*.*.*.* cmd=rename 
> src=/topics/+tmp/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/351bffa9-15e3-427b-9e02-c9e8823d68d6_tmp.parquet
>  
> dst=/topics/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic+9+00+99.parquet
>  perm=root:supergroup:rw-r--r--
>  
> 19/10/31 16:54:09 DEBUG hdfs.StateChange: DIR* NameSystem.renameTo: 
> /topics/+tmp/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/351bffa9-15e3-427b-9e02-c9e8823d68d6_tmp.parquet
>  to 
> /topics/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic+9+00+99.parquet
>  19/10/31 16:54:09 DEBUG ipc.Server: IPC Server handler 8 on 9000: responding 
> to org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 
> *.*.*.*:39854 Call#48333 Retry#0
>  19/10/31 16:54:09 DEBUG hdfs.StateChange: DIR* FSDirectory.renameTo: 
> /topics/+tmp/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/351bffa9-15e3-427b-9e02-c9e8823d68d6_tmp.parquet
>  to 
> /topics/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic+9+00+99.parquet
>  19/10/31 16:54:09 DEBUG ipc.Server: IPC Server handler 6 on 9000: 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from *.*.*.*:39854 
> Call#48337 Retry#0 for RpcKind RPC_PROTOCOL_BUFFER
>  19/10/31 16:54:09 DEBUG hdfs.StateChange: DIR* 
> FSDirectory.unprotectedRenameTo: 
> /topics/+tmp/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/351bffa9-15e3-427b-9e02-c9e8823d68d6_tmp.parquet
>  is renamed to 
> /topics/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic+9+00+99.parquet
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14947) infrequent data loss due to rename functionality breaking

2019-11-05 Thread abhishek sahani (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

abhishek sahani updated HDFS-14947:
---
Priority: Critical  (was: Blocker)

> infrequent data loss due to rename functionality breaking
> -
>
> Key: HDFS-14947
> URL: https://issues.apache.org/jira/browse/HDFS-14947
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.7.3
>Reporter: abhishek sahani
>Priority: Critical
>
> We are facing an issue where data is getting lost from hdfs during rename , 
> in namenode logs we check file is renamed successfully but in hdfs after 
> rename file is not present at destination location and thus we are loosing 
> the data.
>  
> namenode logs:
> 19/10/31 16:54:09 DEBUG top.TopAuditLogger: --- logged event 
> for top service: allowed=true ugi=root (auth:SIMPLE) ip=/*.*.*.* cmd=rename 
> src=/topics/+tmp/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/351bffa9-15e3-427b-9e02-c9e8823d68d6_tmp.parquet
>  
> dst=/topics/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic+9+00+99.parquet
>  perm=root:supergroup:rw-r--r--
>  
> 19/10/31 16:54:09 DEBUG hdfs.StateChange: DIR* NameSystem.renameTo: 
> /topics/+tmp/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/351bffa9-15e3-427b-9e02-c9e8823d68d6_tmp.parquet
>  to 
> /topics/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic+9+00+99.parquet
>  19/10/31 16:54:09 DEBUG ipc.Server: IPC Server handler 8 on 9000: responding 
> to org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 
> *.*.*.*:39854 Call#48333 Retry#0
>  19/10/31 16:54:09 DEBUG hdfs.StateChange: DIR* FSDirectory.renameTo: 
> /topics/+tmp/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/351bffa9-15e3-427b-9e02-c9e8823d68d6_tmp.parquet
>  to 
> /topics/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic+9+00+99.parquet
>  19/10/31 16:54:09 DEBUG ipc.Server: IPC Server handler 6 on 9000: 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from *.*.*.*:39854 
> Call#48337 Retry#0 for RpcKind RPC_PROTOCOL_BUFFER
>  19/10/31 16:54:09 DEBUG hdfs.StateChange: DIR* 
> FSDirectory.unprotectedRenameTo: 
> /topics/+tmp/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/351bffa9-15e3-427b-9e02-c9e8823d68d6_tmp.parquet
>  is renamed to 
> /topics/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic+9+00+99.parquet
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14806) Bootstrap standby may fail if used in-progress tailing

2019-11-05 Thread Chen Liang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967747#comment-16967747
 ] 

Chen Liang commented on HDFS-14806:
---

The remaining javadoc warnings are not introduced by this patch, and the failed 
tests all passed in my local run.

> Bootstrap standby may fail if used in-progress tailing
> --
>
> Key: HDFS-14806
> URL: https://issues.apache.org/jira/browse/HDFS-14806
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-14806.001.patch, HDFS-14806.002.patch, 
> HDFS-14806.003.patch, HDFS-14806.004.patch
>
>
> One issue we went across was that if in-progress tailing is enabled, 
> bootstrap standby could fail.
> When in-progress tailing is enabled, Bootstrap uses the RPC mechanism to get 
> edits. There is a config {{dfs.ha.tail-edits.qjm.rpc.max-txns}} that sets an 
> upper bound on how many txnid can be included in one RPC call. The default is 
> 5000. Meaning bootstraping NN (say NN1) can only pull at most 5000 edits from 
> JN. However, as part of bootstrap, NN1 queries another NN (say NN2) for NN2's 
> current transactionID, NN2 may return a state that is > 5000 txnid from NN1's 
> current image. But NN1 can only see 5000 more txnid from JNs. At this point 
> NN1 goes panic, because txnid retuned by JNs is behind NN2's returned state, 
> bootstrap then fail.
> Essentially, bootstrap standby can fail if both of two following conditions 
> are met:
>  # in-progress tailing is enabled AND
>  # the boostraping NN is too far (>5000 txid)  behind 
> Increasing the value of {{dfs.ha.tail-edits.qjm.rpc.max-txns}} to some super 
> large value allowed bootstrap to continue. But this is hardly the ideal 
> solution.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2321) Ozone Block Token verify should not apply to all datanode cmd

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2321?focusedWorklogId=338910=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338910
 ]

ASF GitHub Bot logged work on HDDS-2321:


Author: ASF GitHub Bot
Created on: 05/Nov/19 18:19
Start Date: 05/Nov/19 18:19
Worklog Time Spent: 10m 
  Work Description: xiaoyuyao commented on pull request #110: HDDS-2321. 
Ozone Block Token verify should not apply to all datanode …
URL: https://github.com/apache/hadoop-ozone/pull/110
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338910)
Time Spent: 20m  (was: 10m)

> Ozone Block Token verify should not apply to all datanode cmd
> -
>
> Key: HDDS-2321
> URL: https://issues.apache.org/jira/browse/HDDS-2321
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.4.1
>Reporter: Nilotpal Nandi
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> DN container protocol has cmd send from SCM or other DN, which do not bear OM 
> block token like OM client. We should restrict the OM Block token check only 
> for those issued from OM client. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2321) Ozone Block Token verify should not apply to all datanode cmd

2019-11-05 Thread Xiaoyu Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDDS-2321:
-
Fix Version/s: 0.5.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thanks all for the reviews. I've merged the PR to master. 

> Ozone Block Token verify should not apply to all datanode cmd
> -
>
> Key: HDDS-2321
> URL: https://issues.apache.org/jira/browse/HDDS-2321
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.4.1
>Reporter: Nilotpal Nandi
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> DN container protocol has cmd send from SCM or other DN, which do not bear OM 
> block token like OM client. We should restrict the OM Block token check only 
> for those issued from OM client. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14941) Potential editlog race condition can cause corrupted file

2019-11-05 Thread Chen Liang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-14941:
--
Attachment: HDFS-14941.004.patch

> Potential editlog race condition can cause corrupted file
> -
>
> Key: HDFS-14941
> URL: https://issues.apache.org/jira/browse/HDFS-14941
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
>  Labels: ha
> Attachments: HDFS-14941.001.patch, HDFS-14941.002.patch, 
> HDFS-14941.003.patch, HDFS-14941.004.patch
>
>
> Recently we encountered an issue that, after a failover, NameNode complains 
> corrupted file/missing blocks. The blocks did recover after full block 
> reports, so the blocks are not actually missing. After further investigation, 
> we believe this is what happened:
> First of all, on SbN, it is possible that it receives block reports before 
> corresponding edit tailing happened. In which case SbN postpones processing 
> the DN block report, handled by the guarding logic below:
> {code:java}
>   if (shouldPostponeBlocksFromFuture &&
>   namesystem.isGenStampInFuture(iblk)) {
> queueReportedBlock(storageInfo, iblk, reportedState,
> QUEUE_REASON_FUTURE_GENSTAMP);
> continue;
>   }
> {code}
> Basically if reported block has a future generation stamp, the DN report gets 
> requeued.
> However, in {{FSNamesystem#storeAllocatedBlock}}, we have the following code:
> {code:java}
>   // allocate new block, record block locations in INode.
>   newBlock = createNewBlock();
>   INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile);
>   saveAllocatedBlock(src, inodesInPath, newBlock, targets);
>   persistNewBlock(src, pendingFile);
>   offset = pendingFile.computeFileSize();
> {code}
> The line
>  {{newBlock = createNewBlock();}}
>  Would log an edit entry {{OP_SET_GENSTAMP_V2}} to bump generation stamp on 
> Standby
>  while the following line
>  {{persistNewBlock(src, pendingFile);}}
>  would log another edit entry {{OP_ADD_BLOCK}} to actually add the block on 
> Standby.
> Then the race condition is that, imagine Standby has just processed 
> {{OP_SET_GENSTAMP_V2}}, but not yet {{OP_ADD_BLOCK}} (if they just happen to 
> be in different setment). Now a block report with new generation stamp comes 
> in.
> Since the genstamp bump has already been processed, the reported block may 
> not be considered as future block. So the guarding logic passes. But 
> actually, the block hasn't been added to blockmap, because the second edit is 
> yet to be tailed. So, the block then gets added to invalidate block list and 
> we saw messages like:
> {code:java}
> BLOCK* addBlock: block XXX on node XXX size XXX does not belong to any file
> {code}
> Even worse, since this IBR is effectively lost, the NameNode has no 
> information about this block, until the next full block report. So after a 
> failover, the NN marks it as corrupt.
> This issue won't happen though, if both of the edit entries get tailed all 
> together, so no IBR processing can happen in between. But in our case, we set 
> edit tailing interval to super low (to allow Standby read), so when under 
> high workload, there is a much much higher chance that the two entries are 
> tailed separately, causing the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14941) Potential editlog race condition can cause corrupted file

2019-11-05 Thread Chen Liang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967735#comment-16967735
 ] 

Chen Liang commented on HDFS-14941:
---

v004 patch to fix checkstyle warnings.

> Potential editlog race condition can cause corrupted file
> -
>
> Key: HDFS-14941
> URL: https://issues.apache.org/jira/browse/HDFS-14941
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
>  Labels: ha
> Attachments: HDFS-14941.001.patch, HDFS-14941.002.patch, 
> HDFS-14941.003.patch, HDFS-14941.004.patch
>
>
> Recently we encountered an issue that, after a failover, NameNode complains 
> corrupted file/missing blocks. The blocks did recover after full block 
> reports, so the blocks are not actually missing. After further investigation, 
> we believe this is what happened:
> First of all, on SbN, it is possible that it receives block reports before 
> corresponding edit tailing happened. In which case SbN postpones processing 
> the DN block report, handled by the guarding logic below:
> {code:java}
>   if (shouldPostponeBlocksFromFuture &&
>   namesystem.isGenStampInFuture(iblk)) {
> queueReportedBlock(storageInfo, iblk, reportedState,
> QUEUE_REASON_FUTURE_GENSTAMP);
> continue;
>   }
> {code}
> Basically if reported block has a future generation stamp, the DN report gets 
> requeued.
> However, in {{FSNamesystem#storeAllocatedBlock}}, we have the following code:
> {code:java}
>   // allocate new block, record block locations in INode.
>   newBlock = createNewBlock();
>   INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile);
>   saveAllocatedBlock(src, inodesInPath, newBlock, targets);
>   persistNewBlock(src, pendingFile);
>   offset = pendingFile.computeFileSize();
> {code}
> The line
>  {{newBlock = createNewBlock();}}
>  Would log an edit entry {{OP_SET_GENSTAMP_V2}} to bump generation stamp on 
> Standby
>  while the following line
>  {{persistNewBlock(src, pendingFile);}}
>  would log another edit entry {{OP_ADD_BLOCK}} to actually add the block on 
> Standby.
> Then the race condition is that, imagine Standby has just processed 
> {{OP_SET_GENSTAMP_V2}}, but not yet {{OP_ADD_BLOCK}} (if they just happen to 
> be in different setment). Now a block report with new generation stamp comes 
> in.
> Since the genstamp bump has already been processed, the reported block may 
> not be considered as future block. So the guarding logic passes. But 
> actually, the block hasn't been added to blockmap, because the second edit is 
> yet to be tailed. So, the block then gets added to invalidate block list and 
> we saw messages like:
> {code:java}
> BLOCK* addBlock: block XXX on node XXX size XXX does not belong to any file
> {code}
> Even worse, since this IBR is effectively lost, the NameNode has no 
> information about this block, until the next full block report. So after a 
> failover, the NN marks it as corrupt.
> This issue won't happen though, if both of the edit entries get tailed all 
> together, so no IBR processing can happen in between. But in our case, we set 
> edit tailing interval to super low (to allow Standby read), so when under 
> high workload, there is a much much higher chance that the two entries are 
> tailed separately, causing the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14775) Add Timestamp for longest FSN write/read lock held log

2019-11-05 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967713#comment-16967713
 ] 

Hudson commented on HDFS-14775:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17609 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17609/])
HDFS-14775. Add Timestamp for longest FSN write/read lock held log. (inigoiri: 
rev bfb8f28cc995241e7387ceba8e14791b8c121956)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystemLock.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSNamesystemLock.java


> Add Timestamp for longest FSN write/read lock held log
> --
>
> Key: HDFS-14775
> URL: https://issues.apache.org/jira/browse/HDFS-14775
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14775.001.patch, HDFS-14775.002.patch, 
> HDFS-14775.003.patch, HDFS-14775.004.patch, HDFS-14775.005.patch
>
>
> HDFS-13946 improved the log for longest read/write lock held time, it's very 
> useful improvement.
> In some condition, we need to locate the detailed call information(user, ip, 
> path, etc.) for longest lock holder, but the default throttle interval(10s) 
> is too long to find the corresponding audit log. I think we should add the 
> timestamp for the {{longestWriteLockHeldStackTrace}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14949) HttpFS does not support getServerDefaults()

2019-11-05 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967711#comment-16967711
 ] 

hemanthboyina commented on HDFS-14949:
--

Have gone through some  filesystem api's

getServerDefaults() was deprecated and getServerDefaults(Path) calls again the 
getServerDefaults()

i think we need to add getServerDefaults(Path p) .

please correct me if am wrong .

your suggestions [~elgoiri] [~kihwal] ?

> HttpFS does not support getServerDefaults()
> ---
>
> Key: HDFS-14949
> URL: https://issues.apache.org/jira/browse/HDFS-14949
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Kihwal Lee
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14949.001.patch, HDFS-14949.002.patch, 
> HDFS-14949.003.patch
>
>
> For HttpFS server to function as a fully webhdfs-compatible service, 
> getServerDefaults() support is needed.  It is increasingly used in new 
> features and improvements.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14922) On StartUp , Snapshot modification time got changed

2019-11-05 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14922:
-
Attachment: HDFS-14922.004.patch

> On StartUp , Snapshot modification time got changed
> ---
>
> Key: HDFS-14922
> URL: https://issues.apache.org/jira/browse/HDFS-14922
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14922.001.patch, HDFS-14922.002.patch, 
> HDFS-14922.003.patch, HDFS-14922.004.patch
>
>
> Snapshot modification time got changed on namenode restart



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14928) UI: unifying the WebUI across different components.

2019-11-05 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967702#comment-16967702
 ] 

Íñigo Goiri commented on HDFS-14928:


Yes, please, go ahead with #1.

> UI: unifying the WebUI across different components.
> ---
>
> Key: HDFS-14928
> URL: https://issues.apache.org/jira/browse/HDFS-14928
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ui
>Reporter: Xieming Li
>Priority: Trivial
> Attachments: DN_orig.png, DN_with_legend.png.png, DN_wo_legend.png, 
> NN_orig.png, NN_with_legend.png, NN_wo_legend.png, RBF_orig.png, 
> RBF_with_legend.png, RBF_wo_legend.png
>
>
> The WebUI of different components could be unified.
> *Router:*
> |Current|  !RBF_orig.png|width=500! | 
> |Proposed 1 (With Icon) |  !RBF_wo_legend.png|width=500! | 
> |Proposed 2 (With Icon and Legend)|!RBF_with_legend.png|width=500!  | 
> *NameNode:*
> |Current| !NN_orig.png|width=500! |
> |Proposed 1 (With Icon) | !NN_wo_legend.png|width=500! |
> |Proposed 2 (With Icon and Legend)| !NN_with_legend.png|width=500! |
> *DataNode:*
> |Current| !DN_orig.png|width=500! |
> |Proposed 1 (With Icon) | !DN_wo_legend.png|width=500! |
> |Proposed 2 (With Icon and Legend)| !DN_with_legend.png.png|width=500! |



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14775) Add Timestamp for longest FSN write/read lock held log

2019-11-05 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967691#comment-16967691
 ] 

Íñigo Goiri commented on HDFS-14775:


Thanks [~zhangchen] for the patch and [~xkrogen] and [~hexiaoqiao] for the 
reviews.
Committed to trunk.

> Add Timestamp for longest FSN write/read lock held log
> --
>
> Key: HDFS-14775
> URL: https://issues.apache.org/jira/browse/HDFS-14775
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14775.001.patch, HDFS-14775.002.patch, 
> HDFS-14775.003.patch, HDFS-14775.004.patch, HDFS-14775.005.patch
>
>
> HDFS-13946 improved the log for longest read/write lock held time, it's very 
> useful improvement.
> In some condition, we need to locate the detailed call information(user, ip, 
> path, etc.) for longest lock holder, but the default throttle interval(10s) 
> is too long to find the corresponding audit log. I think we should add the 
> timestamp for the {{longestWriteLockHeldStackTrace}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14775) Add Timestamp for longest FSN write/read lock held log

2019-11-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HDFS-14775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-14775:
---
Fix Version/s: 3.3.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Add Timestamp for longest FSN write/read lock held log
> --
>
> Key: HDFS-14775
> URL: https://issues.apache.org/jira/browse/HDFS-14775
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14775.001.patch, HDFS-14775.002.patch, 
> HDFS-14775.003.patch, HDFS-14775.004.patch, HDFS-14775.005.patch
>
>
> HDFS-13946 improved the log for longest read/write lock held time, it's very 
> useful improvement.
> In some condition, we need to locate the detailed call information(user, ip, 
> path, etc.) for longest lock holder, but the default throttle interval(10s) 
> is too long to find the corresponding audit log. I think we should add the 
> timestamp for the {{longestWriteLockHeldStackTrace}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-4935) add symlink support to HttpFS server side

2019-11-05 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-4935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated HDFS-4935:

Attachment: HDFS-4935.001.patch

> add symlink support to HttpFS server side
> -
>
> Key: HDFS-4935
> URL: https://issues.apache.org/jira/browse/HDFS-4935
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.3.0
> Environment: followup on HADOOP-8040
>Reporter: Alejandro Abdelnur
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: HDFS-4935.001.patch
>
>
> follow up on HADOOP-8040



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2270) Avoid buffer copying in ContainerStateMachine.loadSnapshot/persistContainerSet

2019-11-05 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-2270:
---
Status: Patch Available  (was: In Progress)

> Avoid buffer copying in ContainerStateMachine.loadSnapshot/persistContainerSet
> --
>
> Key: HDDS-2270
> URL: https://issues.apache.org/jira/browse/HDDS-2270
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Tsz-wo Sze
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ContainerStateMachine:
> - In loadSnapshot(..), it first reads the snapshotFile to a  byte[] and then 
> parses it to ContainerProtos.Container2BCSIDMapProto.  The buffer copying can 
> be avoided.
> {code}
> try (FileInputStream fin = new FileInputStream(snapshotFile)) {
>   byte[] container2BCSIDData = IOUtils.toByteArray(fin);
>   ContainerProtos.Container2BCSIDMapProto proto =
>   ContainerProtos.Container2BCSIDMapProto
>   .parseFrom(container2BCSIDData);
>   ...
> }
> {code}
> - persistContainerSet(..) has similar problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2064) Add tests for incorrect OM HA config when node ID or RPC address is not configured

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2064?focusedWorklogId=338829=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338829
 ]

ASF GitHub Bot logged work on HDDS-2064:


Author: ASF GitHub Bot
Created on: 05/Nov/19 16:06
Start Date: 05/Nov/19 16:06
Worklog Time Spent: 10m 
  Work Description: smengcl commented on pull request #119: HDDS-2064. Add 
tests for incorrect OM HA config when node ID or RPC address is not configured
URL: https://github.com/apache/hadoop-ozone/pull/119
 
 
   ## What changes were proposed in this pull request?
   
   Add two unit tests for HDDS-2162, when OM service ID is specified:
   (1) Cluster should fail to start if a list of OM Node IDs is **not** 
specified;
   (2) Cluster should fail to start if a list of OM Node IDs is specified, but 
OM RPC address is **not** specified.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-2064
   
   ## How was this patch tested?
   
   Run the two newly added unit tests in this patch.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338829)
Time Spent: 2h  (was: 1h 50m)

> Add tests for incorrect OM HA config when node ID or RPC address is not 
> configured
> --
>
> Key: HDDS-2064
> URL: https://issues.apache.org/jira/browse/HDDS-2064
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> -OM will NPE and crash when `ozone.om.service.ids=id1,id2` is configured but 
> `ozone.om.nodes.id1` doesn't exist; or `ozone.om.address.id1.omX` doesn't 
> exist.-
> -Root cause:-
> -`OzoneManager#loadOMHAConfigs()` didn't check the case where `found == 0`. 
> This happens when local OM doesn't match any `ozone.om.address.idX.omX` in 
> the config.-
> Due to the refactoring done in HDDS-2162. This fix has been included in that 
> commit. I will repurpose the jira to add some tests for the HA config.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2064) Add tests for incorrect OM HA config when node ID or RPC address is not configured

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2064?focusedWorklogId=338822=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338822
 ]

ASF GitHub Bot logged work on HDDS-2064:


Author: ASF GitHub Bot
Created on: 05/Nov/19 16:04
Start Date: 05/Nov/19 16:04
Worklog Time Spent: 10m 
  Work Description: smengcl commented on issue #1398: HDDS-2064. 
OzoneManagerRatisServer#newOMRatisServer throws NPE when OM HA is configured 
incorrectly
URL: https://github.com/apache/hadoop/pull/1398#issuecomment-549885489
 
 
   Due to the refactoring done in HDDS-2162. This fix has been included in that 
commit. I will repurpose the jira to add a unit test for the HA config. I'm 
closing this PR. Will open another one in hadoop-ozone repo.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338822)
Time Spent: 1h 40m  (was: 1.5h)

> Add tests for incorrect OM HA config when node ID or RPC address is not 
> configured
> --
>
> Key: HDDS-2064
> URL: https://issues.apache.org/jira/browse/HDDS-2064
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> -OM will NPE and crash when `ozone.om.service.ids=id1,id2` is configured but 
> `ozone.om.nodes.id1` doesn't exist; or `ozone.om.address.id1.omX` doesn't 
> exist.-
> -Root cause:-
> -`OzoneManager#loadOMHAConfigs()` didn't check the case where `found == 0`. 
> This happens when local OM doesn't match any `ozone.om.address.idX.omX` in 
> the config.-
> Due to the refactoring done in HDDS-2162. This fix has been included in that 
> commit. I will repurpose the jira to add some tests for the HA config.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2064) Add tests for incorrect OM HA config when node ID or RPC address is not configured

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2064?focusedWorklogId=338823=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338823
 ]

ASF GitHub Bot logged work on HDDS-2064:


Author: ASF GitHub Bot
Created on: 05/Nov/19 16:04
Start Date: 05/Nov/19 16:04
Worklog Time Spent: 10m 
  Work Description: smengcl commented on pull request #1398: HDDS-2064. 
OzoneManagerRatisServer#newOMRatisServer throws NPE when OM HA is configured 
incorrectly
URL: https://github.com/apache/hadoop/pull/1398
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338823)
Time Spent: 1h 50m  (was: 1h 40m)

> Add tests for incorrect OM HA config when node ID or RPC address is not 
> configured
> --
>
> Key: HDDS-2064
> URL: https://issues.apache.org/jira/browse/HDDS-2064
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> -OM will NPE and crash when `ozone.om.service.ids=id1,id2` is configured but 
> `ozone.om.nodes.id1` doesn't exist; or `ozone.om.address.id1.omX` doesn't 
> exist.-
> -Root cause:-
> -`OzoneManager#loadOMHAConfigs()` didn't check the case where `found == 0`. 
> This happens when local OM doesn't match any `ozone.om.address.idX.omX` in 
> the config.-
> Due to the refactoring done in HDDS-2162. This fix has been included in that 
> commit. I will repurpose the jira to add some tests for the HA config.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2064) Add tests for incorrect OM HA config when node ID or RPC address is not configured

2019-11-05 Thread Siyao Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyao Meng updated HDDS-2064:
-
Summary: Add tests for incorrect OM HA config when node ID or RPC address 
is not configured  (was: OzoneManagerRatisServer#newOMRatisServer throws NPE 
when OM HA is configured incorrectly)

> Add tests for incorrect OM HA config when node ID or RPC address is not 
> configured
> --
>
> Key: HDDS-2064
> URL: https://issues.apache.org/jira/browse/HDDS-2064
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> OM will NPE and crash when `ozone.om.service.ids=id1,id2` is configured but 
> `ozone.om.nodes.id1` doesn't exist; or `ozone.om.address.id1.omX` doesn't 
> exist.
> Root cause:
> `OzoneManager#loadOMHAConfigs()` didn't check the case where `found == 0`. 
> This happens when local OM doesn't match any `ozone.om.address.idX.omX` in 
> the config.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2064) Add tests for incorrect OM HA config when node ID or RPC address is not configured

2019-11-05 Thread Siyao Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyao Meng updated HDDS-2064:
-
Description: 
-OM will NPE and crash when `ozone.om.service.ids=id1,id2` is configured but 
`ozone.om.nodes.id1` doesn't exist; or `ozone.om.address.id1.omX` doesn't 
exist.-

-Root cause:-
-`OzoneManager#loadOMHAConfigs()` didn't check the case where `found == 0`. 
This happens when local OM doesn't match any `ozone.om.address.idX.omX` in the 
config.-

Due to the refactoring done in HDDS-2162. This fix has been included in that 
commit. I will repurpose the jira to add some tests for the HA config.

  was:
-OM will NPE and crash when `ozone.om.service.ids=id1,id2` is configured but 
`ozone.om.nodes.id1` doesn't exist; or `ozone.om.address.id1.omX` doesn't exist.

Root cause:
`OzoneManager#loadOMHAConfigs()` didn't check the case where `found == 0`. This 
happens when local OM doesn't match any `ozone.om.address.idX.omX` in the 
config.
-

Due to the refactoring done in HDDS-2162. This fix has been included in that 
commit. I will repurpose the jira to add some tests for the HA config.


> Add tests for incorrect OM HA config when node ID or RPC address is not 
> configured
> --
>
> Key: HDDS-2064
> URL: https://issues.apache.org/jira/browse/HDDS-2064
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> -OM will NPE and crash when `ozone.om.service.ids=id1,id2` is configured but 
> `ozone.om.nodes.id1` doesn't exist; or `ozone.om.address.id1.omX` doesn't 
> exist.-
> -Root cause:-
> -`OzoneManager#loadOMHAConfigs()` didn't check the case where `found == 0`. 
> This happens when local OM doesn't match any `ozone.om.address.idX.omX` in 
> the config.-
> Due to the refactoring done in HDDS-2162. This fix has been included in that 
> commit. I will repurpose the jira to add some tests for the HA config.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2064) Add tests for incorrect OM HA config when node ID or RPC address is not configured

2019-11-05 Thread Siyao Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyao Meng updated HDDS-2064:
-
Description: 
-OM will NPE and crash when `ozone.om.service.ids=id1,id2` is configured but 
`ozone.om.nodes.id1` doesn't exist; or `ozone.om.address.id1.omX` doesn't exist.

Root cause:
`OzoneManager#loadOMHAConfigs()` didn't check the case where `found == 0`. This 
happens when local OM doesn't match any `ozone.om.address.idX.omX` in the 
config.
-

Due to the refactoring done in HDDS-2162. This fix has been included in that 
commit. I will repurpose the jira to add some tests for the HA config.

  was:
OM will NPE and crash when `ozone.om.service.ids=id1,id2` is configured but 
`ozone.om.nodes.id1` doesn't exist; or `ozone.om.address.id1.omX` doesn't exist.

Root cause:
`OzoneManager#loadOMHAConfigs()` didn't check the case where `found == 0`. This 
happens when local OM doesn't match any `ozone.om.address.idX.omX` in the 
config.



> Add tests for incorrect OM HA config when node ID or RPC address is not 
> configured
> --
>
> Key: HDDS-2064
> URL: https://issues.apache.org/jira/browse/HDDS-2064
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> -OM will NPE and crash when `ozone.om.service.ids=id1,id2` is configured but 
> `ozone.om.nodes.id1` doesn't exist; or `ozone.om.address.id1.omX` doesn't 
> exist.
> Root cause:
> `OzoneManager#loadOMHAConfigs()` didn't check the case where `found == 0`. 
> This happens when local OM doesn't match any `ozone.om.address.idX.omX` in 
> the config.
> -
> Due to the refactoring done in HDDS-2162. This fix has been included in that 
> commit. I will repurpose the jira to add some tests for the HA config.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14775) Add Timestamp for longest FSN write/read lock held log

2019-11-05 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967617#comment-16967617
 ] 

Erik Krogen commented on HDFS-14775:


+1 thanks [~zhangchen]!

> Add Timestamp for longest FSN write/read lock held log
> --
>
> Key: HDFS-14775
> URL: https://issues.apache.org/jira/browse/HDFS-14775
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14775.001.patch, HDFS-14775.002.patch, 
> HDFS-14775.003.patch, HDFS-14775.004.patch, HDFS-14775.005.patch
>
>
> HDFS-13946 improved the log for longest read/write lock held time, it's very 
> useful improvement.
> In some condition, we need to locate the detailed call information(user, ip, 
> path, etc.) for longest lock holder, but the default throttle interval(10s) 
> is too long to find the corresponding audit log. I think we should add the 
> timestamp for the {{longestWriteLockHeldStackTrace}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work started] (HDDS-1987) Fix listStatus API

2019-11-05 Thread Siyao Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDDS-1987 started by Siyao Meng.

> Fix listStatus API
> --
>
> Key: HDDS-1987
> URL: https://issues.apache.org/jira/browse/HDDS-1987
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This Jira is to fix listStatus API in HA code path.
> In HA, we have an in-memory cache, where we put the result to in-memory cache 
> and return the response. It will be picked by double buffer thread and 
> flushed to disk later. So when user call listStatus, it should use both 
> in-memory cache and rocksdb key table to return the correct result.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2270) Avoid buffer copying in ContainerStateMachine.loadSnapshot/persistContainerSet

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2270:
-
Labels: pull-request-available  (was: )

> Avoid buffer copying in ContainerStateMachine.loadSnapshot/persistContainerSet
> --
>
> Key: HDDS-2270
> URL: https://issues.apache.org/jira/browse/HDDS-2270
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Tsz-wo Sze
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>
> ContainerStateMachine:
> - In loadSnapshot(..), it first reads the snapshotFile to a  byte[] and then 
> parses it to ContainerProtos.Container2BCSIDMapProto.  The buffer copying can 
> be avoided.
> {code}
> try (FileInputStream fin = new FileInputStream(snapshotFile)) {
>   byte[] container2BCSIDData = IOUtils.toByteArray(fin);
>   ContainerProtos.Container2BCSIDMapProto proto =
>   ContainerProtos.Container2BCSIDMapProto
>   .parseFrom(container2BCSIDData);
>   ...
> }
> {code}
> - persistContainerSet(..) has similar problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2270) Avoid buffer copying in ContainerStateMachine.loadSnapshot/persistContainerSet

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2270?focusedWorklogId=338720=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338720
 ]

ASF GitHub Bot logged work on HDDS-2270:


Author: ASF GitHub Bot
Created on: 05/Nov/19 13:43
Start Date: 05/Nov/19 13:43
Worklog Time Spent: 10m 
  Work Description: adoroszlai commented on pull request #118: HDDS-2270. 
Avoid buffer copying in ContainerStateMachine
URL: https://github.com/apache/hadoop-ozone/pull/118
 
 
   ## What changes were proposed in this pull request?
   
   Eliminate temporary `byte[]` buffer in `ContainerStateMachine` 
(`loadSnapshot` and `persistContainerSet`).
   
   https://issues.apache.org/jira/browse/HDDS-2270
   
   ## How was this patch tested?
   
   Verified on a docker-compose cluster that datanode writes/reads the snapshot 
info successfully.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338720)
Remaining Estimate: 0h
Time Spent: 10m

> Avoid buffer copying in ContainerStateMachine.loadSnapshot/persistContainerSet
> --
>
> Key: HDDS-2270
> URL: https://issues.apache.org/jira/browse/HDDS-2270
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Tsz-wo Sze
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ContainerStateMachine:
> - In loadSnapshot(..), it first reads the snapshotFile to a  byte[] and then 
> parses it to ContainerProtos.Container2BCSIDMapProto.  The buffer copying can 
> be avoided.
> {code}
> try (FileInputStream fin = new FileInputStream(snapshotFile)) {
>   byte[] container2BCSIDData = IOUtils.toByteArray(fin);
>   ContainerProtos.Container2BCSIDMapProto proto =
>   ContainerProtos.Container2BCSIDMapProto
>   .parseFrom(container2BCSIDData);
>   ...
> }
> {code}
> - persistContainerSet(..) has similar problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14643) [Dynamometer] Merge extra commits from GitHub to Hadoop

2019-11-05 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967520#comment-16967520
 ] 

Takanobu Asanuma edited comment on HDFS-14643 at 11/5/19 1:22 PM:
--

Some of them have been already merged.
* HDFS-14817: [PR #70|https://github.com/linkedin/dynamometer/pull/70]
* HDFS-14824: [PR #76|https://github.com/linkedin/dynamometer/pull/76], [PR 
#92|https://github.com/linkedin/dynamometer/pull/92], [PR 
#96|https://github.com/linkedin/dynamometer/pull/96]
* HDFS-14825: [PR #84|https://github.com/linkedin/dynamometer/pull/84]


was (Author: tasanuma0829):
Some of them have been already merged.
* HDFS-14817: [PR #90|https://github.com/linkedin/dynamometer/pull/90]
* HDFS-14824: [PR #76|https://github.com/linkedin/dynamometer/pull/76], [PR 
#92|https://github.com/linkedin/dynamometer/pull/92], [PR 
#96|https://github.com/linkedin/dynamometer/pull/96]
* HDFS-14825: [PR #84|https://github.com/linkedin/dynamometer/pull/84]

> [Dynamometer] Merge extra commits from GitHub to Hadoop
> ---
>
> Key: HDFS-14643
> URL: https://issues.apache.org/jira/browse/HDFS-14643
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
>
> While Dynamometer was in the process of being committed to Hadoop, a few 
> patches went in to the GitHub version that haven't yet made it into the 
> version committed here. Some of them are related to TravisCI and Bintray 
> deployment, which can safely be ignored in a Hadoop context, but a few are 
> relevant:
> {code}
> * 2d2591e 2019-05-24 Make XML parsing error message more explicit (PR #97) 
> [lfengnan ]
> * 755a298 2019-04-04 Fix misimplemented CountTimeWritable setter and update 
> the README docs regarding the output file (PR #96) [Christopher Gregorian 
> ]
> * 66d3e19 2019-03-14 Modify AuditReplay workflow to output count and latency 
> of operations (PR #92) [Christopher Gregorian ]
> * 5c1d8cd 2019-02-28 Fix issues with the start-workload.sh script (PR #84) 
> [Erik Krogen ]
> {code}
> I will use this ticket to track porting these 4 commits into Hadoop's 
> Dynamometer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14643) [Dynamometer] Merge extra commits from GitHub to Hadoop

2019-11-05 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967520#comment-16967520
 ] 

Takanobu Asanuma commented on HDFS-14643:
-

Some of them have been already merged.
* HDFS-14817: [PR #90|https://github.com/linkedin/dynamometer/pull/90]
* HDFS-14824: [PR #76|https://github.com/linkedin/dynamometer/pull/76], [PR 
#92|https://github.com/linkedin/dynamometer/pull/92], [PR 
#96|https://github.com/linkedin/dynamometer/pull/96]
* HDFS-14825: [PR #84|https://github.com/linkedin/dynamometer/pull/84]

> [Dynamometer] Merge extra commits from GitHub to Hadoop
> ---
>
> Key: HDFS-14643
> URL: https://issues.apache.org/jira/browse/HDFS-14643
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
>
> While Dynamometer was in the process of being committed to Hadoop, a few 
> patches went in to the GitHub version that haven't yet made it into the 
> version committed here. Some of them are related to TravisCI and Bintray 
> deployment, which can safely be ignored in a Hadoop context, but a few are 
> relevant:
> {code}
> * 2d2591e 2019-05-24 Make XML parsing error message more explicit (PR #97) 
> [lfengnan ]
> * 755a298 2019-04-04 Fix misimplemented CountTimeWritable setter and update 
> the README docs regarding the output file (PR #96) [Christopher Gregorian 
> ]
> * 66d3e19 2019-03-14 Modify AuditReplay workflow to output count and latency 
> of operations (PR #92) [Christopher Gregorian ]
> * 5c1d8cd 2019-02-28 Fix issues with the start-workload.sh script (PR #84) 
> [Erik Krogen ]
> {code}
> I will use this ticket to track porting these 4 commits into Hadoop's 
> Dynamometer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14499) Misleading REM_QUOTA value with snapshot and trash feature enabled for a directory

2019-11-05 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967474#comment-16967474
 ] 

Surendra Singh Lilhore commented on HDFS-14499:
---

I feel this fix is wrong, {{INodeReference}} are created for snapshots. If 
space quota for \{{INodeReference}} is calculated then snapshot diff also 
should be conside.
{code:java}
return referred.computeContentSummary(id, summary);{code}
This call just count the current Inode size, not the FileDiff size. Please 
correct me, If I am wrong...

 

> Misleading REM_QUOTA value with snapshot and trash feature enabled for a 
> directory
> --
>
> Key: HDFS-14499
> URL: https://issues.apache.org/jira/browse/HDFS-14499
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14499.000.patch, HDFS-14499.001.patch, 
> HDFS-14499.002.patch
>
>
> This is the flow of steps where we see a discrepancy between REM_QUOTA and 
> new file operation failure. REM_QUOTA shows a value of  1 but file creation 
> operation does not succeed.
> {code:java}
> hdfs@c3265-node3 root$ hdfs dfs -mkdir /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -setQuota 2 /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -allowSnapshot /dir1
> Allowing snaphot on /dir1 succeeded
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> hdfs@c3265-node3 root$ hdfs dfs -createSnapshot /dir1 snap1
> Created snapshot /dir1/.snapshot/snap1
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE 
> PATHNAME
> 2 0 none inf 1 1 0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -rm /dir1/file1
> 19/03/26 11:20:25 INFO fs.TrashPolicyDefault: Moved: 
> 'hdfs://smajetinn/dir1/file1' to trash at: 
> hdfs://smajetinn/user/hdfs/.Trash/Current/dir1/file11553599225772
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE 
> PATHNAME
> 2 1 none inf 1 0 0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> touchz: The NameSpace quota (directories and files) of directory /dir1 is 
> exceeded: quota=2 file count=3{code}
> The issue here, is that the count command takes only files and directories 
> into account not the inode references. When trash is enabled, the deletion of 
> files inside a directory actually does a rename operation as a result of 
> which an inode reference is maintained in the deleted list of the snapshot 
> diff which is taken into account while computing the namespace quota, but 
> count command (getContentSummary()) ,just takes into account just the files 
> and directories, not the referenced entity for calculating the REM_QUOTA. The 
> referenced entity is taken into account for space quota only.
> InodeReference.java:
> ---
> {code:java}
>  @Override
> public final ContentSummaryComputationContext computeContentSummary(
> int snapshotId, ContentSummaryComputationContext summary) {
>   final int s = snapshotId < lastSnapshotId ? snapshotId : lastSnapshotId;
>   // only count storagespace for WithName
>   final QuotaCounts q = computeQuotaUsage(
>   summary.getBlockStoragePolicySuite(), getStoragePolicyID(), false, 
> s);
>   summary.getCounts().addContent(Content.DISKSPACE, q.getStorageSpace());
>   summary.getCounts().addTypeSpaces(q.getTypeSpaces());
>   return summary;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work started] (HDDS-2270) Avoid buffer copying in ContainerStateMachine.loadSnapshot/persistContainerSet

2019-11-05 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDDS-2270 started by Attila Doroszlai.
--
> Avoid buffer copying in ContainerStateMachine.loadSnapshot/persistContainerSet
> --
>
> Key: HDDS-2270
> URL: https://issues.apache.org/jira/browse/HDDS-2270
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Tsz-wo Sze
>Assignee: Attila Doroszlai
>Priority: Major
>
> ContainerStateMachine:
> - In loadSnapshot(..), it first reads the snapshotFile to a  byte[] and then 
> parses it to ContainerProtos.Container2BCSIDMapProto.  The buffer copying can 
> be avoided.
> {code}
> try (FileInputStream fin = new FileInputStream(snapshotFile)) {
>   byte[] container2BCSIDData = IOUtils.toByteArray(fin);
>   ContainerProtos.Container2BCSIDMapProto proto =
>   ContainerProtos.Container2BCSIDMapProto
>   .parseFrom(container2BCSIDData);
>   ...
> }
> {code}
> - persistContainerSet(..) has similar problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2406) ozone shell key get throws IllegalArgumentException if pipeline is empty

2019-11-05 Thread Attila Doroszlai (Jira)
Attila Doroszlai created HDDS-2406:
--

 Summary: ozone shell key get throws IllegalArgumentException if 
pipeline is empty
 Key: HDDS-2406
 URL: https://issues.apache.org/jira/browse/HDDS-2406
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone CLI
Reporter: Attila Doroszlai


{{ozone shell key get}} throws when trying to get a key from a pipeline whose 
datanodes are all down:

{code}
java.lang.IllegalArgumentException
at 
com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
at 
org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClient(XceiverClientManager.java:169)
at 
org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClientForReadData(XceiverClientManager.java:162)
at 
org.apache.hadoop.hdds.scm.storage.BlockInputStream.getChunkInfos(BlockInputStream.java:154)
at 
org.apache.hadoop.hdds.scm.storage.BlockInputStream.initialize(BlockInputStream.java:118)
at 
org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:224)
at 
org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:173)
at 
org.apache.hadoop.ozone.client.io.OzoneInputStream.read(OzoneInputStream.java:47)
at java.base/java.io.InputStream.read(InputStream.java:205)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:94)
at 
org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:98)
at 
org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:48)
at picocli.CommandLine.execute(CommandLine.java:1173)
at picocli.CommandLine.access$800(CommandLine.java:141)
at picocli.CommandLine$RunLast.handle(CommandLine.java:1367)
at picocli.CommandLine$RunLast.handle(CommandLine.java:1335)
at 
picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243)
at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526)
at picocli.CommandLine.parseWithHandler(CommandLine.java:1465)
at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65)
at 
org.apache.hadoop.ozone.web.ozShell.OzoneShell.execute(OzoneShell.java:60)
at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:56)
at 
org.apache.hadoop.ozone.web.ozShell.OzoneShell.main(OzoneShell.java:53)
{code}

I think the exception should be caught and shell should output a more friendly 
/ less verbose message.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14902) RBF: NullPointer When Misconfigured

2019-11-05 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967342#comment-16967342
 ] 

Akira Ajisaka commented on HDFS-14902:
--

Compiled with the patch and run {{./hdfs dfsrouter}} command.
{noformat}
2019-11-05 17:35:50,276 ERROR router.NamenodeHeartbeatService: Namenode is not 
operational: Namenode is unregistered
{noformat}

This error message seems confusing to me. It seems to me NameNode is not just 
running or is in safe mode. I think the error message is shown if and only if 
the DFSRouter is misconfigured, so it's better to include that the 
configuration is wrong.

> RBF: NullPointer When Misconfigured
> ---
>
> Key: HDFS-14902
> URL: https://issues.apache.org/jira/browse/HDFS-14902
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: Takanobu Asanuma
>Priority: Minor
> Attachments: HDFS-14902.001.patch
>
>
> Admittedly the server was mis-configured, but this should be a bit more 
> elegant.
> {code:none}
> 2019-10-08 11:19:52,505 ERROR router.NamenodeHeartbeatService: Unhandled 
> exception updating NN registration for null:null
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.federation.protocol.proto.HdfsServerFederationProtos$NamenodeMembershipRecordProto$Builder.setServiceAddress(HdfsServerFederationProtos.java:3831)
>   at 
> org.apache.hadoop.hdfs.server.federation.store.records.impl.pb.MembershipStatePBImpl.setServiceAddress(MembershipStatePBImpl.java:119)
>   at 
> org.apache.hadoop.hdfs.server.federation.store.records.MembershipState.newInstance(MembershipState.java:108)
>   at 
> org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.registerNamenode(MembershipNamenodeResolver.java:259)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:223)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org