[jira] [Updated] (HDFS-13811) RBF: Race condition between router admin quota update and periodic quota update service

2019-11-11 Thread Jinglun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinglun updated HDFS-13811:
---
Attachment: HDFS-13811.003.patch

> RBF: Race condition between router admin quota update and periodic quota 
> update service
> ---
>
> Key: HDFS-13811
> URL: https://issues.apache.org/jira/browse/HDFS-13811
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Dibyendu Karmakar
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-13811-000.patch, HDFS-13811-HDFS-13891-000.patch, 
> HDFS-13811.001.patch, HDFS-13811.002.patch, HDFS-13811.003.patch
>
>
> If we try to update the quota of an existing mount entry while the periodic 
> quota update service is running on the same entry, the mount table can be 
> left in an _inconsistent state_.
> The transactions here are:
> A - Quota update service fetches the mount table entries.
> B - Quota update service updates the mount table with the current usage.
> A' - User updates the quota via the admin cmd.
> With the transaction sequence [ A A' B ], the quota update service overwrites 
> the mount table with the old quota value.
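
The lost update above can be sketched in a few lines. MountEntry and the map below are hypothetical simplified stand-ins, not the actual Router mount-table classes:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified model of the [ A A' B ] sequence described in the issue.
public class QuotaRaceSketch {
    static class MountEntry {
        long quota;
        long usage;
        MountEntry(long quota, long usage) { this.quota = quota; this.usage = usage; }
    }

    static long raceLostQuota() {
        Map<String, MountEntry> mountTable = new ConcurrentHashMap<>();
        mountTable.put("/data", new MountEntry(100, 10));

        // A: the quota update service fetches its own copy of the entry.
        MountEntry fetched = mountTable.get("/data");
        MountEntry serviceCopy = new MountEntry(fetched.quota, fetched.usage);

        // A': the admin command updates the quota concurrently.
        mountTable.get("/data").quota = 200;

        // B: the service writes back the usage together with the stale quota,
        // silently clobbering the admin's update.
        serviceCopy.usage = 50;
        mountTable.put("/data", serviceCopy);

        return mountTable.get("/data").quota;  // 100: the old quota value
    }
}
```

The write-back in step B is the problem: it stores the whole entry, not just the usage fields, so any quota change made between A and B is lost.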



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13811) RBF: Race condition between router admin quota update and periodic quota update service

2019-11-11 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972153#comment-16972153
 ] 

Jinglun commented on HDFS-13811:


Uploaded v03, fixing the findbugs error. Hi [~ayushtkn], would you help review 
v03? Thanks!







[jira] [Updated] (HDFS-14955) RBF: getQuotaUsage() on mount point should return global quota.

2019-11-11 Thread Jinglun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinglun updated HDFS-14955:
---
Attachment: HDFS-14955.001.patch
Status: Patch Available  (was: Open)

> RBF: getQuotaUsage() on mount point should return global quota.
> ---
>
> Key: HDFS-14955
> URL: https://issues.apache.org/jira/browse/HDFS-14955
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Minor
> Attachments: HDFS-14955.001.patch
>
>
> When getQuotaUsage() is called on a mount point path, the quota part of the 
> result should be the global quota. 
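
A minimal sketch of the intended behavior, using a hypothetical stand-in type rather than the real org.apache.hadoop.fs.QuotaUsage: the usage counts come from the subcluster, while the quota fields are replaced with the global quota configured on the mount table entry.

```java
// Hypothetical stand-in for a quota/usage tuple; not the real Hadoop class.
public class GlobalQuotaSketch {
    record Quota(long nsQuota, long ssQuota, long nsUsed, long ssUsed) {}

    // Keep the subcluster-reported usage, but report the global quota that
    // the mount table entry defines for the mount point.
    static Quota withGlobalQuota(Quota subcluster, long globalNsQuota, long globalSsQuota) {
        return new Quota(globalNsQuota, globalSsQuota,
                subcluster.nsUsed(), subcluster.ssUsed());
    }
}
```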






[jira] [Commented] (HDFS-14955) RBF: getQuotaUsage() on mount point should return global quota.

2019-11-11 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972146#comment-16972146
 ] 

Jinglun commented on HDFS-14955:


Uploaded v01. Hi [~ayushtkn] [~elgoiri], would you help review it? Thanks!







[jira] [Commented] (HDDS-2446) ContainerReplica should contain DatanodeInfo rather than DatanodeDetails

2019-11-11 Thread Nanda kumar (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972141#comment-16972141
 ] 

Nanda kumar commented on HDDS-2446:
---

I'm just worried that we might run into consistency issues later on if we 
maintain transient state in multiple places, even if they point to the same 
object.

> ContainerReplica should contain DatanodeInfo rather than DatanodeDetails
> 
>
> Key: HDDS-2446
> URL: https://issues.apache.org/jira/browse/HDDS-2446
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The ContainerReplica object is used by the SCM to track containers reported 
> by the datanodes. The current fields stored in ContainerReplica are:
> {code}
> final private ContainerID containerID;
> final private ContainerReplicaProto.State state;
> final private DatanodeDetails datanodeDetails;
> final private UUID placeOfBirth;
> {code}
> Now we have introduced decommission and maintenance mode, the replication 
> manager (and potentially other parts of the code) need to know the status of 
> the replica in terms of IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED etc to 
> make replication decisions.
> The DatanodeDetails object does not carry this information, however the 
> DatanodeInfo object extends DatanodeDetails and does carry the required 
> information.
> As DatanodeInfo extends DatanodeDetails, any place which needs a 
> DatanodeDetails can accept a DatanodeInfo instead.
> In this Jira I propose we change the DatanodeDetails stored in 
> ContainerReplica to DatanodeInfo.
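
As a rough illustration of why the field swap is source-compatible, here is a sketch with minimal stand-ins; the real SCM classes carry far more state:

```java
public class ReplicaSketch {
    static class DatanodeDetails {
        final String uuid;
        DatanodeDetails(String uuid) { this.uuid = uuid; }
    }

    // DatanodeInfo extends DatanodeDetails and adds the operational state
    // (IN_SERVICE, DECOMMISSIONING, ...) needed for replication decisions.
    static class DatanodeInfo extends DatanodeDetails {
        final String operationalState;
        DatanodeInfo(String uuid, String state) {
            super(uuid);
            this.operationalState = state;
        }
    }

    static class ContainerReplica {
        // Was: private final DatanodeDetails datanodeDetails;
        private final DatanodeInfo datanodeInfo;  // proposed field type
        ContainerReplica(DatanodeInfo info) { this.datanodeInfo = info; }

        // Existing callers that expect a DatanodeDetails keep working,
        // because a DatanodeInfo is-a DatanodeDetails.
        DatanodeDetails getDatanodeDetails() { return datanodeInfo; }

        String getOperationalState() { return datanodeInfo.operationalState; }
    }
}
```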






[jira] [Commented] (HDDS-2446) ContainerReplica should contain DatanodeInfo rather than DatanodeDetails

2019-11-11 Thread Nanda kumar (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972138#comment-16972138
 ] 

Nanda kumar commented on HDDS-2446:
---

I agree. Just a thought: what if we get the state of all available datanodes at 
the start of the {{ReplicationManager}} cycle? That would avoid multiple 
lookups for the same datanode.
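
A tiny sketch of that idea, with a plain map standing in for NodeManager (hypothetical simplification): fetch each node's state once at the top of the cycle, then answer all per-replica queries from the local snapshot.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class CycleSnapshotSketch {
    // One pass over all datanodes per ReplicationManager cycle, instead of
    // one NodeManager query per container replica.
    static Map<UUID, String> snapshotAtCycleStart(Map<UUID, String> nodeStates) {
        return new HashMap<>(nodeStates);
    }
}
```

The trade-off is that the snapshot can go stale within a cycle, which is usually acceptable since a cycle is short relative to state transitions.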







[jira] [Commented] (HDDS-2446) ContainerReplica should contain DatanodeInfo rather than DatanodeDetails

2019-11-11 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972133#comment-16972133
 ] 

Stephen O'Donnell commented on HDDS-2446:
-

I agree we need to be careful about how the datanodeInfo is used with this 
change. The only place a datanodeInfo gets created right now is at DN 
registration. If a DN goes dead, all of its containerReplicas are removed from 
the container manager, so on re-registration, even if a new datanodeDetails and 
datanodeInfo are created, all new replicas will reference those new objects. 
With the current code I think we are safe on this.

The performance of looking up the datanodeInfo for each replica would not be 
terrible. It would come down to a few method calls and a map lookup per 
container replica, but I would like to avoid it if I can. We would need to do 
this lookup any time a replica is selected for reading (to ensure the node is 
still IN_SERVICE) and any time the ReplicationManager checks the containers for 
all containerReplicas in the cluster.

One place we may need to be careful is when a maintenance node re-registers 
after maintenance, as I think we would need to retain those replicas in SCM 
even while the node is dead.







[jira] [Commented] (HDDS-2446) ContainerReplica should contain DatanodeInfo rather than DatanodeDetails

2019-11-11 Thread Nanda kumar (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972110#comment-16972110
 ] 

Nanda kumar commented on HDDS-2446:
---

[~sodonnell], will the performance be bad if we query {{NodeManager}} every 
time to know the state of the datanode?

If we are storing {{DatanodeInfo}} reference inside {{ContainerReplica}}, we 
should make sure that no one re-maps (or creates new) {{DatanodeInfo}} in 
{{NodeManager}}. We don't have any such code now, but we might run into 
consistency issues later on if someone adds such logic.







[jira] [Issue Comment Deleted] (HDFS-14617) Improve fsimage load time by writing sub-sections to the fsimage index

2019-11-11 Thread Feng Yuan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Yuan updated HDFS-14617:
-
Comment: was deleted

(was: org.apache.hadoop.hdfs.server.namenode.FSDirectory#cacheName
org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode.Loader#addToParent
Hi [~sodonnell] Don`t need concurrent? )

> Improve fsimage load time by writing sub-sections to the fsimage index
> --
>
> Key: HDFS-14617
> URL: https://issues.apache.org/jira/browse/HDFS-14617
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 2.10.0, 3.3.0
>
> Attachments: HDFS-14617.001.patch, ParallelLoading.svg, 
> SerialLoading.svg, dirs-single.svg, flamegraph.parallel.svg, 
> flamegraph.serial.svg, inodes.svg
>
>
> Loading an fsimage is basically a single-threaded process. The current 
> fsimage is written out in sections, eg iNode, iNode_Directory, Snapshots, 
> Snapshot_Diff etc. Then at the end of the file, an index is written that 
> contains the offset and length of each section. The image loader code uses 
> this index to initialize an input stream to read and process each section. It 
> is important that one section is fully loaded before another is started, as 
> the next section depends on the results of the previous one.
> What I would like to propose is the following:
> 1. When writing the image, we can optionally output sub_sections to the 
> index. That way, a given section would effectively be split into several 
> sections, eg:
> {code:java}
>inode_section offset 10 length 1000
>  inode_sub_section offset 10 length 500
>  inode_sub_section offset 510 length 500
>  
>inode_dir_section offset 1010 length 1000
>  inode_dir_sub_section offset 1010 length 500
>  inode_dir_sub_section offset 1010 length 500
> {code}
> Here you can see we still have the original section index, but then we also 
> have sub-section entries that cover the entire section. Then a processor can 
> either read the full section in serial, or read each sub-section in parallel.
> 2. In the Image Writer code, we should set a target number of sub-sections, 
> and then based on the total inodes in memory, it will create that many 
> sub-sections per major image section. I think the only sections worth doing 
> this for are inode, inode_reference, inode_dir and snapshot_diff. All others 
> tend to be fairly small in practice.
> 3. If there are fewer inodes than some threshold (eg 10M), then don't bother 
> with the sub-sections, as a serial load only takes a few seconds at that scale.
> 4. The image loading code can then have a switch to enable 'parallel loading' 
> and a 'number of threads' where it uses the sub-sections, or if not enabled 
> falls back to the existing logic to read the entire section in serial.
> Working with a large image of 316M inodes and 35GB on disk, I have a proof of 
> concept of this change working, allowing just inode and inode_dir to be 
> loaded in parallel, but I believe inode_reference and snapshot_diff can be 
> made parallel with the same technique.
> Some benchmarks I have are as follows:
> {code:java}
> Threads   1 2 3 4 
> 
> inodes448   290   226   189 
> inode_dir 326   211   170   161 
> Total 927   651   535   488 (MD5 calculation about 100 seconds)
> {code}
> The above table shows the time in seconds to load the inode section and the 
> inode_directory section, and then the total load time of the image.
> With 4 threads using the above technique, we are able to cut the load time of 
> the two sections by more than half. With the patch in HDFS-13694 it would 
> take a further 100 seconds off the run time, going from 927 seconds (serial) 
> to 388, which is a significant improvement. Adding more threads beyond 4 has 
> diminishing returns, as there are some synchronized points in the loading 
> code that protect the in-memory structures.
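
Steps 1 and 4 above can be sketched as follows. This is a hedged illustration, not the committed implementation; SubSection and loadRange() are hypothetical stand-ins for the fsimage loader types:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

public class ParallelSectionLoader {
    // One (offset, length) entry per sub-section in the image index.
    record SubSection(long offset, long length) {}

    static final AtomicLong bytesLoaded = new AtomicLong();

    static void loadSection(List<SubSection> subs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (SubSection s : subs) {
                futures.add(pool.submit(() -> loadRange(s.offset(), s.length())));
            }
            // Barrier: the next major section depends on this one being done.
            for (Future<?> f : futures) {
                try {
                    f.get();
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
        } finally {
            pool.shutdown();
        }
    }

    static void loadRange(long offset, long length) {
        // Real code would seek to `offset` and apply `length` bytes of inodes;
        // here we only account for the bytes to keep the sketch runnable.
        bytesLoaded.addAndGet(length);
    }
}
```

The barrier after each major section preserves the invariant from the description: one section is fully loaded before the next is started.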






[jira] [Commented] (HDFS-14617) Improve fsimage load time by writing sub-sections to the fsimage index

2019-11-11 Thread Feng Yuan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972098#comment-16972098
 ] 

Feng Yuan commented on HDFS-14617:
--

org.apache.hadoop.hdfs.server.namenode.FSDirectory#cacheName
org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode.Loader#addToParent
Hi [~sodonnell], don't these need to be made safe for concurrent use?







[jira] [Commented] (HDFS-13811) RBF: Race condition between router admin quota update and periodic quota update service

2019-11-11 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972096#comment-16972096
 ] 

Hadoop QA commented on HDFS-13811:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
46s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 32s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 53s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m  
4s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs-rbf generated 1 new + 0 
unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  7m  
9s{color} | {color:green} hadoop-hdfs-rbf in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 62m 26s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs-rbf |
|  |  Useless object stored in variable updateMountTables of method 
org.apache.hadoop.hdfs.server.federation.router.RouterQuotaUpdateService.periodicInvoke()
  At RouterQuotaUpdateService.java:updateMountTables of method 
org.apache.hadoop.hdfs.server.federation.router.RouterQuotaUpdateService.periodicInvoke()
  At RouterQuotaUpdateService.java:[line 84] |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-13811 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12985581/HDFS-13811.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 10e7e6ae6c6f 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 30b93f9 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28291/artifact/

[jira] [Assigned] (HDDS-2403) Remove leftover reference to OUTPUT_FILE from shellcheck.sh

2019-11-11 Thread Sandeep Nemuri (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Nemuri reassigned HDDS-2403:


Assignee: Sandeep Nemuri

> Remove leftover reference to OUTPUT_FILE from shellcheck.sh
> ---
>
> Key: HDDS-2403
> URL: https://issues.apache.org/jira/browse/HDDS-2403
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Sandeep Nemuri
>Priority: Trivial
>  Labels: newbie
>
> {{shellcheck.sh}} gives the following error (but works fine otherwise):
> {noformat}
> $ hadoop-ozone/dev-support/checks/shellcheck.sh
> hadoop-ozone/dev-support/checks/shellcheck.sh: line 23: : No such file or 
> directory
> ...
> {noformat}
> This happens because the {{OUTPUT_FILE}} variable is undefined:
> {code:title=https://github.com/apache/hadoop-ozone/blob/6b2cda125b3647870ef5b01cf64e3b3e4cdc55db/hadoop-ozone/dev-support/checks/shellcheck.sh#L23}
> echo "" > "$OUTPUT_FILE"
> {code}
> The command can be removed.
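
Either fix works: delete the leftover line, or define the variable before use so the redirect has a target. A hedged sketch of the latter; the default path below is an assumption, not what the build scripts actually use:

```shell
# Give OUTPUT_FILE a default if the caller has not set one (assumed path),
# then keep the original truncate-and-create behavior.
OUTPUT_FILE="${OUTPUT_FILE:-/tmp/shellcheck-report.txt}"
echo "" > "$OUTPUT_FILE"
```

Since nothing else in the script reads the file, simply removing the line (as the issue proposes) is the cleaner fix.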






[jira] [Updated] (HDFS-13811) RBF: Race condition between router admin quota update and periodic quota update service

2019-11-11 Thread Jinglun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinglun updated HDFS-13811:
---
Attachment: HDFS-13811.002.patch







[jira] [Commented] (HDFS-13811) RBF: Race condition between router admin quota update and periodic quota update service

2019-11-11 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972065#comment-16972065
 ] 

Jinglun commented on HDFS-13811:


Rebased and uploaded v02.







[jira] [Commented] (HDFS-14648) DeadNodeDetector basic model

2019-11-11 Thread Yiqun Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972036#comment-16972036
 ] 

Yiqun Lin commented on HDFS-14648:
--

Thanks [~leosun08] for updating the patch! It looks more readable now. Some 
detailed review comments from me:

*ClientContext.java*
1. Can we use {{deadNodeDetectionEnabled}} to replace 
{{sharedDeadNodesEnabled}}? This name stays consistent with DeadNodeDetector. 
Can you replace it across the whole patch, including in the method comments?

*DFSClient.java*
1. Why do we add the dead nodes from the DFSInputStream again? Just to get the 
latest dead nodes that haven't been detected yet? The DeadNodeDetector should 
already have added the DFSInputStream's dead nodes when they were detected.
{code}
  public ConcurrentHashMap<DatanodeInfo, DatanodeInfo> getDeadNodes(
+      DFSInputStream dfsInputStream) {
+    if (clientContext.isSharedDeadNodesEnabled()) {
+      ConcurrentHashMap<DatanodeInfo, DatanodeInfo> deadNodes =
+          new ConcurrentHashMap<DatanodeInfo, DatanodeInfo>();
+      if (dfsInputStream != null) {
+        deadNodes.putAll(dfsInputStream.getLocalDeadNodes());
+      }
+
+      Set<DatanodeInfo> detectDeadNodes =
+          clientContext.getDeadNodeDetector().getDeadNodesToDetect();
+      for (DatanodeInfo detectDeadNode : detectDeadNodes) {
+        deadNodes.put(detectDeadNode, detectDeadNode);
+      }
...
{code}
2. Can we remove the redundant '{}'? Change
{code}
LOG.debug("DeadNode detection is not enabled or given block {} is null, " +
+  "skip to remove node {}.", locatedBlocks);
{code} 
to
{code}
LOG.debug("DeadNode detection is not enabled or given block {} is null, " +
+  "skip to remove node.", locatedBlocks);
{code} 
 
*DeadNodeDetector.java*
1. Can we comment the name field as the client context name?
{code}
+  /**
+   * Client context name.
+   */
+  private String name;
{code}
2. Can we use the datanodeUuid obtained from DatanodeInfo as the key? Using the 
same object type for both key and value looks confusing, and datanodeUuid is 
the identity of a DN node.
{code}
private final ConcurrentHashMap<DatanodeInfo, DatanodeInfo> deadNodes;
{code}
3. I think it will be better to print out the detected dead node info here.
{code}
LOG.debug("Current detector state {}, the detected nodes: {}.",);
{code}
4. Two comments for this:
1) Update the method name to clearAndGetDetectedDeadNodes.
2) The line {{newDeadNodes.retainAll(deadNodes.values());}} does not look 
correct: it reduces newDeadNodes to the nodes already in the old deadNodes.
{code}
+  public synchronized Set<DatanodeInfo> getDeadNodesToDetect() {
+    // first remove the dead nodes that don't belong to any input stream
+    Set<DatanodeInfo> newDeadNodes = new HashSet<DatanodeInfo>();
+    for (HashSet<DatanodeInfo> datanodeInfos : dfsInputStreamNodes.values()) {
+      newDeadNodes.addAll(datanodeInfos);
+    }
+
+    newDeadNodes.retainAll(deadNodes.values());
+
+    for (DatanodeInfo datanodeInfo : deadNodes.values()) {
+      if (!newDeadNodes.contains(datanodeInfo)) {
+        deadNodes.remove(datanodeInfo);
+      }
+    }
+    return newDeadNodes;
+  }
{code}
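For reference on the retainAll point above, a minimal stdlib-only sketch of what that call does (hypothetical DN names and a copying helper for illustration; the patch mutates newDeadNodes in place):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class RetainAllDemo {
  // Mimics the flow in getDeadNodesToDetect(): newDeadNodes starts as the
  // union of the per-stream nodes; retainAll then intersects it with the
  // values of the old deadNodes map.
  public static Set<String> intersect(Set<String> newDeadNodes,
                                      Set<String> oldDeadNodes) {
    Set<String> result = new HashSet<>(newDeadNodes);
    result.retainAll(oldDeadNodes); // keeps only elements also in oldDeadNodes
    return result;
  }

  public static void main(String[] args) {
    Set<String> fromStreams = new HashSet<>(Arrays.asList("dn1", "dn2", "dn3"));
    Set<String> oldDead = new HashSet<>(Arrays.asList("dn2", "dn3", "dn4"));
    // "dn1" is dropped because it is not already in oldDead: after retainAll,
    // the result can only ever be a subset of the old dead-node set.
    System.out.println(intersect(fromStreams, oldDead));
  }
}
```

This illustrates the reviewer's concern: the method can never return a node that is not already present in the old deadNodes map.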

*TestDFSClientDetectDeadNodes.java*
Comments for testDetectDeadNodeInBackground and 
testDeadNodeMultipleDFSInputStream:
1) Can we check the dead node size again after closing the DFS input stream?
2) Can we use DeadNodeDetector#getDeadNodesToDetect to check the dead nodes as well?
{code}
FSDataInputStream in = fs.open(filePath);
+try {
+  try {
+in.read();
+  } catch (BlockMissingException e) {
+  }
+
+  DFSInputStream din = (DFSInputStream) in.getWrappedStream();
+  assertEquals(3, din.getDFSClient().getDeadNodes(din).size());
   // use DeadDetector to get dead node as well
+} finally {
+  in.close();
+  fs.delete(new Path("/testDetectDeadNodeInBackground"),
+  true);
  // check the dead node again here, the dead node is expected be removed
+}
{code}
3. Can we check the dead node's detailed info as well, like the DN uuid?
{code}
DFSInputStream din2 = (DFSInputStream) in1.getWrappedStream();
+  assertEquals(1, din1.getDFSClient().getDeadNodes(din1).size());
+  assertEquals(1, din2.getDFSClient().getDeadNodes(din2).size());
//  check the dn uuid of dead node to see if its expected dead node
{code}


> DeadNodeDetector basic model
> 
>
> Key: HDFS-14648
> URL: https://issues.apache.org/jira/browse/HDFS-14648
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14648.001.patch, HDFS-14648.002.patch, 
> HDFS-14648.003.patch, HDFS-14648.004.patch, HDFS-14648.005.patch, 
> HDFS-14648.006.patch, HDFS-14648.007.patch
>
>
> This Jira constructs DeadNodeDetector state machine model. The function it 
> implements as follow:
>  # When a DFSInputstream is opened, a BlockReader is opened. If some DataNode 
> of the block is found to inaccessible, put the DataNode into 
> DeadNodeDetector#deadnode.(HDFS-14649) will opt

[jira] [Commented] (HDFS-13811) RBF: Race condition between router admin quota update and periodic quota update service

2019-11-11 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972025#comment-16972025
 ] 

Hadoop QA commented on HDFS-13811:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  8s{color} 
| {color:red} HDFS-13811 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-13811 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12985579/HDFS-13811.001.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28290/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.









[jira] [Updated] (HDFS-13811) RBF: Race condition between router admin quota update and periodic quota update service

2019-11-11 Thread Jinglun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinglun updated HDFS-13811:
---
Attachment: HDFS-13811.001.patch







[jira] [Commented] (HDFS-13811) RBF: Race condition between router admin quota update and periodic quota update service

2019-11-11 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972023#comment-16972023
 ] 

Jinglun commented on HDFS-13811:


Now we have the quota & quota usage saved in 3 places: the state store, the mount 
table store and the RouterQuotaManager, and the periodic service updates all 3 
places whenever the quota usage changes. The idea is that only the router admin 
should be able to update the state store; the periodic service should only update 
the quota usage in the local cache, which is the RouterQuotaManager.

Based on that idea, only the quota part in the state store and the mount table 
store is meaningful (the usage part is meaningless). The quota usage is saved and 
updated only by the RouterQuotaManager. In the periodic service we should update 
the RouterQuotaManager's quota from the mount table store and update the 
RouterQuotaManager's quota usage by counting the real paths. The patch becomes 
much simpler; upload v01.
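A rough sketch of the proposed periodic-update flow (hypothetical class and method names for illustration only, not the actual patch):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the idea above: the admin path is the only writer of quota in the
// state store; the periodic service only refreshes a local cache.
public class QuotaUpdateSketch {
  // Local cache standing in for RouterQuotaManager: path -> [quota, usage].
  public static final Map<String, long[]> localCache = new HashMap<>();

  // Periodic service: take quota from the mount table store and usage from
  // counting the real paths, touching only the local cache -- never the state
  // store -- so a concurrent admin quota update cannot be overwritten.
  public static void periodicUpdate(Map<String, Long> mountTableQuota,
                                    Map<String, Long> realUsage) {
    for (Map.Entry<String, Long> e : mountTableQuota.entrySet()) {
      long usage = realUsage.getOrDefault(e.getKey(), 0L);
      localCache.put(e.getKey(), new long[] {e.getValue(), usage});
    }
  }
}
```

With this split, the [ A A' B ] interleaving is harmless: B only writes the local cache, so the admin's quota written in A' survives in the store.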

> RBF: Race condition between router admin quota update and periodic quota 
> update service
> ---
>
> Key: HDFS-13811
> URL: https://issues.apache.org/jira/browse/HDFS-13811
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Dibyendu Karmakar
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-13811-000.patch, HDFS-13811-HDFS-13891-000.patch
>
>
> If we try to update quota of an existing mount entry and at the same time 
> periodic quota update service is running on the same mount entry, it is 
> leading the mount table to _inconsistent state._
> Here transactions are:
> A - Quota update service is fetching mount table entries.
> B - Quota update service is updating the mount table with current usage.
> A' - User is trying to update quota using admin cmd.
> and the transaction sequence is [ A A' B ]
> quota update service is updating the mount table with old quota value.






[jira] [Commented] (HDFS-14976) HDFS:fsck option "-list-corruptfileblocks" suppress all other output while being used with different combination of fsck options

2019-11-11 Thread Fei Hui (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972000#comment-16972000
 ] 

Fei Hui commented on HDFS-14976:


[~SouryakantaDwivedy]
{quote}
But it does not mean that you can not use these options in combination
{quote}
I don't agree.
IMO "-files -blocks etc." are not applicable to *-list-corruptfileblocks*, so 
*-list-corruptfileblocks* behaves the same as *-list-corruptfileblocks -files 
-blocks*; the output is not being suppressed.

> HDFS:fsck option "-list-corruptfileblocks" suppress all other output while 
> being used with different combination of fsck options
> 
>
> Key: HDFS-14976
> URL: https://issues.apache.org/jira/browse/HDFS-14976
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 3.1.2
>Reporter: Souryakanta Dwivedy
>Priority: Minor
> Attachments: fsck_log.PNG, image-2019-11-11-15-40-22-128.png
>
>
> fsck option "-list-corruptfileblocks" suppress all other output while being 
> used with different combination of fsck options.
> Steps:- 
> 
> Use hdfs fsck command with different combinations of options as 
>  
>  hdfs fsck / -files -blocks -locations -storagepolicies
>  hdfs fsck / -files -blocks -openforwrite
>  hdfs fsck / -files -blocks -showprogress
>  hdfs fsck / -files -openforwrite
> for all the combinations of options output will display.
>  
> Use same fsck options with "-list-corruptfileblocks" ,it will suppress the 
> output of 
>  all other options and only display the list of corrupt files which is not 
> correct behavior
>  Either it should display output of all the other option with corrupted file 
> info or it has 
>  to be specifed in help info that this option should use alone without any 
> combination of 
>  other options.Try these different combinations of options
>  
>  hdfs fsck / -files -blocks -list-corruptfileblocks
>  hdfs fsck / -list-corruptfileblocks -files -blocks
>  
> !image-2019-11-11-15-40-22-128.png!






[jira] [Commented] (HDFS-14978) In-place Erasure Coding Conversion

2019-11-11 Thread Fei Hui (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971990#comment-16971990
 ] 

Fei Hui commented on HDFS-14978:


[~weichiu] Thanks for bringing this up, it makes sense.
I have a question here:
{quote}
this operation will abort if either file
is open (isUnderConstruction() == true)
{quote}
What is the client behavior during the CAS operation OP_SWAP_BLOCK_LIST?

Also, the example tool in Milestone 2 may be missing a delete($tmp) step.

> In-place Erasure Coding Conversion
> --
>
> Key: HDFS-14978
> URL: https://issues.apache.org/jira/browse/HDFS-14978
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: In-place Erasure Coding Conversion.pdf
>
>
> HDFS Erasure Coding is a new feature added in Apache Hadoop 3.0. It uses 
> encoding algorithms to reduce disk space usage while retaining redundancy 
> necessary for data recovery. It was a huge amount of work but it is just 
> getting adopted after almost 2 years.
> One usability problem that’s blocking users from adopting HDFS Erasure Coding 
> is that existing replicated files have to be copied to an EC-enabled 
> directory explicitly. Renaming a file/directory to an EC-enabled directory 
> does not automatically convert the blocks. Therefore users typically perform 
> the following steps to erasure-code existing files:
> {noformat}
> Create $tmp directory, set EC policy at it
> Distcp $src to $tmp
> Delete $src (rm -rf $src)
> mv $tmp $src
> {noformat}
> There are several reasons why this is not popular:
> * Complex. The process involves several steps: distcp data to a temporary 
> destination; delete source file; move destination to the source path.
> * Availability: there is a short period where nothing exists at the source 
> path, and jobs may fail unexpectedly.
> * Overhead. During the copy phase, there is a point in time where all of 
> source and destination files exist at the same time, exhausting disk space.
> * Not snapshot-friendly. If a snapshot is taken prior to performing the 
> conversion, the source (replicated) files will be preserved in the cluster 
> too. Therefore, the conversion actually increase storage space usage.
> * Not management-friendly. This approach changes file inode number, 
> modification time and access time. Erasure coded files are supposed to store 
> cold data, but this conversion makes data “hot” again.
> * Bulky. It’s either all or nothing. The directory may be partially erasure 
> coded, but this approach simply erasure code everything again.
> To ease data management, we should offer a utility tool to convert replicated 
> files to erasure coded files in-place.






[jira] [Commented] (HDFS-14973) Balancer getBlocks RPC dispersal does not function properly

2019-11-11 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971969#comment-16971969
 ] 

Konstantin Shvachko commented on HDFS-14973:


Anyway, using {{RateLimiter}} sounds like a productive idea. That way we 
guarantee {{getBlocks}} does not exceed {{max-qps}} on the NameNode at any time. 
Some comments:
 # {{NameNodeConnector}} already gets {{Configuration}} as a parameter. Should 
we use the config to obtain {{max-qps}} inside the constructor, rather than 
dragging it through all the calls?
 # {{TestBalancer}} is a heavy test since it recreates the mini-cluster for each 
test case. At least for {{TestBalancerRPCDelay}}, can we restructure the code so 
that it works against the same mini-cluster, without restarting it?
 ** It would also be good to run the test once against the default parameter: 20 
instead of 10.
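As a rough illustration of the rate-limiter idea (a stdlib-only sketch, not Guava's {{RateLimiter}} and not the actual patch): space out {{getBlocks}} calls so the NameNode never sees more than {{max-qps}} per second.

```java
import java.util.concurrent.TimeUnit;

// Minimal blocking rate limiter: each acquire() reserves the next 1/maxQps
// second slot, so callers are spaced to at most maxQps calls per second.
public class SimpleRateLimiter {
  private final long intervalNanos;
  private long nextFreeSlot = System.nanoTime();

  public SimpleRateLimiter(double maxQps) {
    this.intervalNanos = (long) (1e9 / maxQps);
  }

  public synchronized void acquire() {
    long now = System.nanoTime();
    long wait = nextFreeSlot - now;
    if (wait > 0) {
      try {
        TimeUnit.NANOSECONDS.sleep(wait); // block until our slot opens
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
    nextFreeSlot = Math.max(nextFreeSlot, now) + intervalNanos;
  }

  public static void main(String[] args) {
    SimpleRateLimiter limiter = new SimpleRateLimiter(20); // e.g. 20 QPS cap
    for (int i = 0; i < 3; i++) {
      limiter.acquire();
      // a dispatcher thread would issue its getBlocks RPC here
    }
  }
}
```

Unlike the thread-indexed delay scheme, this bounds the RPC rate regardless of the dispatcher threadpool size or how quickly individual threads finish.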

> Balancer getBlocks RPC dispersal does not function properly
> ---
>
> Key: HDFS-14973
> URL: https://issues.apache.org/jira/browse/HDFS-14973
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.9.0, 2.7.4, 2.8.2, 3.0.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-14973.000.patch, HDFS-14973.001.patch, 
> HDFS-14973.test.patch
>
>
> In HDFS-11384, a mechanism was added to make the {{getBlocks}} RPC calls 
> issued by the Balancer/Mover more dispersed, to alleviate load on the 
> NameNode, since {{getBlocks}} can be very expensive and the Balancer should 
> not impact normal cluster operation.
> Unfortunately, this functionality does not function as expected, especially 
> when the dispatcher thread count is low. The primary issue is that the delay 
> is applied only to the first N threads that are submitted to the dispatcher's 
> executor, where N is the size of the dispatcher's threadpool, but *not* to 
> the first R threads, where R is the number of allowed {{getBlocks}} QPS 
> (currently hardcoded to 20). For example, if the threadpool size is 100 (the 
> default), threads 0-19 have no delay, 20-99 have increased levels of delay, 
> and 100+ have no delay. As I understand it, the intent of the logic was that 
> the delay applied to the first 100 threads would force the dispatcher 
> executor's threads to all be consumed, thus blocking subsequent (non-delayed) 
> threads until the delay period has expired. However, threads 0-19 can finish 
> very quickly (their work can often be fulfilled in the time it takes to 
> execute a single {{getBlocks}} RPC, on the order of tens of milliseconds), 
> thus opening up 20 new slots in the executor, which are then consumed by 
> non-delayed threads 100-119, and so on. So, although 80 threads have had a 
> delay applied, the non-delay threads rush through in the 20 non-delay slots.
> This problem gets even worse when the dispatcher threadpool size is less than 
> the max {{getBlocks}} QPS. For example, if the threadpool size is 10, _no 
> threads ever have a delay applied_, and the feature is not enabled at all.
> This problem wasn't surfaced in the original JIRA because the test 
> incorrectly measured the period across which {{getBlocks}} RPCs were 
> distributed. The variables {{startGetBlocksTime}} and {{endGetBlocksTime}} 
> were used to track the time over which the {{getBlocks}} calls were made. 
> However, {{startGetBlocksTime}} was initialized at the time of creation of 
> the {{FSNameystem}} spy, which is before the mock DataNodes are started. Even 
> worse, the Balancer in this test takes 2 iterations to complete balancing the 
> cluster, so the time period {{endGetBlocksTime - startGetBlocksTime}} 
> actually represents:
> {code}
> (time to submit getBlocks RPCs) + (DataNode startup time) + (time for the 
> Dispatcher to complete an iteration of moving blocks)
> {code}
> Thus, the RPC QPS reported by the test is much lower than the RPC QPS seen 
> during the period of initial block fetching.






[jira] [Commented] (HDFS-14979) [Observer Node] Balancer should submit getBlocks to Observer Node when possible

2019-11-11 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971966#comment-16971966
 ] 

Hadoop QA commented on HDFS-14979:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
46s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 27s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
12s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 29s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 98m 50s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}160m 25s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14979 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12985554/HDFS-14979.000.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 36c67513834f 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 30b93f9 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28289/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28289/testReport/ |
| Max. process+thread count | 2844 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28289/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |

[jira] [Updated] (HDFS-14979) [Observer Node] Balancer should submit getBlocks to Observer Node when possible

2019-11-11 Thread Erik Krogen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-14979:
---
Status: Patch Available  (was: Open)

Attached v000 patch. The fix is simply to add the {{@ReadOnly}} annotation. I 
also enhanced the test to confirm that {{getBlocks}} calls go to the observer.

> [Observer Node] Balancer should submit getBlocks to Observer Node when 
> possible
> ---
>
> Key: HDFS-14979
> URL: https://issues.apache.org/jira/browse/HDFS-14979
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover, hdfs
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-14979.000.patch
>
>
> In HDFS-14162, we made it so that the Balancer could function when 
> {{ObserverReadProxyProvider}} was in use. However, the Balancer would still 
> read from the active NameNode, because {{getBlocks}} wasn't annotated as 
> {{@ReadOnly}}. This task is to enable the Balancer to actually read from the 
> Observer Node to alleviate load from the active NameNode.






[jira] [Comment Edited] (HDFS-14979) [Observer Node] Balancer should submit getBlocks to Observer Node when possible

2019-11-11 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971889#comment-16971889
 ] 

Erik Krogen edited comment on HDFS-14979 at 11/11/19 10:47 PM:
---

Attached v000 patch. The fix is simply to add the {{@ReadOnly}} annotation. I 
also enhanced the test to confirm that {{getBlocks}} calls go to the observer.

[~shv] or [~vagarychen], care to help review?


was (Author: xkrogen):
Attached v000 patch. The fix is simply to add the {{@ReadOnly}} annotation. I 
also enhanced the test to confirm that {{getBlocks}} calls go to the observer.







[jira] [Created] (HDFS-14979) [Observer Node] Balancer should submit getBlocks to Observer Node when possible

2019-11-11 Thread Erik Krogen (Jira)
Erik Krogen created HDFS-14979:
--

 Summary: [Observer Node] Balancer should submit getBlocks to 
Observer Node when possible
 Key: HDFS-14979
 URL: https://issues.apache.org/jira/browse/HDFS-14979
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer & mover, hdfs
Reporter: Erik Krogen
Assignee: Erik Krogen


In HDFS-14162, we made it so that the Balancer could function when 
{{ObserverReadProxyProvider}} was in use. However, the Balancer would still 
read from the active NameNode, because {{getBlocks}} wasn't annotated as 
{{@ReadOnly}}. This task is to enable the Balancer to actually read from the 
Observer Node to alleviate load from the active NameNode.






[jira] [Updated] (HDFS-14979) [Observer Node] Balancer should submit getBlocks to Observer Node when possible

2019-11-11 Thread Erik Krogen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-14979:
---
Attachment: HDFS-14979.000.patch







[jira] [Commented] (HDFS-14973) Balancer getBlocks RPC dispersal does not function properly

2019-11-11 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971876#comment-16971876
 ] 

Hadoop QA commented on HDFS-14973:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
50s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m  6s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
16s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 50s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 1 new + 825 unchanged - 1 fixed = 826 total (was 826) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 26s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 98m 45s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}161m 27s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized |
|   | hadoop.hdfs.tools.TestDFSZKFailoverController |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14973 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12985545/HDFS-14973.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux 9c7ff84a7583 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 30b93f9 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28288/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Bui

[jira] [Commented] (HDDS-1367) Add ability in Recon to track the growth rate of the cluster.

2019-11-11 Thread Aravindan Vijayan (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971870#comment-16971870
 ] 

Aravindan Vijayan commented on HDDS-1367:
-

[~dineshchitlangia] Apologies for the delayed response. In this JIRA, we will 
be covering only the growth of the cluster from the usage point of view. We 
will have to track the read/write ops through metrics from OM, SCM and 
Datanode. 

> Add ability in Recon to track the growth rate of the cluster. 
> --
>
> Key: HDDS-1367
> URL: https://issues.apache.org/jira/browse/HDDS-1367
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Reporter: Aravindan Vijayan
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>
> Recon should be able to answer the question "How fast is the cluster growing, 
> by week, by month, by day?", which gives the user an idea of the usage stats 
> of the cluster. 






[jira] [Commented] (HDFS-14974) RBF: Make tests use free ports

2019-11-11 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971853#comment-16971853
 ] 

Íñigo Goiri commented on HDFS-14974:


Thanks [~brahmareddy], I don't think we need to use the scanning that 
ServerSocketUtil does.
We can rely on :0 which will get us a free port.
All the tests are prepared to run in any port.
The results show all of them run as expected.
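Binding to :0 lets the OS pick a currently free ephemeral port, which the test can then read back. A minimal standalone sketch of the idea (plain java.net and a hypothetical class name, not the actual Router test code):

```java
import java.net.ServerSocket;

public class FreePortExample {
    public static void main(String[] args) throws Exception {
        // Port 0 asks the OS to assign any currently free ephemeral port,
        // so concurrent tests never collide on a hardcoded default port.
        try (ServerSocket socket = new ServerSocket(0)) {
            int port = socket.getLocalPort();
            if (port <= 0) {
                throw new AssertionError("expected a dynamically assigned port");
            }
            System.out.println("Assigned free port: " + port);
        }
    }
}
```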

> RBF: Make tests use free ports
> --
>
> Key: HDFS-14974
> URL: https://issues.apache.org/jira/browse/HDFS-14974
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
> Attachments: HDFS-14974.000.patch
>
>
> Currently, {{TestRouterSecurityManager#testCreateCredentials}} create a 
> Router with the default ports. However, these ports might be used. We should 
> set it to :0 for it to be assigned dynamically.






[jira] [Updated] (HDDS-2329) Destroy pipelines on any decommission or maintenance nodes

2019-11-11 Thread Nanda kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-2329:
--
Fix Version/s: 0.5.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Destroy pipelines on any decommission or maintenance nodes
> --
>
> Key: HDDS-2329
> URL: https://issues.apache.org/jira/browse/HDDS-2329
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When a node is marked for decommission or maintenance, the first step in 
> taking the node out of service is to destroy any pipelines the node is 
> involved in and confirm they have been destroyed before getting the container 
> list for the node.






[jira] [Work logged] (HDDS-2329) Destroy pipelines on any decommission or maintenance nodes

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2329?focusedWorklogId=341454&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341454
 ]

ASF GitHub Bot logged work on HDDS-2329:


Author: ASF GitHub Bot
Created on: 11/Nov/19 19:49
Start Date: 11/Nov/19 19:49
Worklog Time Spent: 10m 
  Work Description: nandakumar131 commented on pull request #86: HDDS-2329 
Destroy pipelines on any decommission or maintenance nodes
URL: https://github.com/apache/hadoop-ozone/pull/86
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341454)
Time Spent: 20m  (was: 10m)

> Destroy pipelines on any decommission or maintenance nodes
> --
>
> Key: HDDS-2329
> URL: https://issues.apache.org/jira/browse/HDDS-2329
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When a node is marked for decommission or maintenance, the first step in 
> taking the node out of service is to destroy any pipelines the node is 
> involved in and confirm they have been destroyed before getting the container 
> list for the node.






[jira] [Commented] (HDFS-14878) DataStreamer's ResponseProceesor#run() should log with Warn loglevel

2019-11-11 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971817#comment-16971817
 ] 

Brahma Reddy Battula commented on HDFS-14878:
-

[~kihwal] let us know your opinion on this jira.

> DataStreamer's ResponseProceesor#run() should log with Warn loglevel
> 
>
> Key: HDFS-14878
> URL: https://issues.apache.org/jira/browse/HDFS-14878
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14878.001.patch
>
>
> {code:java}
>   if (duration > dfsclientSlowLogThresholdMs) {
> LOG.info("Slow ReadProcessor read fields for block " + block
>   + " took " + duration + "ms (threshold="
>   + dfsclientSlowLogThresholdMs + "ms); ack: " + ack
>   + ", targets: " + Arrays.asList(targets));
>   } {code}
> The log level should be warn here.






[jira] [Updated] (HDFS-14973) Balancer getBlocks RPC dispersal does not function properly

2019-11-11 Thread Erik Krogen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-14973:
---
Attachment: HDFS-14973.001.patch

> Balancer getBlocks RPC dispersal does not function properly
> ---
>
> Key: HDFS-14973
> URL: https://issues.apache.org/jira/browse/HDFS-14973
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.9.0, 2.7.4, 2.8.2, 3.0.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-14973.000.patch, HDFS-14973.001.patch, 
> HDFS-14973.test.patch
>
>
> In HDFS-11384, a mechanism was added to make the {{getBlocks}} RPC calls 
> issued by the Balancer/Mover more dispersed, to alleviate load on the 
> NameNode, since {{getBlocks}} can be very expensive and the Balancer should 
> not impact normal cluster operation.
> Unfortunately, this functionality does not function as expected, especially 
> when the dispatcher thread count is low. The primary issue is that the delay 
> is applied only to the first N threads that are submitted to the dispatcher's 
> executor, where N is the size of the dispatcher's threadpool, but *not* to 
> the first R threads, where R is the number of allowed {{getBlocks}} QPS 
> (currently hardcoded to 20). For example, if the threadpool size is 100 (the 
> default), threads 0-19 have no delay, 20-99 have increased levels of delay, 
> and 100+ have no delay. As I understand it, the intent of the logic was that 
> the delay applied to the first 100 threads would force the dispatcher 
> executor's threads to all be consumed, thus blocking subsequent (non-delayed) 
> threads until the delay period has expired. However, threads 0-19 can finish 
> very quickly (their work can often be fulfilled in the time it takes to 
> execute a single {{getBlocks}} RPC, on the order of tens of milliseconds), 
> thus opening up 20 new slots in the executor, which are then consumed by 
> non-delayed threads 100-119, and so on. So, although 80 threads have had a 
> delay applied, the non-delay threads rush through in the 20 non-delay slots.
> This problem gets even worse when the dispatcher threadpool size is less than 
> the max {{getBlocks}} QPS. For example, if the threadpool size is 10, _no 
> threads ever have a delay applied_, and the feature is not enabled at all.
> This problem wasn't surfaced in the original JIRA because the test 
> incorrectly measured the period across which {{getBlocks}} RPCs were 
> distributed. The variables {{startGetBlocksTime}} and {{endGetBlocksTime}} 
> were used to track the time over which the {{getBlocks}} calls were made. 
> However, {{startGetBlocksTime}} was initialized at the time of creation of 
> the {{FSNamesystem}} spy, which is before the mock DataNodes are started. Even 
> worse, the Balancer in this test takes 2 iterations to complete balancing the 
> cluster, so the time period {{endGetBlocksTime - startGetBlocksTime}} 
> actually represents:
> {code}
> (time to submit getBlocks RPCs) + (DataNode startup time) + (time for the 
> Dispatcher to complete an iteration of moving blocks)
> {code}
> Thus, the RPC QPS reported by the test is much lower than the RPC QPS seen 
> during the period of initial block fetching.
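The delay assignment described above can be modeled in a few lines. This is a simplified model for illustration only, not the actual Dispatcher code; the thread index counts tasks in submission order, and the QPS limit of 20 and pool sizes follow the description:

```java
// Simplified model of the flawed delay assignment described in this issue.
// Only the first poolSize submissions are considered for a delay, and of
// those, the first maxQps (the hardcoded getBlocks QPS limit, 20) get none.
public class DelayModel {
    static long delayMillisFor(int i, int poolSize, int maxQps) {
        if (i >= poolSize) {
            return 0; // beyond the pool size: never delayed
        }
        if (i < maxQps) {
            return 0; // first R = maxQps threads: no delay
        }
        return 1000L * (i / maxQps); // increasing delay for threads R..N-1
    }

    public static void main(String[] args) {
        // Default pool of 100: threads 20-99 are delayed...
        check(delayMillisFor(50, 100, 20) > 0);
        // ...but threads 100+ are not, so they rush through the freed
        // non-delay slots.
        check(delayMillisFor(120, 100, 20) == 0);
        // Pool of 10 (< maxQps): no thread is ever delayed at all.
        for (int i = 0; i < 200; i++) {
            check(delayMillisFor(i, 10, 20) == 0);
        }
        System.out.println("model matches the described behavior");
    }

    static void check(boolean ok) {
        if (!ok) throw new AssertionError("unexpected delay");
    }
}
```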






[jira] [Commented] (HDFS-14973) Balancer getBlocks RPC dispersal does not function properly

2019-11-11 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971813#comment-16971813
 ] 

Erik Krogen commented on HDFS-14973:


Thanks for taking a look [~shv]!
{quote}The next wave of dispatcher threads after 100 should not hit the 
NameNode right away. It is supposed first to executePendingMove(), then call 
getBlocks(). And executePendingMove() naturally throttles the dispatcher, so it 
was not necessary to delay the subsequent waves.
{quote}
This might work, except that {{executePendingMove}} is non-blocking:
{code:java}
  public void executePendingMove(final PendingMove p) {
// move the reportedBlock
final DDatanode targetDn = p.target.getDDatanode();
ExecutorService moveExecutor = targetDn.getMoveExecutor();
if (moveExecutor == null) {
  final int nThreads = moverThreadAllocator.allocate();
  if (nThreads > 0) {
moveExecutor = targetDn.initMoveExecutor(nThreads);
  }
}
if (moveExecutor == null) {
  LOG.warn("No mover threads available: skip moving " + p);
  targetDn.removePendingBlock(p);
  p.proxySource.removePendingBlock(p);
  return;
}
moveExecutor.execute(new Runnable() {
  @Override
  public void run() {
p.dispatch();
  }
});
  }
{code}
It simply allocates a thread pool (if one does not exist), then submits a task 
to it to be executed. The actual movement will be executed later, by the 
{{moveExecutor}}. Even as far back as 2.6.1 (which doesn't have HDFS-11742), 
{{executePendingMove}} was similarly nonblocking:
{code:java}
  public void executePendingMove(final PendingMove p) {
// move the block
moveExecutor.execute(new Runnable() {
  @Override
  public void run() {
p.dispatch();
  }
});
  }
{code}
However I certainly agree that it's possible the changes to the balancer 
(HDFS-8818, HDFS-11742) exacerbated this issue.

With this non-blocking behavior, you end up with the scenario I described where 
the first 20 slots in the {{dispatchExecutor}} continue to push through 
dispatch tasks with throughput above the throttling limit.

Attached v1 patch addressing hdfs-site.xml issue.
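That {{execute}} returns before the move runs is easy to demonstrate in isolation. A standalone sketch (a sleep stands in for {{p.dispatch()}}; not Balancer code):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class NonBlockingSubmit {
    public static void main(String[] args) throws Exception {
        ExecutorService moveExecutor = Executors.newSingleThreadExecutor();
        CountDownLatch done = new CountDownLatch(1);

        long start = System.nanoTime();
        moveExecutor.execute(() -> {
            try {
                Thread.sleep(500); // stands in for p.dispatch()
            } catch (InterruptedException ignored) {
            }
            done.countDown();
        });
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;

        // execute() returned immediately; the 500 ms "move" is still in
        // flight, so the submitting thread is free to dispatch more work
        // without being throttled by the move itself.
        if (elapsedMillis >= 400) {
            throw new AssertionError("execute() unexpectedly blocked");
        }
        done.await();
        moveExecutor.shutdown();
    }
}
```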

> Balancer getBlocks RPC dispersal does not function properly
> ---
>
> Key: HDFS-14973
> URL: https://issues.apache.org/jira/browse/HDFS-14973
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.9.0, 2.7.4, 2.8.2, 3.0.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-14973.000.patch, HDFS-14973.test.patch
>
>
> In HDFS-11384, a mechanism was added to make the {{getBlocks}} RPC calls 
> issued by the Balancer/Mover more dispersed, to alleviate load on the 
> NameNode, since {{getBlocks}} can be very expensive and the Balancer should 
> not impact normal cluster operation.
> Unfortunately, this functionality does not function as expected, especially 
> when the dispatcher thread count is low. The primary issue is that the delay 
> is applied only to the first N threads that are submitted to the dispatcher's 
> executor, where N is the size of the dispatcher's threadpool, but *not* to 
> the first R threads, where R is the number of allowed {{getBlocks}} QPS 
> (currently hardcoded to 20). For example, if the threadpool size is 100 (the 
> default), threads 0-19 have no delay, 20-99 have increased levels of delay, 
> and 100+ have no delay. As I understand it, the intent of the logic was that 
> the delay applied to the first 100 threads would force the dispatcher 
> executor's threads to all be consumed, thus blocking subsequent (non-delayed) 
> threads until the delay period has expired. However, threads 0-19 can finish 
> very quickly (their work can often be fulfilled in the time it takes to 
> execute a single {{getBlocks}} RPC, on the order of tens of milliseconds), 
> thus opening up 20 new slots in the executor, which are then consumed by 
> non-delayed threads 100-119, and so on. So, although 80 threads have had a 
> delay applied, the non-delay threads rush through in the 20 non-delay slots.
> This problem gets even worse when the dispatcher threadpool size is less than 
> the max {{getBlocks}} QPS. For example, if the threadpool size is 10, _no 
> threads ever have a delay applied_, and the feature is not enabled at all.
> This problem wasn't surfaced in the original JIRA because the test 
> incorrectly measured the period across which {{getBlocks}} RPCs were 
> distributed. The variables {{startGetBlocksTime}} and {{endGetBlocksTime}} 
> were used to track the time over which the {{getBlocks}} calls were made. 
> However, {{startGetBlocksTime}} was initialized at the time of creation of 
> the {{FSNamesystem}} spy, which is before 

[jira] [Commented] (HDFS-14978) In-place Erasure Coding Conversion

2019-11-11 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971812#comment-16971812
 ] 

Wei-Chiu Chuang commented on HDFS-14978:


Also, this is pretty much the same as HDFS-11347. However, there was little 
traction in the past few years, so I thought I should file a new one to start 
afresh.

> In-place Erasure Coding Conversion
> --
>
> Key: HDFS-14978
> URL: https://issues.apache.org/jira/browse/HDFS-14978
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: In-place Erasure Coding Conversion.pdf
>
>
> HDFS Erasure Coding is a new feature added in Apache Hadoop 3.0. It uses 
> encoding algorithms to reduce disk space usage while retaining redundancy 
> necessary for data recovery. It was a huge amount of work but it is just 
> getting adopted after almost 2 years.
> One usability problem that’s blocking users from adopting HDFS Erasure Coding 
> is that existing replicated files have to be copied to an EC-enabled 
> directory explicitly. Renaming a file/directory to an EC-enabled directory 
> does not automatically convert the blocks. Therefore users typically perform 
> the following steps to erasure-code existing files:
> {noformat}
> Create $tmp directory, set EC policy at it
> Distcp $src to $tmp
> Delete $src (rm -rf $src)
> mv $tmp $src
> {noformat}
> There are several reasons why this is not popular:
> * Complex. The process involves several steps: distcp data to a temporary 
> destination; delete source file; move destination to the source path.
> * Availability: there is a short period where nothing exists at the source 
> path, and jobs may fail unexpectedly.
> * Overhead. During the copy phase, there is a point in time where all of 
> source and destination files exist at the same time, exhausting disk space.
> * Not snapshot-friendly. If a snapshot is taken prior to performing the 
> conversion, the source (replicated) files will be preserved in the cluster 
> too. Therefore, the conversion actually increases storage space usage.
> * Not management-friendly. This approach changes file inode number, 
> modification time and access time. Erasure coded files are supposed to store 
> cold data, but this conversion makes data “hot” again.
> * Bulky. It’s either all or nothing. The directory may be partially erasure 
> coded, but this approach simply erasure code everything again.
> To ease data management, we should offer a utility tool to convert replicated 
> files to erasure coded files in-place.






[jira] [Commented] (HDFS-14978) In-place Erasure Coding Conversion

2019-11-11 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971811#comment-16971811
 ] 

Wei-Chiu Chuang commented on HDFS-14978:


Attached the design doc. To encourage participation and comments, a live design 
doc is linked here. Please feel free to comment on that doc (or in this jira).

> In-place Erasure Coding Conversion
> --
>
> Key: HDFS-14978
> URL: https://issues.apache.org/jira/browse/HDFS-14978
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: In-place Erasure Coding Conversion.pdf
>
>
> HDFS Erasure Coding is a new feature added in Apache Hadoop 3.0. It uses 
> encoding algorithms to reduce disk space usage while retaining redundancy 
> necessary for data recovery. It was a huge amount of work but it is just 
> getting adopted after almost 2 years.
> One usability problem that’s blocking users from adopting HDFS Erasure Coding 
> is that existing replicated files have to be copied to an EC-enabled 
> directory explicitly. Renaming a file/directory to an EC-enabled directory 
> does not automatically convert the blocks. Therefore users typically perform 
> the following steps to erasure-code existing files:
> {noformat}
> Create $tmp directory, set EC policy at it
> Distcp $src to $tmp
> Delete $src (rm -rf $src)
> mv $tmp $src
> {noformat}
> There are several reasons why this is not popular:
> * Complex. The process involves several steps: distcp data to a temporary 
> destination; delete source file; move destination to the source path.
> * Availability: there is a short period where nothing exists at the source 
> path, and jobs may fail unexpectedly.
> * Overhead. During the copy phase, there is a point in time where all of 
> source and destination files exist at the same time, exhausting disk space.
> * Not snapshot-friendly. If a snapshot is taken prior to performing the 
> conversion, the source (replicated) files will be preserved in the cluster 
> too. Therefore, the conversion actually increases storage space usage.
> * Not management-friendly. This approach changes file inode number, 
> modification time and access time. Erasure coded files are supposed to store 
> cold data, but this conversion makes data “hot” again.
> * Bulky. It’s either all or nothing. The directory may be partially erasure 
> coded, but this approach simply erasure code everything again.
> To ease data management, we should offer a utility tool to convert replicated 
> files to erasure coded files in-place.






[jira] [Updated] (HDDS-2418) Add the list trash command server side handling.

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2418:
-
Labels: pull-request-available  (was: )

> Add the list trash command server side handling.
> 
>
> Key: HDDS-2418
> URL: https://issues.apache.org/jira/browse/HDDS-2418
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Manager
>Reporter: Anu Engineer
>Assignee: Matthew Sharp
>Priority: Major
>  Labels: pull-request-available
>
> Add the standard code for any command handling in the server side.






[jira] [Work logged] (HDDS-2418) Add the list trash command server side handling.

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2418?focusedWorklogId=341448&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341448
 ]

ASF GitHub Bot logged work on HDDS-2418:


Author: ASF GitHub Bot
Created on: 11/Nov/19 19:23
Start Date: 11/Nov/19 19:23
Worklog Time Spent: 10m 
  Work Description: mbsharp commented on pull request #143: HDDS-2418 Add 
the list trash command to the server side handling
URL: https://github.com/apache/hadoop-ozone/pull/143
 
 
   ## What changes were proposed in this pull request?
   
   This continues the list trash feature and adds server side handling for the 
new command.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-2418
   
   ## How was this patch tested?
   
   New tests will be added in the next PR for the core logic.
   
   Tested with a local mvn build, rat, checkstyle and findbugs.
   
 



Issue Time Tracking
---

Worklog Id: (was: 341448)
Remaining Estimate: 0h
Time Spent: 10m

> Add the list trash command server side handling.
> 
>
> Key: HDDS-2418
> URL: https://issues.apache.org/jira/browse/HDDS-2418
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Manager
>Reporter: Anu Engineer
>Assignee: Matthew Sharp
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add the standard code for any command handling in the server side.






[jira] [Updated] (HDFS-14978) In-place Erasure Coding Conversion

2019-11-11 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14978:
---
Attachment: In-place Erasure Coding Conversion.pdf

> In-place Erasure Coding Conversion
> --
>
> Key: HDFS-14978
> URL: https://issues.apache.org/jira/browse/HDFS-14978
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: In-place Erasure Coding Conversion.pdf
>
>
> HDFS Erasure Coding is a new feature added in Apache Hadoop 3.0. It uses 
> encoding algorithms to reduce disk space usage while retaining redundancy 
> necessary for data recovery. It was a huge amount of work but it is just 
> getting adopted after almost 2 years.
> One usability problem that’s blocking users from adopting HDFS Erasure Coding 
> is that existing replicated files have to be copied to an EC-enabled 
> directory explicitly. Renaming a file/directory to an EC-enabled directory 
> does not automatically convert the blocks. Therefore users typically perform 
> the following steps to erasure-code existing files:
> {noformat}
> Create $tmp directory, set EC policy at it
> Distcp $src to $tmp
> Delete $src (rm -rf $src)
> mv $tmp $src
> {noformat}
> There are several reasons why this is not popular:
> * Complex. The process involves several steps: distcp data to a temporary 
> destination; delete source file; move destination to the source path.
> * Availability: there is a short period where nothing exists at the source 
> path, and jobs may fail unexpectedly.
> * Overhead. During the copy phase, there is a point in time where all of 
> source and destination files exist at the same time, exhausting disk space.
> * Not snapshot-friendly. If a snapshot is taken prior to performing the 
> conversion, the source (replicated) files will be preserved in the cluster 
> too. Therefore, the conversion actually increases storage space usage.
> * Not management-friendly. This approach changes file inode number, 
> modification time and access time. Erasure coded files are supposed to store 
> cold data, but this conversion makes data “hot” again.
> * Bulky. It’s either all or nothing. The directory may be partially erasure 
> coded, but this approach simply erasure code everything again.
> To ease data management, we should offer a utility tool to convert replicated 
> files to erasure coded files in-place.






[jira] [Created] (HDFS-14978) In-place Erasure Coding Conversion

2019-11-11 Thread Wei-Chiu Chuang (Jira)
Wei-Chiu Chuang created HDFS-14978:
--

 Summary: In-place Erasure Coding Conversion
 Key: HDFS-14978
 URL: https://issues.apache.org/jira/browse/HDFS-14978
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: erasure-coding
Affects Versions: 3.0.0
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang


HDFS Erasure Coding is a new feature added in Apache Hadoop 3.0. It uses 
encoding algorithms to reduce disk space usage while retaining the redundancy 
necessary for data recovery. It was a huge amount of work, but it is only now, 
almost two years later, starting to be adopted.

One usability problem that’s blocking users from adopting HDFS Erasure Coding 
is that existing replicated files have to be copied to an EC-enabled directory 
explicitly. Renaming a file/directory to an EC-enabled directory does not 
automatically convert the blocks. Therefore users typically perform the 
following steps to erasure-code existing files:


{noformat}
Create $tmp directory, set EC policy at it
Distcp $src to $tmp
Delete $src (rm -rf $src)
mv $tmp $src
{noformat}


There are several reasons why this is not popular:
* Complex. The process involves several steps: distcp data to a temporary 
destination; delete source file; move destination to the source path.
* Availability: there is a short period where nothing exists at the source 
path, and jobs may fail unexpectedly.
* Overhead. During the copy phase, there is a point in time where all of source 
and destination files exist at the same time, exhausting disk space.
* Not snapshot-friendly. If a snapshot is taken prior to performing the 
conversion, the source (replicated) files will be preserved in the cluster too. 
Therefore, the conversion actually increases storage space usage.
* Not management-friendly. This approach changes file inode number, 
modification time and access time. Erasure coded files are supposed to store 
cold data, but this conversion makes data “hot” again.
* Bulky. It’s either all or nothing. The directory may be partially erasure 
coded, but this approach simply erasure code everything again.

To ease data management, we should offer a utility tool to convert replicated 
files to erasure coded files in-place.
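The overhead bullet can be made concrete with a little arithmetic. A rough
Python sketch, assuming 3x replication and an RS-6-3 EC policy (which stores
data at roughly 1.5x its logical size) — illustrative figures, not measured
ones:

```python
# Rough space accounting for the distcp-based EC conversion workaround.
# Assumed figures: 3x replication, RS-6-3 erasure coding (~1.5x logical size).
REPLICATION_FACTOR = 3
EC_OVERHEAD = 1.5  # RS-6-3: 6 data blocks + 3 parity blocks

def conversion_space_gb(logical_size_gb):
    """Return (before, peak_during, after) raw disk usage in GB."""
    before = logical_size_gb * REPLICATION_FACTOR
    after = logical_size_gb * EC_OVERHEAD
    # During the copy phase both the replicated source and the EC
    # destination exist at the same time on the cluster.
    peak_during = before + after
    return before, peak_during, after

before, peak, after = conversion_space_gb(100)
print(before, peak, after)  # 300 450.0 150.0
```

So for a 100 GB dataset, the workaround transiently needs 450 GB of raw disk,
three times what the converted data will ultimately occupy.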






[jira] [Commented] (HDFS-14974) RBF: Make tests use free ports

2019-11-11 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971800#comment-16971800
 ] 

Brahma Reddy Battula commented on HDFS-14974:
-

[~elgoiri], take a look at org.apache.hadoop.net.ServerSocketUtil, which might 
be usable here.

> RBF: Make tests use free ports
> --
>
> Key: HDFS-14974
> URL: https://issues.apache.org/jira/browse/HDFS-14974
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
> Attachments: HDFS-14974.000.patch
>
>
> Currently, {{TestRouterSecurityManager#testCreateCredentials}} creates a 
> Router with the default ports. However, these ports might already be in use. 
> We should set them to :0 so that they are assigned dynamically.
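The ":0" trick is not Hadoop-specific: binding to port 0 asks the OS to pick a
currently free ephemeral port, which is what a test should do instead of
hard-coding defaults. A minimal Python sketch of the idea (Hadoop's
ServerSocketUtil plays a similar role on the Java side):

```python
import socket

def get_free_port():
    """Bind to port 0 so the OS assigns an unused ephemeral port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 = let the kernel choose
        return s.getsockname()[1]  # the port the kernel actually picked

port = get_free_port()
print(port)  # an OS-assigned ephemeral port
```

Note there is still a small race (the port could be taken between closing the
probe socket and the server binding it), which is why binding the real server
socket directly to :0 is preferable when the framework allows it.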






[jira] [Resolved] (HDDS-2325) BenchMarkDatanodeDispatcher genesis test is failing with NPE

2019-11-11 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek resolved HDDS-2325.
---
Fix Version/s: 0.5.0
   Resolution: Fixed

> BenchMarkDatanodeDispatcher genesis test is failing with NPE
> 
>
> Key: HDDS-2325
> URL: https://issues.apache.org/jira/browse/HDDS-2325
> Project: Hadoop Distributed Data Store
>  Issue Type: New Feature
>Reporter: Marton Elek
>Assignee: Marton Elek
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> ## What changes were proposed in this pull request?
> Genesis is a microbenchmark tool for Ozone based on JMH 
> ([https://openjdk.java.net/projects/code-tools/jmh/]).
>  
> Due to the recent Datanode changes, BenchMarkDatanodeDispatcher is failing 
> with NPE:
>  
> {code:java}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.ozone.container.common.interfaces.Handler.(Handler.java:69)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.(KeyValueHandler.java:114)
>   at 
> org.apache.hadoop.ozone.container.common.interfaces.Handler.getHandlerForContainerType(Handler.java:78)
>   at 
> org.apache.hadoop.ozone.genesis.BenchMarkDatanodeDispatcher.initialize(BenchMarkDatanodeDispatcher.java:115)
>   at 
> org.apache.hadoop.ozone.genesis.generated.BenchMarkDatanodeDispatcher_createContainer_jmhTest._jmh_tryInit_f_benchmarkdatanodedispatcher0_G(BenchMarkDatanodeDispatcher_createContainer_jmhTest.java:438)
>   at 
> org.apache.hadoop.ozone.genesis.generated.BenchMarkDatanodeDispatcher_createContainer_jmhTest.createContainer_Throughput(BenchMarkDatanodeDispatcher_createContainer_jmhTest.java:71)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:453)
>   at 
> org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:437)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
>  {code}
> And this is just the biggest problem; there are a few others. I propose the 
> following fixes:
> *fix 1*: NPE is thrown because the 'context' object is required by 
> KeyValueHandler/Handler classes.
> In fact the context is not required, we need two functionalities/info from 
> the context: the ability to send icr (IncrementalContainerReport) and the ID 
> of the datanode.
> Law of Demeter principle suggests to have only the minimum required 
> information from other classes.
> For example, instead of holding the context and using only 
> context.getParent().getDatanodeDetails().getUuidString(), we can keep just the 
> UUID string, which makes the Handler/KeyValueHandler easier to test (in unit 
> tests and benchmarks).
> This is the biggest (but still small change) in this patch: I started to use 
> the datanodeId and an icrSender instead of having the full context.
> *fix 2,3:* There were a few other problems. The scmId was missing if 
> writeChunk was called from the benchmark, and the Checksum was also missing.
> *fix 4:* I also hit a few other problems: very large containers are used 
> (5 GB by default), and since the benchmark starts by creating 100 containers, 
> it requires 500 GB of space by default. I adjusted the container size to make 
> it possible to run on a local machine.
>  
> ## How this patch can be tested?
> {code:java}
> ./ozone genesis -benchmark=BenchMarkDatanodeDispatcher.writeChunk{code}
>  
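The "fix 1" idea — pass only the datanode ID and an ICR-sending callback
instead of the whole context — can be sketched generically. All names below
are illustrative; the real Handler/KeyValueHandler constructors differ:

```python
class Handler:
    """Depends only on what it actually needs (Law of Demeter), not on a
    context object reachable as context.getParent()...getUuidString()."""
    def __init__(self, datanode_id, icr_sender):
        self.datanode_id = datanode_id
        # icr_sender is any callable that ships an incremental container report
        self.icr_sender = icr_sender

    def close_container(self, container_id):
        # ... real container work would happen here ...
        self.icr_sender(f"ICR for {container_id} from {self.datanode_id}")

# In a unit test or benchmark, no full state context is needed --
# a plain string and a list stand in for the real collaborators:
sent = []
h = Handler("dn-uuid-1234", sent.append)
h.close_container(7)
print(sent)  # ['ICR for 7 from dn-uuid-1234']
```

This is exactly why the narrowed dependency helps the benchmark: the JMH setup
can construct the handler without standing up a datanode state machine.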






[jira] [Assigned] (HDDS-2371) Print out the ozone version during the startup instead of hadoop version

2019-11-11 Thread YiSheng Lien (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YiSheng Lien reassigned HDDS-2371:
--

Assignee: (was: YiSheng Lien)

> Print out the ozone version during the startup instead of hadoop version
> 
>
> Key: HDDS-2371
> URL: https://issues.apache.org/jira/browse/HDDS-2371
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Marton Elek
>Priority: Major
>  Labels: newbie
>
> Ozone components printing out the current version during the startup:
>  
> {code:java}
> STARTUP_MSG: Starting StorageContainerManager
> STARTUP_MSG:   host = om/10.8.0.145
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 3.2.0
> STARTUP_MSG:   build = https://github.com/apache/hadoop.git -r 
> e97acb3bd8f3befd27418996fa5d4b50bf2e17bf; compiled by 'sunilg' on 
> 2019-01-{code}
> But as is visible, the build/compile information is about Hadoop, not about 
> hadoop-ozone.
> (And personally I would prefer a GitHub-compatible URL instead of the 
> SVN-style -r. Something like:
> {code:java}
> STARTUP_MSG: build =  
> https://github.com/apache/hadoop-ozone/commit/8541c5694efebb58f53cf4665d3e4e6e4a12845c
>  ; compiled by '' on ...{code}
>  






[jira] [Assigned] (HDDS-2276) Allow users to pass hostnames or IP when decommissioning nodes

2019-11-11 Thread YiSheng Lien (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YiSheng Lien reassigned HDDS-2276:
--

Assignee: (was: YiSheng Lien)

> Allow users to pass hostnames or IP when decommissioning nodes
> --
>
> Key: HDDS-2276
> URL: https://issues.apache.org/jira/browse/HDDS-2276
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Stephen O'Donnell
>Priority: Major
>
> In the initial implementation, the user must pass a hostname or the IP when 
> decommissioning a host, depending on the setting:
> dfs.datanode.use.datanode.hostname
> It would be better if the user could pass either the hostname or the IP.






[jira] [Work logged] (HDDS-2325) BenchMarkDatanodeDispatcher genesis test is failing with NPE

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2325?focusedWorklogId=341403&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341403
 ]

ASF GitHub Bot logged work on HDDS-2325:


Author: ASF GitHub Bot
Created on: 11/Nov/19 18:14
Start Date: 11/Nov/19 18:14
Worklog Time Spent: 10m 
  Work Description: elek commented on pull request #60: HDDS-2325. 
BenchMarkDatanodeDispatcher genesis test is failing with NPE
URL: https://github.com/apache/hadoop-ozone/pull/60
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341403)
Time Spent: 20m  (was: 10m)

> BenchMarkDatanodeDispatcher genesis test is failing with NPE
> 
>
> Key: HDDS-2325
> URL: https://issues.apache.org/jira/browse/HDDS-2325
> Project: Hadoop Distributed Data Store
>  Issue Type: New Feature
>Reporter: Marton Elek
>Assignee: Marton Elek
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> ## What changes were proposed in this pull request?
> Genesis is a microbenchmark tool for Ozone based on JMH 
> ([https://openjdk.java.net/projects/code-tools/jmh/]).
>  
> Due to the recent Datanode changes, BenchMarkDatanodeDispatcher is failing 
> with NPE:
>  
> {code:java}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.ozone.container.common.interfaces.Handler.(Handler.java:69)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.(KeyValueHandler.java:114)
>   at 
> org.apache.hadoop.ozone.container.common.interfaces.Handler.getHandlerForContainerType(Handler.java:78)
>   at 
> org.apache.hadoop.ozone.genesis.BenchMarkDatanodeDispatcher.initialize(BenchMarkDatanodeDispatcher.java:115)
>   at 
> org.apache.hadoop.ozone.genesis.generated.BenchMarkDatanodeDispatcher_createContainer_jmhTest._jmh_tryInit_f_benchmarkdatanodedispatcher0_G(BenchMarkDatanodeDispatcher_createContainer_jmhTest.java:438)
>   at 
> org.apache.hadoop.ozone.genesis.generated.BenchMarkDatanodeDispatcher_createContainer_jmhTest.createContainer_Throughput(BenchMarkDatanodeDispatcher_createContainer_jmhTest.java:71)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:453)
>   at 
> org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:437)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
>  {code}
> And this is just the biggest problem; there are a few others. I propose the 
> following fixes:
> *fix 1*: NPE is thrown because the 'context' object is required by 
> KeyValueHandler/Handler classes.
> In fact the context is not required, we need two functionalities/info from 
> the context: the ability to send icr (IncrementalContainerReport) and the ID 
> of the datanode.
> Law of Demeter principle suggests to have only the minimum required 
> information from other classes.
> For example, instead of holding the context and using only 
> context.getParent().getDatanodeDetails().getUuidString(), we can keep just the 
> UUID string, which makes the Handler/KeyValueHandler easier to test (in unit 
> tests and benchmarks).
> This is the biggest (but still small change) in this patch: I started to use 
> the datanodeId and an icrSender instead of having the full context.
> *fix 2,3:* There were a few other problems. The scmId was missing if 
> writeChunk was called from the benchmark, and the Checksum was also missing.
> *fix 4:* I also hit a few other problems: very large containers are used 
> (5 GB by default), and since the benchmark starts by creating 100 containers, 
> it requires 500 GB of space by default. I adjusted the container size to make 
> it possible to run on a local machine.
> 

[jira] [Created] (HDDS-2459) Refactor ReplicationManager to consider maintenance states

2019-11-11 Thread Stephen O'Donnell (Jira)
Stephen O'Donnell created HDDS-2459:
---

 Summary: Refactor ReplicationManager to consider maintenance states
 Key: HDDS-2459
 URL: https://issues.apache.org/jira/browse/HDDS-2459
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
  Components: SCM
Affects Versions: 0.5.0
Reporter: Stephen O'Donnell
Assignee: Stephen O'Donnell


In its current form the replication manager does not consider decommission or 
maintenance states when checking if replicas are sufficiently replicated. With 
the introduction of maintenance states, it needs to consider decommission and 
maintenance states when deciding if blocks are over or under replicated.

It also needs to provide an API to allow the decommission manager to check if 
blocks are over or under replicated, so the decommission manager can decide if 
a node has completed decommission and maintenance or not.
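The counting rule the refactor implies can be sketched simply: only IN_SERVICE
replicas fully satisfy the replication factor, while maintenance replicas are
tolerated as long as a minimum number of copies stays online. This is my
reading of the proposal, not the actual ReplicationManager logic:

```python
def replication_status(replica_states, expected=3, maintenance_min=2):
    """Classify a container from its replicas' node operational states.
    Counting rules are illustrative, not Ozone's exact policy."""
    in_service = sum(1 for s in replica_states if s == "IN_SERVICE")
    maintenance = sum(1 for s in replica_states
                      if s in ("ENTERING_MAINTENANCE", "IN_MAINTENANCE"))
    # Decommissioning replicas are about to disappear, so they never
    # count toward the replication factor here.
    if in_service > expected:
        return "over-replicated"
    if in_service == expected:
        return "healthy"
    # With replicas in maintenance, tolerate fewer online copies,
    # down to maintenance_min.
    if maintenance > 0 and in_service >= maintenance_min \
            and in_service + maintenance >= expected:
        return "healthy"
    return "under-replicated"

print(replication_status(["IN_SERVICE"] * 3))                               # healthy
print(replication_status(["IN_SERVICE", "IN_SERVICE", "DECOMMISSIONING"]))  # under-replicated
print(replication_status(["IN_SERVICE", "IN_SERVICE", "IN_MAINTENANCE"]))   # healthy
```

A check like this is also what the decommission manager API mentioned above
would query to decide whether a node has finished decommissioning.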






[jira] [Updated] (HDDS-2448) Delete container command should used a thread pool

2019-11-11 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDDS-2448:

Status: Patch Available  (was: Open)

> Delete container command should used a thread pool
> --
>
> Key: HDDS-2448
> URL: https://issues.apache.org/jira/browse/HDDS-2448
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The datanode receives commands over the heartbeat and queues all commands on 
> a single queue in StateContext.commandQueue. Inside DatanodeStateMachine a 
> single thread is used to process this queue (started by initCommandHander 
> thread) and it passes each command to a ‘handler’. Each command type has its 
> own handler.
> The delete container command immediately executes the command on the thread 
> used to process the command queue. Therefore if the delete is slow for some 
> reason (it must access disk, so this is possible) it could cause other 
> commands to back up.
> This should be changed to use a threadpool to queue the deleteContainer 
> command, in a similar way to ReplicateContainerCommand.
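The proposed change can be sketched with Python's ThreadPoolExecutor: the
handler only submits the slow, disk-bound delete to a pool and returns
immediately, so the single dispatcher thread is never blocked. Names are
illustrative, not the actual Ozone classes:

```python
import time
from concurrent.futures import ThreadPoolExecutor

class DeleteContainerCommandHandler:
    """Queues deletes on a small pool instead of blocking the dispatcher."""
    def __init__(self, threads=2):
        self.executor = ThreadPoolExecutor(max_workers=threads)

    def handle(self, container_id):
        # Returns immediately; the (possibly slow) delete runs on a
        # pool thread, so other queued commands are not held up.
        return self.executor.submit(self._delete, container_id)

    def _delete(self, container_id):
        time.sleep(0.01)  # stand-in for slow disk work
        return f"deleted {container_id}"

handler = DeleteContainerCommandHandler()
futures = [handler.handle(cid) for cid in range(3)]  # dispatcher not blocked
print([f.result() for f in futures])  # ['deleted 0', 'deleted 1', 'deleted 2']
```

Bounding the pool size also caps how many concurrent disk-heavy deletes a
datanode will run, which a shared dispatcher thread cannot do.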






[jira] [Updated] (HDDS-2446) ContainerReplica should contain DatanodeInfo rather than DatanodeDetails

2019-11-11 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDDS-2446:

Status: Patch Available  (was: Open)

> ContainerReplica should contain DatanodeInfo rather than DatanodeDetails
> 
>
> Key: HDDS-2446
> URL: https://issues.apache.org/jira/browse/HDDS-2446
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The ContainerReplica object is used by the SCM to track containers reported 
> by the datanodes. The current fields stored in ContainerReplica are:
> {code}
> final private ContainerID containerID;
> final private ContainerReplicaProto.State state;
> final private DatanodeDetails datanodeDetails;
> final private UUID placeOfBirth;
> {code}
> Now that we have introduced decommission and maintenance modes, the 
> replication manager (and potentially other parts of the code) needs to know 
> the status of the replica in terms of IN_SERVICE, DECOMMISSIONING, 
> DECOMMISSIONED, etc. to make replication decisions.
> The DatanodeDetails object does not carry this information, however the 
> DatanodeInfo object extends DatanodeDetails and does carry the required 
> information.
> As DatanodeInfo extends DatanodeDetails, any place which needs a 
> DatanodeDetails can accept a DatanodeInfo instead.
> In this Jira I propose we change the DatanodeDetails stored in 
> ContainerReplica to DatanodeInfo.
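The substitution argument — any code that accepts a DatanodeDetails also
accepts a DatanodeInfo, because the latter extends the former — is plain
subtyping. A schematic sketch with illustrative field names:

```python
class DatanodeDetails:
    """Base identity of a datanode (schematic stand-in)."""
    def __init__(self, uuid):
        self.uuid = uuid

class DatanodeInfo(DatanodeDetails):
    """Extends DatanodeDetails with the operational state the
    ReplicationManager needs (IN_SERVICE, DECOMMISSIONING, ...)."""
    def __init__(self, uuid, operational_state):
        super().__init__(uuid)
        self.operational_state = operational_state

def describe(dn: DatanodeDetails) -> str:
    # Any code written against DatanodeDetails keeps working unchanged.
    return dn.uuid

info = DatanodeInfo("dn-1", "DECOMMISSIONING")
print(describe(info), info.operational_state)  # dn-1 DECOMMISSIONING
```

So storing DatanodeInfo in ContainerReplica adds the state information without
breaking any existing caller that expects a DatanodeDetails.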






[jira] [Updated] (HDFS-14977) Quota Usage and Content summary are not same in Truncate with Snapshot

2019-11-11 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14977:
-
Description: 
steps : hdfs dfs -mkdir /dir

           hdfs dfs -put file /dir          (file size = 10 bytes)

           hdfs dfsadmin -allowSnapshot /dir

           hdfs dfs -createSnapshot /dir s1

space consumed per QuotaUsage and ContentSummary is 30 bytes

           hdfs dfs -truncate -w 5 /dir/file

space consumed per QuotaUsage and ContentSummary is 45 bytes

           hdfs dfs -deleteSnapshot /dir s1

space consumed per QuotaUsage is 45 bytes, but ContentSummary is 15 bytes

> Quota Usage and Content summary are not same in Truncate with Snapshot 
> ---
>
> Key: HDFS-14977
> URL: https://issues.apache.org/jira/browse/HDFS-14977
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
>
> steps : hdfs dfs -mkdir /dir
>            hdfs dfs -put file /dir          (file size = 10bytes)
>            hdfs dfsadmin -allowSnapshot /dir
>            hdfs dfs -createSnapshot /dir s1 
> space consumed per QuotaUsage and ContentSummary is 30 bytes
>            hdfs dfs -truncate -w 5 /dir/file
> space consumed per QuotaUsage and ContentSummary is 45 bytes
>            hdfs dfs -deleteSnapshot /dir s1
> space consumed per QuotaUsage is 45 bytes, but ContentSummary is 15 bytes
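The reported numbers are consistent with 3x replication. A small worked sketch
of the expected accounting (my arithmetic, assuming truncate-with-snapshot
keeps the original 10-byte block alive):

```python
REPLICATION = 3

original = 10   # bytes in /dir/file at put time
truncated = 5   # bytes after truncate

# After put: one 10-byte file, replicated 3x.
after_put = original * REPLICATION                     # 30

# After truncate with a snapshot: the snapshot pins the old 10-byte
# data while the live file holds 5 bytes.
after_truncate = (original + truncated) * REPLICATION  # 45

# After deleteSnapshot: only the 5-byte live file should remain, so
# both QuotaUsage and ContentSummary should report 15 -- the bug is
# that QuotaUsage stays at 45.
expected_after_delete = truncated * REPLICATION        # 15

print(after_put, after_truncate, expected_after_delete)  # 30 45 15
```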






[jira] [Created] (HDFS-14977) Quota Usage and Content summary are not same in Truncate with Snapshot

2019-11-11 Thread hemanthboyina (Jira)
hemanthboyina created HDFS-14977:


 Summary: Quota Usage and Content summary are not same in Truncate 
with Snapshot 
 Key: HDFS-14977
 URL: https://issues.apache.org/jira/browse/HDFS-14977
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: hemanthboyina
Assignee: hemanthboyina









[jira] [Commented] (HDFS-14976) HDFS:fsck option "-list-corruptfileblocks" suppress all other output while being used with different combination of fsck options

2019-11-11 Thread Souryakanta Dwivedy (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971548#comment-16971548
 ] 

Souryakanta Dwivedy commented on HDFS-14976:


       Maybe you are right. From the usage info, as per the [] and | symbol 
representation, it somewhat suggests that -files -blocks -locations etc. are 
not for -list-corruptfileblocks. But that does not mean these options cannot 
be used in combination. My concern is that no option should suppress the 
output of any other option. Even if you specify no option other than the file 
or folder name as input to fsck (like: hdfs fsck /test), it still prints some 
common output such as the health status, common file block info, erasure 
coding block info, etc., which should not be suppressed by any option.

> HDFS:fsck option "-list-corruptfileblocks" suppress all other output while 
> being used with different combination of fsck options
> 
>
> Key: HDFS-14976
> URL: https://issues.apache.org/jira/browse/HDFS-14976
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 3.1.2
>Reporter: Souryakanta Dwivedy
>Priority: Minor
> Attachments: fsck_log.PNG, image-2019-11-11-15-40-22-128.png
>
>
> The fsck option "-list-corruptfileblocks" suppresses all other output when 
> used in different combinations with other fsck options.
> Steps:- 
> 
> Use hdfs fsck command with different combinations of options as 
>  
>  hdfs fsck / -files -blocks -locations -storagepolicies
>  hdfs fsck / -files -blocks -openforwrite
>  hdfs fsck / -files -blocks -showprogress
>  hdfs fsck / -files -openforwrite
> for all the combinations of options output will display.
>  
> Use the same fsck options together with "-list-corruptfileblocks"; it will 
> suppress the output of all other options and only display the list of corrupt 
> files, which is not correct behavior.
> Either it should display the output of all the other options along with the 
> corrupted-file info, or the help info has to specify that this option must be 
> used alone, without any combination of other options. Try these different 
> combinations of options:
>  
>  hdfs fsck / -files -blocks -list-corruptfileblocks
>  hdfs fsck / -list-corruptfileblocks -files -blocks
>  
> !image-2019-11-11-15-40-22-128.png!






[jira] [Work logged] (HDDS-2448) Delete container command should used a thread pool

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2448?focusedWorklogId=341288&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341288
 ]

ASF GitHub Bot logged work on HDDS-2448:


Author: ASF GitHub Bot
Created on: 11/Nov/19 13:05
Start Date: 11/Nov/19 13:05
Worklog Time Spent: 10m 
  Work Description: sodonnel commented on pull request #142: HDDS-2448 
Delete container command should used a thread pool
URL: https://github.com/apache/hadoop-ozone/pull/142
 
 
   …r than on the main commandDispatcher thread
   
   ## What changes were proposed in this pull request?
   
   The datanode receives commands over the heartbeat and queues all commands on 
a single queue in StateContext.commandQueue. Inside DatanodeStateMachine a 
single thread is used to process this queue (started by initCommandHander 
thread) and it passes each command to a ‘handler’. Each command type has its 
own handler.
   
   The delete container command immediately executes the command on the thread 
used to process the command queue. Therefore if the delete is slow for some 
reason (it must access disk, so this is possible) it could cause other commands 
to back up.
   
   This should be changed to use a threadpool to queue the deleteContainer 
command, in a similar way to ReplicateContainerCommand.
   
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-2448
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341288)
Remaining Estimate: 0h
Time Spent: 10m

> Delete container command should used a thread pool
> --
>
> Key: HDDS-2448
> URL: https://issues.apache.org/jira/browse/HDDS-2448
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The datanode receives commands over the heartbeat and queues all commands on 
> a single queue in StateContext.commandQueue. Inside DatanodeStateMachine a 
> single thread is used to process this queue (started by initCommandHander 
> thread) and it passes each command to a ‘handler’. Each command type has its 
> own handler.
> The delete container command immediately executes the command on the thread 
> used to process the command queue. Therefore if the delete is slow for some 
> reason (it must access disk, so this is possible) it could cause other 
> commands to back up.
> This should be changed to use a threadpool to queue the deleteContainer 
> command, in a similar way to ReplicateContainerCommand.






[jira] [Updated] (HDDS-2448) Delete container command should used a thread pool

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2448:
-
Labels: pull-request-available  (was: )

> Delete container command should used a thread pool
> --
>
> Key: HDDS-2448
> URL: https://issues.apache.org/jira/browse/HDDS-2448
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
>
> The datanode receives commands over the heartbeat and queues all commands on 
> a single queue in StateContext.commandQueue. Inside DatanodeStateMachine a 
> single thread is used to process this queue (started by initCommandHander 
> thread) and it passes each command to a ‘handler’. Each command type has its 
> own handler.
> The delete container command immediately executes the command on the thread 
> used to process the command queue. Therefore if the delete is slow for some 
> reason (it must access disk, so this is possible) it could cause other 
> commands to back up.
> This should be changed to use a threadpool to queue the deleteContainer 
> command, in a similar way to ReplicateContainerCommand.






[jira] [Commented] (HDFS-14976) HDFS:fsck option "-list-corruptfileblocks" suppress all other output while being used with different combination of fsck options

2019-11-11 Thread Fei Hui (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971475#comment-16971475
 ] 

Fei Hui commented on HDFS-14976:


[~SouryakantaDwivedy] Thanks for reporting
{quote}
hdfs fsck  [-list-corruptfileblocks | [-move | -delete | -openforwrite] 
[-files [-blocks [-locations | -racks | -replicaDetails | -upgradedomains 
[-includeSnapshots] [-showprogress] [-storagepolicies] [-maintenance] [-blockId 
]
{quote}
It means either *-list-corruptfileblocks* or *[-move | -delete | -openforwrite] 
[-files [-blocks [-locations | -racks | -replicaDetails | -upgradedomains]]]*, 
so *-files -blocks -locations etc.* are not meant for 
*-list-corruptfileblocks*. Is that right?

> HDFS:fsck option "-list-corruptfileblocks" suppress all other output while 
> being used with different combination of fsck options
> 
>
> Key: HDFS-14976
> URL: https://issues.apache.org/jira/browse/HDFS-14976
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 3.1.2
>Reporter: Souryakanta Dwivedy
>Priority: Minor
> Attachments: fsck_log.PNG, image-2019-11-11-15-40-22-128.png
>
>
> The fsck option "-list-corruptfileblocks" suppresses all other output when 
> used in different combinations with other fsck options.
> Steps:- 
> 
> Use hdfs fsck command with different combinations of options as 
>  
>  hdfs fsck / -files -blocks -locations -storagepolicies
>  hdfs fsck / -files -blocks -openforwrite
>  hdfs fsck / -files -blocks -showprogress
>  hdfs fsck / -files -openforwrite
> for all the combinations of options output will display.
>  
> Use the same fsck options together with "-list-corruptfileblocks"; it will 
> suppress the output of all other options and only display the list of corrupt 
> files, which is not correct behavior.
> Either it should display the output of all the other options along with the 
> corrupted-file info, or the help info has to specify that this option must be 
> used alone, without any combination of other options. Try these different 
> combinations of options:
>  
>  hdfs fsck / -files -blocks -list-corruptfileblocks
>  hdfs fsck / -list-corruptfileblocks -files -blocks
>  
> !image-2019-11-11-15-40-22-128.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14976) HDFS:fsck option "-list-corruptfileblocks" suppress all other output while being used with different combination of fsck options

2019-11-11 Thread Souryakanta Dwivedy (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971443#comment-16971443
 ] 

Souryakanta Dwivedy edited comment on HDFS-14976 at 11/11/19 10:56 AM:
---

install/hadoop/namenode/bin # ./hdfs fsck

Usage: hdfs fsck <path> [-list-corruptfileblocks | [-move | -delete | 
-openforwrite] [-files [-blocks [-locations | -racks | -replicaDetails | 
-upgradedomains]]]] [-includeSnapshots] [-showprogress] [-storagepolicies] 
[-maintenance] [-blockId <blk_Id>]

The pipe (|) symbol does not indicate that an option must be used alone, or that 
it will suppress the output of the other options. The pipe symbol is also used 
inside [-files [-blocks [-locations | -racks | -replicaDetails | 
-upgradedomains]]], yet those options can still be combined with other options 
such as [-storagepolicies] and [-showprogress], and they do not suppress the 
output of any other option.









[jira] [Commented] (HDFS-14976) HDFS:fsck option "-list-corruptfileblocks" suppress all other output while being used with different combination of fsck options

2019-11-11 Thread Souryakanta Dwivedy (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971443#comment-16971443
 ] 

Souryakanta Dwivedy commented on HDFS-14976:


install/hadoop/namenode/bin # ./hdfs fsck

Usage: hdfs fsck <path> [-list-corruptfileblocks | [-move | -delete | 
-openforwrite] [-files [-blocks [-locations | -racks | -replicaDetails | 
-upgradedomains]]]] [-includeSnapshots] [-showprogress] [-storagepolicies] 
[-maintenance] [-blockId <blk_Id>]

The pipe (|) symbol does not indicate that an option must be used alone, or that 
it will suppress the output of the other options. The pipe symbol is also used 
inside [-files [-blocks [-locations | -racks | -replicaDetails | 
-upgradedomains]]], yet those options can still be combined with other options 
such as [-storagepolicies] and [-showprogress], and they do not suppress the 
output of any other option.







[jira] [Commented] (HDFS-14976) HDFS:fsck option "-list-corruptfileblocks" suppress all other output while being used with different combination of fsck options

2019-11-11 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971435#comment-16971435
 ] 

Ayush Saxena commented on HDFS-14976:
-

That is meant to be like that. Check the documentation: there is an OR symbol (|) 
after -list-corruptfileblocks.







[jira] [Created] (HDFS-14976) HDFS:fsck option "-list-corruptfileblocks" suppress all other output while being used with different combination of fsck options

2019-11-11 Thread Souryakanta Dwivedy (Jira)
Souryakanta Dwivedy created HDFS-14976:
--

 Summary: HDFS:fsck option "-list-corruptfileblocks" suppress all 
other output while being used with different combination of fsck options
 Key: HDFS-14976
 URL: https://issues.apache.org/jira/browse/HDFS-14976
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Affects Versions: 3.1.2
Reporter: Souryakanta Dwivedy
 Attachments: fsck_log.PNG, image-2019-11-11-15-40-22-128.png

fsck option "-list-corruptfileblocks" suppresses all other output while being 
used with different combinations of fsck options.

Steps:

Use the hdfs fsck command with different combinations of options, such as 
 
 hdfs fsck / -files -blocks -locations -storagepolicies
 hdfs fsck / -files -blocks -openforwrite
 hdfs fsck / -files -blocks -showprogress
 hdfs fsck / -files -openforwrite

For all these combinations, the output is displayed.

Use the same fsck options together with "-list-corruptfileblocks"; it will 
suppress the output of 
 all other options and only display the list of corrupt files, which is not 
correct behavior.
 Either it should display the output of all the other options along with the 
corrupted file info, or it has 
 to be specified in the help info that this option should be used alone, without 
any combination of 
 other options. Try these different combinations of options:
 
 hdfs fsck / -files -blocks -list-corruptfileblocks
 hdfs fsck / -list-corruptfileblocks -files -blocks

 

!image-2019-11-11-15-40-22-128.png!






[jira] [Commented] (HDDS-2452) Wrong condition for re-scheduling in ReportPublisher

2019-11-11 Thread Nanda kumar (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971412#comment-16971412
 ] 

Nanda kumar commented on HDDS-2452:
---

I feel we don't even need to check {{!executor.isShutdown()}}: since this piece of 
code runs inside the executor, it will never be called once the executor is 
shut down.

> Wrong condition for re-scheduling in ReportPublisher
> 
>
> Key: HDDS-2452
> URL: https://issues.apache.org/jira/browse/HDDS-2452
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Attila Doroszlai
>Priority: Trivial
>  Labels: newbie
>
> It seems the condition for scheduling next run of {{ReportPublisher}} is 
> wrong:
> {code:title=https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/report/ReportPublisher.java#L74-L76}
> if (!executor.isShutdown() ||
> !(context.getState() == DatanodeStates.SHUTDOWN)) {
>   executor.schedule(this,
> {code}
> Given the condition above, the task may be scheduled again if the executor is 
> shutdown, but the state machine is not set to shutdown (or vice versa).  I 
> think the condition should have an {{&&}}, not {{||}}.  (Currently it is 
> unlikely to happen, since [context state is set to shutdown before the report 
> executor|https://github.com/apache/hadoop-ozone/blob/f928a0bdb4ea2e5195da39256c6dda9f1c855649/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeStateMachine.java#L392-L393].)
> [~nanda], can you please confirm if this is a typo or intentional?
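As a side note on the boolean logic, here is a minimal sketch of the guard, using names borrowed from the snippet above rather than the actual Ozone classes, contrasting the current {{||}} condition with the suggested {{&&}} fix:

```java
// Sketch only: a boiled-down version of the ReportPublisher re-schedule
// guard; names are borrowed from the snippet above, not the real classes.
class RescheduleCheck {
    enum DatanodeStates { RUNNING, SHUTDOWN }

    // Current (buggy) condition: '||' still allows rescheduling when only
    // one of the two sides has shut down.
    static boolean buggy(boolean executorShutdown, DatanodeStates state) {
        return !executorShutdown || !(state == DatanodeStates.SHUTDOWN);
    }

    // Suggested fix: reschedule only while BOTH the executor and the
    // datanode state machine are still running.
    static boolean fixed(boolean executorShutdown, DatanodeStates state) {
        return !executorShutdown && !(state == DatanodeStates.SHUTDOWN);
    }

    public static void main(String[] args) {
        // Executor already shut down, state machine still RUNNING:
        // the buggy guard still reschedules, the fixed one does not.
        System.out.println(buggy(true, DatanodeStates.RUNNING)); // true
        System.out.println(fixed(true, DatanodeStates.RUNNING)); // false
    }
}
```

With {{&&}}, shutting down either side stops further scheduling, which matches the intent described in the report.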






[jira] [Commented] (HDDS-2452) Wrong condition for re-scheduling in ReportPublisher

2019-11-11 Thread Nanda kumar (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971410#comment-16971410
 ] 

Nanda kumar commented on HDDS-2452:
---

[~adoroszlai], yes you're right. It's a bug.
I don't know what I was thinking while writing it :)
Thanks for catching it.







[jira] [Updated] (HDDS-2458) Avoid list copy in ChecksumData

2019-11-11 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-2458:
---
Status: Patch Available  (was: Open)

> Avoid list copy in ChecksumData
> ---
>
> Key: HDDS-2458
> URL: https://issues.apache.org/jira/browse/HDDS-2458
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{ChecksumData}} is initially created with empty list of checksums, then it 
> is updated with computed checksums, copying the list.  The computed list can 
> be set directly.
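To illustrate the kind of change described, here is a hypothetical, simplified stand-in (not the actual Ozone {{ChecksumData}} class) contrasting the create-then-copy pattern with passing the computed list directly:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for ChecksumData, only to illustrate
// the before/after construction patterns described above.
class ChecksumDataSketch {
    private final List<byte[]> checksums;

    ChecksumDataSketch(List<byte[]> checksums) {
        this.checksums = checksums; // store the list as-is, no copying
    }

    // Old pattern: create with an empty list, then copy the computed
    // checksums into it element by element.
    static ChecksumDataSketch before(List<byte[]> computed) {
        ChecksumDataSketch data = new ChecksumDataSketch(new ArrayList<>());
        data.checksums.addAll(computed); // extra copy of every element
        return data;
    }

    // New pattern: the computed list is handed to the constructor directly.
    static ChecksumDataSketch after(List<byte[]> computed) {
        return new ChecksumDataSketch(computed);
    }

    List<byte[]> getChecksums() {
        return checksums;
    }
}
```

The saving is the avoided {{addAll}} copy; callers must then agree not to mutate the list after handing it over.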






[jira] [Updated] (HDDS-2458) Avoid list copy in ChecksumData

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2458:
-
Labels: pull-request-available  (was: )







[jira] [Work logged] (HDDS-2458) Avoid list copy in ChecksumData

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2458?focusedWorklogId=341188&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341188
 ]

ASF GitHub Bot logged work on HDDS-2458:


Author: ASF GitHub Bot
Created on: 11/Nov/19 07:59
Start Date: 11/Nov/19 07:59
Worklog Time Spent: 10m 
  Work Description: adoroszlai commented on pull request #141: HDDS-2458. 
Avoid list copy in ChecksumData
URL: https://github.com/apache/hadoop-ozone/pull/141
 
 
## What changes were proposed in this pull request?

Create `ChecksumData` with checksum list, instead of updating it right after 
creation, to avoid unnecessarily copying the list.

https://issues.apache.org/jira/browse/HDDS-2458

## How was this patch tested?

Tested on 3-node cluster with a 120MB key.  `ozone sh key get` verifies the 
checksum.

```
ozone freon ockg -p test -t 1 -n 1 # easy volume and bucket creation ;)
cat share/ozone/lib/*jar > tmp.jar
ozone sh key put vol1/bucket1/test/large tmp.jar
ozone sh key get vol1/bucket1/test/large tmp.jar.out
diff -q tmp.jar tmp.jar.out
```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341188)
Remaining Estimate: 0h
Time Spent: 10m



