[jira] [Commented] (HDFS-14084) Need for more stats in DFSClient

2019-01-09 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16738856#comment-16738856
 ] 

Wangda Tan commented on HDFS-14084:
---

Thanks [~jlowe] for letting me know; I will redo the RC. 

> Need for more stats in DFSClient
> 
>
> Key: HDFS-14084
> URL: https://issues.apache.org/jira/browse/HDFS-14084
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Pranay Singh
>Assignee: Pranay Singh
>Priority: Minor
> Attachments: HDFS-14084.001.patch, HDFS-14084.002.patch, 
> HDFS-14084.003.patch, HDFS-14084.004.patch, HDFS-14084.005.patch, 
> HDFS-14084.006.patch, HDFS-14084.007.patch, HDFS-14084.008.patch, 
> HDFS-14084.009.patch, HDFS-14084.010.patch, HDFS-14084.011.patch
>
>
> The usage of HDFS has changed: it is no longer just a map-reduce filesystem 
> but is becoming a general-purpose filesystem. In most cases the issues 
> involve the Namenode, so we have metrics to gauge the workload or stress on 
> the Namenode.
> However, there is a need to collect more statistics for the different 
> operations/RPCs in DFSClient, to see which RPC operations take the longest 
> and how frequently each operation is issued. These statistics can be exposed 
> to users of the DFS Client, who can periodically log them or apply some form 
> of flow control when responses are slow. This will also help isolate HDFS 
> issues in a mixed environment where, on one node, we have say Spark, HBase 
> and Impala running together: we can check the throughput of different 
> operations across clients and isolate problems caused by a noisy neighbor, 
> network congestion, or a shared JVM.
> We have dealt with several problems from the field for which there was no 
> conclusive evidence as to what caused the problem. If we had metrics or 
> stats in DFSClient we would be better equipped to solve such complex 
> problems.
> List of jiras for reference:
> -
>  HADOOP-15538, HADOOP-15530 (client-side deadlock)
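
For illustration, a minimal sketch of the client-side visibility being asked for, using the per-operation StorageStatistics counters that FileSystem already exposes (operation counts only; the latency/frequency stats proposed in this JIRA would come on top of this). The cluster URI and output handling are assumptions:

{code:java}
import java.net.URI;
import java.util.Iterator;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.StorageStatistics;

public class DfsClientStatsProbe {
  public static void main(String[] args) throws Exception {
    // Assumes an HDFS cluster at hdfs://nn:8020; adjust for your deployment.
    FileSystem fs = FileSystem.get(URI.create("hdfs://nn:8020/"),
        new Configuration());
    fs.getFileStatus(new Path("/"));
    // Per-operation counters the client already tracks today.
    StorageStatistics stats = fs.getStorageStatistics();
    Iterator<StorageStatistics.LongStatistic> it = stats.getLongStatistics();
    while (it.hasNext()) {
      StorageStatistics.LongStatistic s = it.next();
      System.out.println(s.getName() + " = " + s.getValue());
    }
  }
}
{code}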






[jira] [Updated] (HDFS-13661) Ls command with e option fails when the filesystem is not HDFS

2018-11-16 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13661:
--
Target Version/s: 3.2.0, 3.0.4, 3.1.3  (was: 3.2.0, 3.0.4, 3.1.2)

> Ls command with e option fails when the filesystem is not HDFS
> --
>
> Key: HDFS-13661
> URL: https://issues.apache.org/jira/browse/HDFS-13661
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding, tools
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
> Attachments: HDFS-13661.1.patch
>
>
> {noformat}
> $ hadoop fs -ls -e file://
> Found 10 items
> -ls: Fatal internal error
> java.lang.NullPointerException
>   at org.apache.hadoop.fs.shell.Ls.adjustColumnWidths(Ls.java:308)
>   at org.apache.hadoop.fs.shell.Ls.processPaths(Ls.java:242)
>   at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:387)
>   at org.apache.hadoop.fs.shell.Ls.processPathArgument(Ls.java:226)
>   at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:285)
>   at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:269)
>   at 
> org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:120)
>   at org.apache.hadoop.fs.shell.Command.run(Command.java:176)
>   at org.apache.hadoop.fs.FsShell.run(FsShell.java:328)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>   at org.apache.hadoop.fs.FsShell.main(FsShell.java:391)
> {noformat}
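
The NPE above comes from the -e (erasure coding) column handler assuming every listed file carries an EC policy, which non-HDFS filesystems such as file:// never supply. A hedged sketch of the shape of guard needed in adjustColumnWidths (identifiers hypothetical, not the committed patch):

{code:java}
// Hypothetical: size the EC-policy column without assuming a policy exists.
// Non-HDFS FileStatus entries yield null, so substitute a fallback label.
private int ecPolicyColumnWidth(java.util.List<String> ecPolicyNames) {
  int width = 0;
  for (String name : ecPolicyNames) {
    String label = (name == null) ? "Replicated" : name;  // null-safe fallback
    width = Math.max(width, label.length());
  }
  return width;
}
{code}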






[jira] [Updated] (HDFS-13998) ECAdmin NPE with -setPolicy -replicate

2018-11-16 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13998:
--
Target Version/s: 3.2.0, 3.1.3  (was: 3.2.0, 3.1.2)

> ECAdmin NPE with -setPolicy -replicate
> --
>
> Key: HDFS-13998
> URL: https://issues.apache.org/jira/browse/HDFS-13998
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.2.0, 3.1.2
>Reporter: Xiao Chen
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13998.01.patch, HDFS-13998.02.patch, 
> HDFS-13998.03.patch
>
>
> HDFS-13732 tried to improve the output of the console tool. But we missed the 
> fact that for replication, {{getErasureCodingPolicy}} would return null.
> This jira is to fix it in ECAdmin, and add a unit test.
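
A hedged sketch of the fix direction described above (not the committed patch): when {{-replicate}} is used, the policy lookup legitimately returns null, so the console output must branch instead of dereferencing it.

{code:java}
// Hypothetical output handling in ECAdmin's -setPolicy path:
// replication has no ErasureCodingPolicy object, so branch on null.
ErasureCodingPolicy policy = dfs.getErasureCodingPolicy(path);
if (policy == null) {
  System.out.println("Set replication (no erasure coding policy) on " + path);
} else {
  System.out.println("Set " + policy.getName()
      + " erasure coding policy on " + path);
}
{code}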






[jira] [Updated] (HDFS-13541) NameNode Port based selective encryption

2018-11-16 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13541:
--
Target Version/s: 3.1.3  (was: 3.1.2)

> NameNode Port based selective encryption
> 
>
> Key: HDFS-13541
> URL: https://issues.apache.org/jira/browse/HDFS-13541
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode, security
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: NameNode Port based selective encryption-v1.pdf
>
>
> Here at LinkedIn, one issue we face is that we need to enforce different 
> security requirements based on the location of the client relative to the 
> cluster. Specifically, for clients from outside of the data center, 
> regulation requires that all traffic be encrypted. But for clients within 
> the same data center, unencrypted connections are preferred to avoid the 
> high encryption overhead. 
> HADOOP-10221 introduced a pluggable SASL resolver, on top of which 
> HADOOP-10335 introduced WhitelistBasedResolver to solve the same problem. 
> However, we found it difficult to fit into our environment for several 
> reasons. In this JIRA, on top of the pluggable SASL resolver, *we propose a 
> different approach: run RPC on two ports on the NameNode, with the two ports 
> enforcing encrypted and unencrypted connections respectively, and the 
> subsequent DataNode access simply following the same encrypted/unencrypted 
> behaviour*. Then, by blocking the unencrypted port at the data center 
> firewall, we can completely block unencrypted external access.
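
A minimal sketch of the idea, assuming the ingress-port-aware SaslPropertiesResolver hook that HDFS-13547 adds (the class name, port number, and two-argument override here are illustrative, not the shipped implementation):

{code:java}
import java.net.InetAddress;
import java.util.Map;
import java.util.TreeMap;
import javax.security.sasl.Sasl;
import org.apache.hadoop.security.SaslPropertiesResolver;

// Hypothetical resolver: connections arriving on the externally exposed
// port must negotiate privacy (encryption); the internal port allows
// authentication-only, avoiding encryption overhead inside the datacenter.
public class PortBasedSaslResolver extends SaslPropertiesResolver {
  private static final int EXTERNAL_PORT = 8021;  // illustrative

  @Override
  public Map<String, String> getServerProperties(InetAddress clientAddress,
      int ingressPort) {
    Map<String, String> props = new TreeMap<>();
    props.put(Sasl.QOP,
        ingressPort == EXTERNAL_PORT ? "auth-conf" : "auth");
    return props;
  }
}
{code}

With the unencrypted port blocked at the datacenter firewall, external clients can only reach the port that enforces encryption, which is the guarantee described above.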






[jira] [Updated] (HDFS-13566) Add configurable additional RPC listener to NameNode

2018-07-31 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13566:
--
Target Version/s: 3.1.2  (was: 3.1.1)

> Add configurable additional RPC listener to NameNode
> 
>
> Key: HDFS-13566
> URL: https://issues.apache.org/jira/browse/HDFS-13566
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ipc
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-13566.001.patch, HDFS-13566.002.patch, 
> HDFS-13566.003.patch
>
>
> This Jira aims to add the capability for the NameNode to run additional 
> listener(s), so that the NameNode can be accessed on multiple ports. 
> Fundamentally, this Jira extends ipc.Server so it can be configured with 
> more listeners, binding to different ports but sharing the same call queue 
> and handlers. This is useful when different clients are only allowed to 
> access certain ports. Combined with HDFS-13547, this also allows different 
> ports to have different SASL security levels. 
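
For illustration, a sketch of how a deployment might enable the extra listener once this lands; the property name is an assumption modeled on the patch's direction, not a documented key:

{code:java}
import org.apache.hadoop.conf.Configuration;

// Hypothetical: ask the NameNode RPC server to open auxiliary listeners on
// extra ports that share the default port's call queue and handlers.
Configuration conf = new Configuration();
conf.set("dfs.namenode.rpc-address.auxiliary-ports", "8021,8022");  // assumed key
{code}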






[jira] [Updated] (HDFS-13661) Ls command with e option fails when the filesystem is not HDFS

2018-07-31 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13661:
--
Target Version/s: 3.2.0, 3.0.4, 3.1.2  (was: 3.2.0, 3.1.1, 3.0.4)

> Ls command with e option fails when the filesystem is not HDFS
> --
>
> Key: HDFS-13661
> URL: https://issues.apache.org/jira/browse/HDFS-13661
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding, tools
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
> Attachments: HDFS-13661.1.patch
>
>
> {noformat}
> $ hadoop fs -ls -e file://
> Found 10 items
> -ls: Fatal internal error
> java.lang.NullPointerException
>   at org.apache.hadoop.fs.shell.Ls.adjustColumnWidths(Ls.java:308)
>   at org.apache.hadoop.fs.shell.Ls.processPaths(Ls.java:242)
>   at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:387)
>   at org.apache.hadoop.fs.shell.Ls.processPathArgument(Ls.java:226)
>   at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:285)
>   at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:269)
>   at 
> org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:120)
>   at org.apache.hadoop.fs.shell.Command.run(Command.java:176)
>   at org.apache.hadoop.fs.FsShell.run(FsShell.java:328)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>   at org.apache.hadoop.fs.FsShell.main(FsShell.java:391)
> {noformat}






[jira] [Updated] (HDFS-13541) NameNode Port based selective encryption

2018-07-31 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13541:
--
Target Version/s: 3.1.2  (was: 3.1.1)

> NameNode Port based selective encryption
> 
>
> Key: HDFS-13541
> URL: https://issues.apache.org/jira/browse/HDFS-13541
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode, security
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: NameNode Port based selective encryption-v1.pdf
>
>
> Here at LinkedIn, one issue we face is that we need to enforce different 
> security requirements based on the location of the client relative to the 
> cluster. Specifically, for clients from outside of the data center, 
> regulation requires that all traffic be encrypted. But for clients within 
> the same data center, unencrypted connections are preferred to avoid the 
> high encryption overhead. 
> HADOOP-10221 introduced a pluggable SASL resolver, on top of which 
> HADOOP-10335 introduced WhitelistBasedResolver to solve the same problem. 
> However, we found it difficult to fit into our environment for several 
> reasons. In this JIRA, on top of the pluggable SASL resolver, *we propose a 
> different approach: run RPC on two ports on the NameNode, with the two ports 
> enforcing encrypted and unencrypted connections respectively, and the 
> subsequent DataNode access simply following the same encrypted/unencrypted 
> behaviour*. Then, by blocking the unencrypted port at the data center 
> firewall, we can completely block unencrypted external access.






[jira] [Commented] (HDFS-12716) 'dfs.datanode.failed.volumes.tolerated' to support minimum number of volumes to be available

2018-07-30 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16562346#comment-16562346
 ] 

Wangda Tan commented on HDFS-12716:
---

Updated the fix version to 3.1.2 since this doesn't exist in branch-3.1.1

>  'dfs.datanode.failed.volumes.tolerated' to support minimum number of volumes 
> to be available
> -
>
> Key: HDFS-12716
> URL: https://issues.apache.org/jira/browse/HDFS-12716
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: usharani
>Assignee: Ranith Sardar
>Priority: Major
> Fix For: 3.2.0, 3.0.4, 3.1.2
>
> Attachments: HDFS-12716.002.patch, HDFS-12716.003.patch, 
> HDFS-12716.004.patch, HDFS-12716.005.patch, HDFS-12716.006.patch, 
> HDFS-12716.patch
>
>
>   Currently 'dfs.datanode.failed.volumes.tolerated' specifies the number of 
> failed volumes to tolerate, and changing this configuration requires a 
> datanode restart. Since datanode volumes can be changed dynamically, keeping 
> this configuration the same for all datanodes may not be a good idea.
> Support 'dfs.datanode.failed.volumes.tolerated' accepting a special negative 
> value 'x', so that with 'n' configured volumes, failures of up to "n-x" 
> volumes are tolerated, i.e. a minimum of 'x' volumes must remain available.
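
A worked sketch of the proposed semantics (illustrative only): today's non-negative value caps the number of failed volumes, while the proposed negative value instead demands a minimum number of healthy volumes.

{code:java}
// Hypothetical check mirroring the proposal. tolerated >= 0 keeps today's
// behaviour; tolerated < 0 means "require at least |tolerated| volumes up",
// i.e. with n volumes configured, up to n - |tolerated| failures are OK.
static boolean hasEnoughVolumes(int configuredVolumes, int failedVolumes,
    int tolerated) {
  if (tolerated >= 0) {
    return failedVolumes <= tolerated;
  }
  int minimumHealthy = -tolerated;
  return configuredVolumes - failedVolumes >= minimumHealthy;
}
{code}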






[jira] [Updated] (HDFS-12716) 'dfs.datanode.failed.volumes.tolerated' to support minimum number of volumes to be available

2018-07-30 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-12716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-12716:
--
Fix Version/s: (was: 3.1.1)
   3.1.2

>  'dfs.datanode.failed.volumes.tolerated' to support minimum number of volumes 
> to be available
> -
>
> Key: HDFS-12716
> URL: https://issues.apache.org/jira/browse/HDFS-12716
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: usharani
>Assignee: Ranith Sardar
>Priority: Major
> Fix For: 3.2.0, 3.0.4, 3.1.2
>
> Attachments: HDFS-12716.002.patch, HDFS-12716.003.patch, 
> HDFS-12716.004.patch, HDFS-12716.005.patch, HDFS-12716.006.patch, 
> HDFS-12716.patch
>
>
>   Currently 'dfs.datanode.failed.volumes.tolerated' specifies the number of 
> failed volumes to tolerate, and changing this configuration requires a 
> datanode restart. Since datanode volumes can be changed dynamically, keeping 
> this configuration the same for all datanodes may not be a good idea.
> Support 'dfs.datanode.failed.volumes.tolerated' accepting a special negative 
> value 'x', so that with 'n' configured volumes, failures of up to "n-x" 
> volumes are tolerated, i.e. a minimum of 'x' volumes must remain available.






[jira] [Commented] (HDFS-11060) make DEFAULT_MAX_CORRUPT_FILEBLOCKS_RETURNED configurable

2018-07-30 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-11060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16562314#comment-16562314
 ] 

Wangda Tan commented on HDFS-11060:
---

Updated the fix version to 3.1.2 since this doesn't exist in branch-3.1.1

> make DEFAULT_MAX_CORRUPT_FILEBLOCKS_RETURNED configurable
> -
>
> Key: HDFS-11060
> URL: https://issues.apache.org/jira/browse/HDFS-11060
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 2.7.3, 2.8.1, 3.0.0-alpha3
>Reporter: Lantao Jin
>Assignee: Lantao Jin
>Priority: Minor
> Fix For: 2.10.0, 3.2.0, 2.9.2, 3.0.4, 3.1.2
>
> Attachments: HDFS-11060.1.patch, HDFS-11060.2.patch
>
>
> Currently, the easiest way to determine which blocks are missing is the NN 
> web UI or JMX. Unfortunately, because 
> DEFAULT_MAX_CORRUPT_FILEBLOCKS_RETURNED=100 is hard-coded in FSNamesystem, 
> only 100 missing blocks can be returned by the UI and JMX. Even the result 
> of the URL "https://nn:50070/fsck?listcorruptfileblocks=1&path=%2F" is 
> limited by this hard-coded value.
> I know fsck can return more than 100 results, but for security reasons 
> (with Kerberos) it is very hard to integrate into customer programs and 
> scripts.
> So I think we should add a configurable variable 
> "maxCorruptFileBlocksReturned" to fix the above case.
> If the community also thinks this is worth doing, I will post a patch. If 
> not, please feel free to tell me the reason. 
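
A sketch of the proposed change (the configuration key name is an assumption): read the limit from Configuration, defaulting to the old hard-coded value, instead of using the constant directly.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Hypothetical FSNamesystem-side lookup replacing the hard-coded
// DEFAULT_MAX_CORRUPT_FILEBLOCKS_RETURNED = 100.
Configuration conf = new Configuration();
int maxCorruptFileBlocksReturned = conf.getInt(
    "dfs.namenode.max-corrupt-file-blocks-returned", 100);  // assumed key
{code}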






[jira] [Updated] (HDFS-11060) make DEFAULT_MAX_CORRUPT_FILEBLOCKS_RETURNED configurable

2018-07-30 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-11060:
--
Fix Version/s: (was: 3.1.1)
   3.1.2

> make DEFAULT_MAX_CORRUPT_FILEBLOCKS_RETURNED configurable
> -
>
> Key: HDFS-11060
> URL: https://issues.apache.org/jira/browse/HDFS-11060
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 2.7.3, 2.8.1, 3.0.0-alpha3
>Reporter: Lantao Jin
>Assignee: Lantao Jin
>Priority: Minor
> Fix For: 2.10.0, 3.2.0, 2.9.2, 3.0.4, 3.1.2
>
> Attachments: HDFS-11060.1.patch, HDFS-11060.2.patch
>
>
> Currently, the easiest way to determine which blocks are missing is the NN 
> web UI or JMX. Unfortunately, because 
> DEFAULT_MAX_CORRUPT_FILEBLOCKS_RETURNED=100 is hard-coded in FSNamesystem, 
> only 100 missing blocks can be returned by the UI and JMX. Even the result 
> of the URL "https://nn:50070/fsck?listcorruptfileblocks=1&path=%2F" is 
> limited by this hard-coded value.
> I know fsck can return more than 100 results, but for security reasons 
> (with Kerberos) it is very hard to integrate into customer programs and 
> scripts.
> So I think we should add a configurable variable 
> "maxCorruptFileBlocksReturned" to fix the above case.
> If the community also thinks this is worth doing, I will post a patch. If 
> not, please feel free to tell me the reason. 






[jira] [Commented] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica

2018-07-30 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16562301#comment-16562301
 ] 

Wangda Tan commented on HDFS-13448:
---

Updated the fix version to 3.1.2 since this doesn't exist in branch-3.1.1

> HDFS Block Placement - Ignore Locality for First Block Replica
> --
>
> Key: HDFS-13448
> URL: https://issues.apache.org/jira/browse/HDFS-13448
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: block placement, hdfs-client
>Affects Versions: 2.9.0, 3.0.1
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Fix For: 3.2.0, 3.0.4, 3.1.2
>
> Attachments: HDFS-13448.10.patch, HDFS-13448.11.patch, 
> HDFS-13448.12.patch, HDFS-13448.13.patch, HDFS-13448.14.patch, 
> HDFS-13448.6.patch, HDFS-13448.7.patch, HDFS-13448.8.patch
>
>
> According to the HDFS Block Placement Rules:
> {quote}
> /**
>  * The replica placement strategy is that if the writer is on a datanode,
>  * the 1st replica is placed on the local machine, 
>  * otherwise a random datanode. The 2nd replica is placed on a datanode
>  * that is on a different rack. The 3rd replica is placed on a datanode
>  * which is on a different node of the rack as the second replica.
>  */
> {quote}
> However, there is a hint for the hdfs-client that allows the block placement 
> request to not put a block replica on the local datanode _where 'local' means 
> the same host as the client is being run on._
> {quote}
>   /**
>* Advise that a block replica NOT be written to the local DataNode where
>* 'local' means the same host as the client is being run on.
>*
>* @see CreateFlag#NO_LOCAL_WRITE
>*/
> {quote}
> I propose that we add a new flag that allows the hdfs-client to request that 
> the first block replica be placed on a random DataNode in the cluster.  The 
> subsequent block replicas should follow the normal block placement rules.
> The issue is that when {{NO_LOCAL_WRITE}} is enabled, the first block 
> replica is not placed on the local node, but it is still placed on the local 
> rack. This comes into play when you have, for example, a Flume agent loading 
> data into HDFS.
> If the Flume agent is running on a DataNode, then by default the DataNode 
> local to the Flume agent will always get the first block replica, and this 
> leads to uneven block placement, with the local node always filling up 
> faster than any other node in the cluster.
> Modifying this example, if the DataNode is removed from the host where the 
> Flume agent is running, or {{NO_LOCAL_WRITE}} is enabled by Flume, then the 
> default block placement policy will still prefer the local rack. This 
> remedies the situation only so far: the first block replica will now always 
> be placed on some DataNode on the local rack.
> This new flag would allow a single Flume agent to distribute the blocks 
> randomly, evenly, over the entire cluster instead of hot-spotting the local 
> node or the local rack.
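
A hedged sketch of how a client such as a Flume sink might request this behaviour once the flag exists (the flag name IGNORE_CLIENT_LOCALITY is an assumption about where this work lands; the create() overload shown is the standard one):

{code:java}
import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class RandomFirstReplicaWriter {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Hypothetical flag: place the FIRST replica on a random DataNode so a
    // single long-lived writer doesn't hot-spot its own node or rack.
    FSDataOutputStream out = fs.create(new Path("/flume/events.log"),
        FsPermission.getFileDefault(),
        EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE,
            CreateFlag.IGNORE_CLIENT_LOCALITY),  // assumed flag name
        4096, (short) 3, 128L * 1024 * 1024, null);
    out.close();
  }
}
{code}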






[jira] [Updated] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica

2018-07-30 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13448:
--
Fix Version/s: (was: 3.1.1)
   3.1.2

> HDFS Block Placement - Ignore Locality for First Block Replica
> --
>
> Key: HDFS-13448
> URL: https://issues.apache.org/jira/browse/HDFS-13448
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: block placement, hdfs-client
>Affects Versions: 2.9.0, 3.0.1
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Fix For: 3.2.0, 3.0.4, 3.1.2
>
> Attachments: HDFS-13448.10.patch, HDFS-13448.11.patch, 
> HDFS-13448.12.patch, HDFS-13448.13.patch, HDFS-13448.14.patch, 
> HDFS-13448.6.patch, HDFS-13448.7.patch, HDFS-13448.8.patch
>
>
> According to the HDFS Block Placement Rules:
> {quote}
> /**
>  * The replica placement strategy is that if the writer is on a datanode,
>  * the 1st replica is placed on the local machine, 
>  * otherwise a random datanode. The 2nd replica is placed on a datanode
>  * that is on a different rack. The 3rd replica is placed on a datanode
>  * which is on a different node of the rack as the second replica.
>  */
> {quote}
> However, there is a hint for the hdfs-client that allows the block placement 
> request to not put a block replica on the local datanode _where 'local' means 
> the same host as the client is being run on._
> {quote}
>   /**
>* Advise that a block replica NOT be written to the local DataNode where
>* 'local' means the same host as the client is being run on.
>*
>* @see CreateFlag#NO_LOCAL_WRITE
>*/
> {quote}
> I propose that we add a new flag that allows the hdfs-client to request that 
> the first block replica be placed on a random DataNode in the cluster.  The 
> subsequent block replicas should follow the normal block placement rules.
> The issue is that when {{NO_LOCAL_WRITE}} is enabled, the first block 
> replica is not placed on the local node, but it is still placed on the local 
> rack. This comes into play when you have, for example, a Flume agent loading 
> data into HDFS.
> If the Flume agent is running on a DataNode, then by default the DataNode 
> local to the Flume agent will always get the first block replica, and this 
> leads to uneven block placement, with the local node always filling up 
> faster than any other node in the cluster.
> Modifying this example, if the DataNode is removed from the host where the 
> Flume agent is running, or {{NO_LOCAL_WRITE}} is enabled by Flume, then the 
> default block placement policy will still prefer the local rack. This 
> remedies the situation only so far: the first block replica will now always 
> be placed on some DataNode on the local rack.
> This new flag would allow a single Flume agent to distribute the blocks 
> randomly, evenly, over the entire cluster instead of hot-spotting the local 
> node or the local rack.






[jira] [Commented] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x

2018-07-19 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16549706#comment-16549706
 ] 

Wangda Tan commented on HDFS-13596:
---

Given that there has been no movement on this issue for 2 months and it is not 
a regression in 3.1.x, I just moved it to 3.1.2

> NN restart fails after RollingUpgrade from 2.x to 3.x
> -
>
> Key: HDFS-13596
> URL: https://issues.apache.org/jira/browse/HDFS-13596
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Hanisha Koneru
>Assignee: Zsolt Venczel
>Priority: Blocker
>
> After a rolling upgrade of the NN from 2.x to 3.x, if the NN is restarted, it 
> fails while replaying edit logs.
>  * After NN is started with rollingUpgrade, the layoutVersion written to 
> editLogs (before finalizing the upgrade) is the pre-upgrade layout version 
> (so as to support downgrade).
>  * When writing transactions to log, NN writes as per the current layout 
> version. In 3.x, erasureCoding bits are added to the editLog transactions.
>  * So any edit log written after the upgrade and before finalizing the 
> upgrade will have the old layout version but the new format of transactions.
>  * When NN is restarted and the edit logs are replayed, the NN reads the old 
> layout version from the editLog file. When parsing the transactions, it 
> assumes that the transactions are also from the previous layout and hence 
> skips parsing the erasureCoding bits.
>  * This cascades into reading the wrong set of bits for other fields and 
> leads to NN shutting down.
> Sample error output:
> {code:java}
> java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected 
> length 16
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.<init>(RetryCache.java:74)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.<init>(RetryCache.java:86)
>  at 
> org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.<init>(RetryCache.java:163)
>  at 
> org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:937)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:910)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
> 2018-05-17 19:10:06,522 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception 
> loading fsimage
> java.io.IOException: java.lang.IllegalStateException: Cannot skip to less 
> than the current value (=16389), where newValue=16388
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:937)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(Name
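
The failure mode described above is a reader keyed off the on-disk layout version. A simplified sketch of the gating involved (following the NameNodeLayoutVersion pattern; illustrative, not the actual parser code):

{code:java}
// Simplified sketch: edit-log op fields are read conditionally on the layout
// version recorded in the log segment. During a rolling upgrade the segment
// carries the OLD version while ops are written in the NEW format, so a
// check like this skips the erasure-coding bits and every later field is
// read from the wrong offset, eventually crashing the NameNode.
if (NameNodeLayoutVersion.supports(
    NameNodeLayoutVersion.Feature.ERASURE_CODING, logVersion)) {
  // read the erasure-coding bits of the op
} else {
  // pre-EC parsing: misaligned for ops actually written in the new format
}
{code}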

[jira] [Updated] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x

2018-07-19 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13596:
--
Target Version/s: 3.2.0, 3.0.4, 3.1.2  (was: 3.2.0, 3.1.1, 3.0.4)

> NN restart fails after RollingUpgrade from 2.x to 3.x
> -
>
> Key: HDFS-13596
> URL: https://issues.apache.org/jira/browse/HDFS-13596
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Hanisha Koneru
>Assignee: Zsolt Venczel
>Priority: Blocker
>
> After a rolling upgrade of the NN from 2.x to 3.x, if the NN is restarted, it 
> fails while replaying edit logs.
>  * After NN is started with rollingUpgrade, the layoutVersion written to 
> editLogs (before finalizing the upgrade) is the pre-upgrade layout version 
> (so as to support downgrade).
>  * When writing transactions to log, NN writes as per the current layout 
> version. In 3.x, erasureCoding bits are added to the editLog transactions.
>  * So any edit log written after the upgrade and before finalizing the 
> upgrade will have the old layout version but the new format of transactions.
>  * When NN is restarted and the edit logs are replayed, the NN reads the old 
> layout version from the editLog file. When parsing the transactions, it 
> assumes that the transactions are also from the previous layout and hence 
> skips parsing the erasureCoding bits.
>  * This cascades into reading the wrong set of bits for other fields and 
> leads to NN shutting down.
> Sample error output:
> {code:java}
> java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected 
> length 16
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.<init>(RetryCache.java:74)
>  at org.apache.hadoop.ipc.RetryCache$CacheEntry.<init>(RetryCache.java:86)
>  at 
> org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.<init>(RetryCache.java:163)
>  at 
> org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:937)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:910)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
> 2018-05-17 19:10:06,522 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception 
> loading fsimage
> java.io.IOException: java.lang.IllegalStateException: Cannot skip to less 
> than the current value (=16389), where newValue=16388
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:937)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:910)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)

[jira] [Commented] (HDFS-13176) WebHdfs file path gets truncated when having semicolon (;) inside

2018-05-31 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497138#comment-16497138
 ] 

Wangda Tan commented on HDFS-13176:
---

Thanks [~mackrorysd] for the notice.

I'm fine either way: if you think HDFS-13176 is an important improvement, let's 
get it into branch-3.1 with a follow-up fix. If you think it is a less important 
change, please revert it from branch-3.1. You make the call :)

> WebHdfs file path gets truncated when having semicolon (;) inside
> -
>
> Key: HDFS-13176
> URL: https://issues.apache.org/jira/browse/HDFS-13176
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 3.0.0
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Fix For: 2.10.0, 3.2.0
>
> Attachments: HDFS-13176-branch-2.01.patch, 
> HDFS-13176-branch-2.03.patch, HDFS-13176-branch-2.03.patch, 
> HDFS-13176-branch-2.03.patch, HDFS-13176-branch-2.03.patch, 
> HDFS-13176-branch-2.03.patch, HDFS-13176-branch-2.04.patch, 
> HDFS-13176-branch-2_yetus.log, HDFS-13176.01.patch, HDFS-13176.02.patch, 
> TestWebHdfsUrl.testWebHdfsSpecialCharacterFile.patch
>
>
> Attached is a patch with a test case that tries to reproduce the problem.






[jira] [Updated] (HDFS-13352) RBF: Add xsl stylesheet for hdfs-rbf-default.xml

2018-03-29 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13352:
--
Fix Version/s: (was: 3.1.1)

Doing 3.1.0 RC1 now, moved all 3.1.1 (branch-3.1) fixes to 3.1.0 (branch-3.1.0)

> RBF: Add xsl stylesheet for hdfs-rbf-default.xml
> 
>
> Key: HDFS-13352
> URL: https://issues.apache.org/jira/browse/HDFS-13352
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.2, 3.2.0
>
> Attachments: HDFS-13352.1.patch
>
>
> {{configuration.xsl}} is required for browsing {{hdfs-rbf-default.xml}}.






[jira] [Updated] (HDFS-12884) BlockUnderConstructionFeature.truncateBlock should be of type BlockInfo

2018-03-29 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-12884:
--
Fix Version/s: (was: 3.1.1)

Doing 3.1.0 RC1 now, moved all 3.1.1 (branch-3.1) fixes to 3.1.0 (branch-3.1.0)

> BlockUnderConstructionFeature.truncateBlock should be of type BlockInfo
> ---
>
> Key: HDFS-12884
> URL: https://issues.apache.org/jira/browse/HDFS-12884
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.4
>Reporter: Konstantin Shvachko
>Assignee: chencan
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 2.7.6, 3.0.2, 3.2.0
>
> Attachments: HDFS-12884.001.patch, HDFS-12884.002.patch, 
> HDFS-12884.003.patch
>
>
> {{BlockUnderConstructionFeature.truncateBlock}} type should be changed to 
> {{BlockInfo}} from {{Block}}. {{truncateBlock}} is always assigned as 
> {{BlockInfo}}, so this will avoid unnecessary casts.






[jira] [Updated] (HDFS-13204) RBF: Optimize name service safe mode icon

2018-03-29 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13204:
--
Fix Version/s: (was: 3.1.1)

Doing 3.1.0 RC1 now, moved all 3.1.1 (branch-3.1) fixes to 3.1.0 (branch-3.1.0)

> RBF: Optimize name service safe mode icon
> -
>
> Key: HDFS-13204
> URL: https://issues.apache.org/jira/browse/HDFS-13204
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: liuhongtong
>Assignee: liuhongtong
>Priority: Minor
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.2, 3.2.0
>
> Attachments: HDFS-13204.001.patch, HDFS-13204.002.patch, 
> HDFS-13204.003.patch, HDFS-13204.004.patch, HDFS-13204.005.patch, 
> HDFS-13204.006.patch, HDFS-13204.007.patch, HDFS-13204.008.patch, 
> Routers.png, Subclusters.png, image-2018-02-28-18-33-09-972.png, 
> image-2018-02-28-18-33-47-661.png, image-2018-02-28-18-35-35-708.png, 
> image-2018-03-23-18-06-54-354.png, image-2018-03-26-10-10-10-930.png, 
> image-2018-03-26-10-21-24-171.png
>
>
> On the federation health web page, the safe mode icons for Subclusters and 
> Routers are inconsistent.
> The Subclusters safe mode icon may lead users to think the name service is 
> under maintenance.
> !image-2018-02-28-18-33-09-972.png!
> The safe mode icon of the Routers:
> !image-2018-02-28-18-33-47-661.png!
> In fact, if the name service is in safe mode, users can't perform 
> write-related operations. So I think the safe mode icon for Subclusters 
> should be changed, which would be more reasonable.
> !image-2018-02-28-18-35-35-708.png!






[jira] [Updated] (HDFS-13195) DataNode conf page cannot display the current value after reconfig

2018-03-29 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13195:
--
Fix Version/s: (was: 3.1.1)

Doing 3.1.0 RC1 now, moved all 3.1.1 (branch-3.1) fixes to 3.1.0 (branch-3.1.0)

> DataNode conf page  cannot display the current value after reconfig
> ---
>
> Key: HDFS-13195
> URL: https://issues.apache.org/jira/browse/HDFS-13195
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: maobaolong
>Assignee: maobaolong
>Priority: Minor
> Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 2.7.6, 3.0.2, 3.2.0
>
> Attachments: HDFS-13195-branch-2.7.001.patch, 
> HDFS-13195-branch-2.7.002.patch, HDFS-13195.001.patch, HDFS-13195.002.patch
>
>
> branch-2.7 now supports reconfiguring dfs.datanode.data.dir, but after I 
> reconfigure this key, the conf page still shows the old value.
> The reason is:
> {code:java}
> public DatanodeHttpServer(final Configuration conf,
>   final DataNode datanode,
>   final ServerSocketChannel externalHttpChannel)
> throws IOException {
> this.conf = conf;
> Configuration confForInfoServer = new Configuration(conf);
> confForInfoServer.setInt(HttpServer2.HTTP_MAX_THREADS, 10);
> HttpServer2.Builder builder = new HttpServer2.Builder()
> .setName("datanode")
> .setConf(confForInfoServer)
> .setACL(new AccessControlList(conf.get(DFS_ADMIN, " ")))
> .hostName(getHostnameForSpnegoPrincipal(confForInfoServer))
> .addEndpoint(URI.create("http://localhost:0"))
> .setFindPort(true);
> this.infoServer = builder.build();
> {code}
> confForInfoServer is a new Configuration instance, so when dfsadmin 
> reconfigures the datanode's config, the change is not reflected in 
> confForInfoServer; we should use the datanode's conf instead.
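
A hedged sketch of the fix direction (not the committed patch): let the /conf servlet see the datanode's live Configuration rather than the snapshot copy, while keeping confForInfoServer for HTTP server tuning only.

{code:java}
// Hypothetical: expose the datanode's live conf to the info server so that
// keys changed via 'dfsadmin -reconfig' show their current values.
this.infoServer = builder.build();
this.infoServer.setAttribute("datanode.conf", datanode.getConf());  // assumed attribute name
{code}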






[jira] [Updated] (HDFS-12512) RBF: Add WebHDFS

2018-03-29 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-12512:
--
Fix Version/s: (was: 3.1.1)

Doing 3.1.0 RC1 now, moved all 3.1.1 (branch-3.1) fixes to 3.1.0 (branch-3.1.0)

> RBF: Add WebHDFS
> 
>
> Key: HDFS-12512
> URL: https://issues.apache.org/jira/browse/HDFS-12512
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs
>Reporter: Íñigo Goiri
>Assignee: Wei Yan
>Priority: Major
>  Labels: RBF
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.2, 3.2.0
>
> Attachments: HDFS-12512.000.patch, HDFS-12512.001.patch, 
> HDFS-12512.002.patch, HDFS-12512.003.patch, HDFS-12512.004.patch, 
> HDFS-12512.005.patch, HDFS-12512.006.patch, HDFS-12512.007.patch, 
> HDFS-12512.008.patch, HDFS-12512.009.patch, HDFS-12512.010.patch, 
> HDFS-12512.011.patch, HDFS-12512.012.patch, HDFS-12512.013.patch
>
>
> The Router currently does not support WebHDFS. It needs to implement 
> something similar to {{NamenodeWebHdfsMethods}}.






[jira] [Updated] (HDFS-12792) RBF: Test Router-based federation using HDFSContract

2018-03-29 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-12792:
--
Fix Version/s: (was: 3.1.1)

Doing 3.1.0 RC1 now, moved all 3.1.1 (branch-3.1) fixes to 3.1.0 (branch-3.1.0)

> RBF: Test Router-based federation using HDFSContract
> 
>
> Key: HDFS-12792
> URL: https://issues.apache.org/jira/browse/HDFS-12792
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
>  Labels: RBF
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.2, 3.2.0
>
> Attachments: HDFS-12615.000.patch, HDFS-12792.001.patch, 
> HDFS-12792.002.patch, HDFS-12792.003.patch, HDFS-12792.004.patch, 
> HDFS-12792.005.patch, HDFS-12792.006.patch, HDFS-12792.007.patch, 
> HDFS-12792.008.patch, HDFS-12792.009.patch, HDFS-12792.010.patch, 
> HDFS-12792.011.patch, HDFS-12792.012.patch, HDFS-12792.013.patch
>
>
> Router-based federation should support HDFSContract.






[jira] [Updated] (HDFS-13250) RBF: Router to manage requests across multiple subclusters

2018-03-29 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13250:
--
Fix Version/s: (was: 3.1.1)

Doing 3.1.0 RC1 now, moved all 3.1.1 (branch-3.1) fixes to 3.1.0 (branch-3.1.0)

> RBF: Router to manage requests across multiple subclusters
> --
>
> Key: HDFS-13250
> URL: https://issues.apache.org/jira/browse/HDFS-13250
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.2, 3.2.0
>
> Attachments: HDFS-13250.000-addendum-branch-2.patch, 
> HDFS-13250.000.patch, HDFS-13250.001.patch, HDFS-13250.002.patch, 
> HDFS-13250.003.patch, HDFS-13250.004.patch, HDFS-13250.005.patch
>
>
> HDFS-13124 introduces the concept of mount points spanning multiple 
> subclusters. The Router should distribute the requests across these 
> subclusters.






[jira] [Updated] (HDFS-13291) RBF: Implement available space based OrderResolver

2018-03-29 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13291:
--
Fix Version/s: (was: 3.1.1)

Doing 3.1.0 RC1 now, moved all 3.1.1 (branch-3.1) fixes to 3.1.0 (branch-3.1.0)

> RBF: Implement available space based OrderResolver
> --
>
> Key: HDFS-13291
> URL: https://issues.apache.org/jira/browse/HDFS-13291
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.2, 3.2.0
>
> Attachments: HDFS-13291-branch-2.001.patch, HDFS-13291.001.patch, 
> HDFS-13291.002.patch, HDFS-13291.003.patch, HDFS-13291.004.patch, 
> HDFS-13291.005.patch, HDFS-13291.006.patch, HDFS-13291.007.patch, 
> HDFS-13291.008.patch, HDFS-13291.009.patch
>
>
> Implement an available-space-based OrderResolver; this type of resolver 
> helps balance data across subclusters. 






[jira] [Updated] (HDFS-13318) RBF: Fix FindBugs in hadoop-hdfs-rbf

2018-03-29 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13318:
--
Fix Version/s: (was: 3.1.1)

Doing 3.1.0 RC1 now, moved all 3.1.1 (branch-3.1) fixes to 3.1.0 (branch-3.1.0)

> RBF: Fix FindBugs in hadoop-hdfs-rbf
> 
>
> Key: HDFS-13318
> URL: https://issues.apache.org/jira/browse/HDFS-13318
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Ekanth S
>Priority: Minor
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.2, 3.2.0
>
> Attachments: HDFS-13318.001.patch
>
>
> hadoop-hdfs-rbf has 3 FindBug warnings:
> * NamenodePriorityComparator should be serializable
> * RemoteMethod.getTypes() may expose internal representation
> * RemoteMethod may store mutable objects






[jira] [Updated] (HDFS-13347) RBF: Cache datanode reports

2018-03-29 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13347:
--
Fix Version/s: (was: 3.1.1)

Doing 3.1.0 RC1 now, moved all 3.1.1 (branch-3.1) fixes to 3.1.0 (branch-3.1.0)

> RBF: Cache datanode reports
> ---
>
> Key: HDFS-13347
> URL: https://issues.apache.org/jira/browse/HDFS-13347
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Minor
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.2, 3.2.0
>
> Attachments: HDFS-13347-branch-2.000.patch, HDFS-13347.000.patch, 
> HDFS-13347.001.patch, HDFS-13347.002.patch, HDFS-13347.003.patch, 
> HDFS-13347.004.patch, HDFS-13347.005.patch, HDFS-13347.006.patch
>
>
> Getting the datanode reports is an expensive operation and can be executed 
> very frequently by the UI and watchdogs. We should cache this information.






[jira] [Updated] (HDFS-7877) [Umbrella] Support maintenance state for datanodes

2018-03-21 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-7877:
-
Summary: [Umbrella] Support maintenance state for datanodes  (was: Support 
maintenance state for datanodes)

> [Umbrella] Support maintenance state for datanodes
> --
>
> Key: HDFS-7877
> URL: https://issues.apache.org/jira/browse/HDFS-7877
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Ming Ma
>Assignee: Ming Ma
>Priority: Major
> Fix For: 2.9.0, 3.0.0-beta1, 3.1.0
>
> Attachments: HDFS-7877-2.patch, HDFS-7877.patch, 
> Supportmaintenancestatefordatanodes-2.pdf, 
> Supportmaintenancestatefordatanodes.pdf
>
>
> This requirement came up during the design of HDFS-7541. Given this feature 
> is mostly independent of the upgrade domain feature, it is better to track 
> it under a separate jira. The design and a draft patch will be available 
> soon.






[jira] [Commented] (HDFS-12884) BlockUnderConstructionFeature.truncateBlock should be of type BlockInfo

2018-03-21 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16409062#comment-16409062
 ] 

Wangda Tan commented on HDFS-12884:
---

[~shv], I just moved this to 3.1.1 since we're working on 3.1.0 on branch-3.1.0, 
which doesn't have this Jira. Please let me know your thoughts.

> BlockUnderConstructionFeature.truncateBlock should be of type BlockInfo
> ---
>
> Key: HDFS-12884
> URL: https://issues.apache.org/jira/browse/HDFS-12884
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.4
>Reporter: Konstantin Shvachko
>Assignee: chencan
>Priority: Major
> Fix For: 2.10.0, 2.9.1, 2.8.4, 2.7.6, 3.0.2, 3.2.0, 3.1.1
>
> Attachments: HDFS-12884.001.patch, HDFS-12884.002.patch, 
> HDFS-12884.003.patch
>
>
> {{BlockUnderConstructionFeature.truncateBlock}} type should be changed to 
> {{BlockInfo}} from {{Block}}. {{truncateBlock}} is always assigned as 
> {{BlockInfo}}, so this will avoid unnecessary casts.






[jira] [Updated] (HDFS-12884) BlockUnderConstructionFeature.truncateBlock should be of type BlockInfo

2018-03-21 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-12884:
--
Fix Version/s: (was: 3.1.0)
   3.1.1

> BlockUnderConstructionFeature.truncateBlock should be of type BlockInfo
> ---
>
> Key: HDFS-12884
> URL: https://issues.apache.org/jira/browse/HDFS-12884
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.4
>Reporter: Konstantin Shvachko
>Assignee: chencan
>Priority: Major
> Fix For: 2.10.0, 2.9.1, 2.8.4, 2.7.6, 3.0.2, 3.2.0, 3.1.1
>
> Attachments: HDFS-12884.001.patch, HDFS-12884.002.patch, 
> HDFS-12884.003.patch
>
>
> {{BlockUnderConstructionFeature.truncateBlock}} type should be changed to 
> {{BlockInfo}} from {{Block}}. {{truncateBlock}} is always assigned as 
> {{BlockInfo}}, so this will avoid unnecessary casts.






[jira] [Commented] (HDFS-13230) RBF: ConnectionManager's cleanup task will compare each pool's own active conns with its total conns

2018-03-21 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16409060#comment-16409060
 ] 

Wangda Tan commented on HDFS-13230:
---

[~elgoiri], while doing a Jira scan, I found that the commit message doesn't 
match this Jira ID: 
{code:java}
commit 0c2b969e0161a068bf9ae013c4b95508dfb90a8a
Author: Inigo Goiri 
Date: Thu Mar 8 09:32:05 2018 -0800

HDFS-13232. RBF: ConnectionManager's cleanup task will compare each pool's own 
active conns with its total conns. Contributed by Chao Sun.{code}
Posted it here so we can track this in the future.

> RBF: ConnectionManager's cleanup task will compare each pool's own active 
> conns with its total conns
> 
>
> Key: HDFS-13230
> URL: https://issues.apache.org/jira/browse/HDFS-13230
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Wei Yan
>Assignee: Chao Sun
>Priority: Minor
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.2
>
> Attachments: HDFS-13230.000.patch, HDFS-13230.001.patch
>
>
> In the cleanup task:
> {code:java}
> long timeSinceLastActive = Time.now() - pool.getLastActiveTime();
> int total = pool.getNumConnections();
> int active = getNumActiveConnections();
> if (timeSinceLastActive > connectionCleanupPeriodMs ||
> {code}
> the 3rd line should be pool.getNumActiveConnections()
>  
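
For clarity, the corrected third line implied by the report (pool-scoped rather than manager-wide):

{code:java}
// Compare the pool's own active connections with its own total.
int active = pool.getNumActiveConnections();
{code}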






[jira] [Updated] (HDFS-12859) Admin command resetBalancerBandwidth

2018-03-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-12859:
--
Fix Version/s: (was: 3.1.0)

> Admin command resetBalancerBandwidth
> 
>
> Key: HDFS-12859
> URL: https://issues.apache.org/jira/browse/HDFS-12859
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: balancer & mover
>Reporter: Jianfei Jiang
>Priority: Major
> Attachments: 
> 0003-HDFS-12859-Admin-command-resetBalancerBandwidth.patch, 
> 0004-HDFS-12859-Admin-command-resetBalancerBandwidth.patch, HDFS-12859.patch
>
>
> We can already set the balancer bandwidth dynamically using the 
> setBalancerBandwidth command. However, that value is not persistent and is 
> not stored in the configuration file, so different datanodes may keep 
> different default or previous settings in their configurations.
> We planned a scheduled balancer task that runs at midnight every day: we set 
> a larger bandwidth for it and want to restore the original values after it 
> finishes. We found it difficult to restore per-datanode settings, because the 
> setBalancerBandwidth command can only push the same value to all datanodes. 
> To use a unique setting per datanode, we would have to restart the datanodes.
> It would therefore be useful to have a command that resynchronizes the 
> runtime setting with the configuration file. 
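> A sketch of the proposed usage (the command name is the proposal here, not an 
> existing dfsadmin flag):
> {noformat}
> # Re-read dfs.datanode.balance.bandwidthPerSec from each datanode's own
> # configuration, discarding any value set via -setBalancerBandwidth.
> hdfs dfsadmin -resetBalancerBandwidth
> {noformat}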



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12487) FsDatasetSpi.isValidBlock() lacks null pointer check inside and neither do the callers

2018-03-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-12487:
--
Fix Version/s: (was: 3.1.0)

> FsDatasetSpi.isValidBlock() lacks null pointer check inside and neither do 
> the callers
> --
>
> Key: HDFS-12487
> URL: https://issues.apache.org/jira/browse/HDFS-12487
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover, diskbalancer
>Affects Versions: 3.0.0
> Environment: CentOS 6.8 x64
> CPU:4 core
> Memory:16GB
> Hadoop: Release 3.0.0-alpha4
>Reporter: liumi
>Assignee: liumi
>Priority: Major
> Attachments: HDFS-12487.002.patch, HDFS-12487.003.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> BlockIteratorImpl.nextBlock() looks for blocks in the source volume; if there 
> are no more blocks, it returns null up to DiskBalancer.getBlockToCopy(), 
> which then checks whether the result is a valid block.
> Looking into FsDatasetSpi.isValidBlock(), I found that it does not check for 
> null! We need to check for null first, or an exception will occur.
> This bug is hard to hit, because the DiskBalancer rarely copies all the data 
> of one volume to the others, and even when it does, the copy has usually 
> already finished by the time the bug occurs.
> However, when we try to copy all the data of two or more volumes to other 
> volumes in more than one step, the worker thread is shut down by this bug.
> The bug can be fixed in two ways:
> 1) Check for null before calling FsDatasetSpi.isValidBlock()
> 2) Check for null inside the implementation of 
> FsDatasetSpi.isValidBlock()
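> A minimal sketch of option 2, guarding inside the implementation (the 
> delegation to isValid(..) mirrors FsDatasetImpl but is illustrative here):
> {code:java}
> @Override // FsDatasetSpi
> public boolean isValidBlock(ExtendedBlock b) {
>   // nextBlock() returns null once the source volume is exhausted;
>   // treat that as "not a valid block" instead of throwing an NPE.
>   if (b == null) {
>     return false;
>   }
>   return isValid(b, ReplicaState.FINALIZED);
> }
> {code}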



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12700) Fix datanode link that can not be accessed in dfshealth.html

2018-03-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-12700:
--
Fix Version/s: (was: 3.1.0)

>  Fix datanode  link that can not be accessed in dfshealth.html
> --
>
> Key: HDFS-12700
> URL: https://issues.apache.org/jira/browse/HDFS-12700
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: fang zhenyi
>Assignee: fang zhenyi
>Priority: Minor
> Attachments: HDFS-12700.000.patch
>
>
> I found that the datanode link in dfshealth.html cannot be accessed unless 
> the hosts file is changed, so I changed the link to use the IP address.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12708) Fix hdfs haadmin usage

2018-03-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-12708:
--
Fix Version/s: (was: 3.1.0)

> Fix  hdfs haadmin usage
> ---
>
> Key: HDFS-12708
> URL: https://issues.apache.org/jira/browse/HDFS-12708
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha4
>Reporter: fang zhenyi
>Assignee: fang zhenyi
>Priority: Minor
> Attachments: HDFS-12708.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13136) Avoid taking FSN lock while doing group member lookup for FSD permission check

2018-03-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13136:
--
Fix Version/s: (was: 3.2.0)
   (was: 3.0.2)
   (was: 3.1.0)

> Avoid taking FSN lock while doing group member lookup for FSD permission check
> --
>
> Key: HDFS-13136
> URL: https://issues.apache.org/jira/browse/HDFS-13136
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Attachments: HDFS-13136-branch-3.0.001.patch, 
> HDFS-13136-branch-3.0.002.patch, HDFS-13136.001.patch, HDFS-13136.002.patch
>
>
> Namenode has FSN lock and FSD lock. Most of the namenode operations need to 
> take FSN lock first and then FSD lock.  The permission check is done via 
> FSPermissionChecker at FSD layer assuming FSN lock is taken. 
> The FSPermissionChecker constructor invokes callerUgi.getGroups(), which can 
> sometimes take seconds. There are external cache schemes such as SSSD and 
> internal cache schemes for group lookup, but the delay can still occur during 
> a cache refresh, causing severe FSN lock contention and an unresponsive 
> namenode.
> Checking the current code, we found that getBlockLocations(..) does this 
> right but some methods such as getFileInfo(..) and getContentSummary(..) do 
> it wrong. This ticket ensures the group lookup for the permission checker 
> happens outside the FSN lock.  
>  
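> A minimal sketch of the intended pattern (the helper names follow the NN code 
> paths mentioned above, but this is illustrative, not the committed patch):
> {code:java}
> // Resolve the caller's groups (potentially slow) before taking the FSN lock.
> FSPermissionChecker pc = fsd.getPermissionChecker();
> fsn.readLock();
> try {
>   // permission check and metadata lookup run under the lock,
>   // but no longer trigger a group lookup while holding it
> } finally {
>   fsn.readUnlock();
> }
> {code}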



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12284) RBF: Support for Kerberos authentication

2018-03-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-12284:
--
Target Version/s: 3.2.0  (was: 3.1.0)

> RBF: Support for Kerberos authentication
> 
>
> Key: HDFS-12284
> URL: https://issues.apache.org/jira/browse/HDFS-12284
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: security
>Reporter: Zhe Zhang
>Assignee: Sherwood Zheng
>Priority: Major
> Fix For: HDFS-10467
>
>
> HDFS Router should support Kerberos authentication and issuing / managing 
> HDFS delegation tokens.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12257) Expose getSnapshottableDirListing as a public API in HdfsAdmin

2018-03-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-12257:
--
Target Version/s: 2.8.3, 2.9.1, 3.2.0  (was: 2.8.3, 3.1.0, 2.9.1)

> Expose getSnapshottableDirListing as a public API in HdfsAdmin
> --
>
> Key: HDFS-12257
> URL: https://issues.apache.org/jira/browse/HDFS-12257
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Affects Versions: 2.6.5
>Reporter: Andrew Wang
>Assignee: Huafeng Wang
>Priority: Major
> Attachments: HDFS-12257.001.patch, HDFS-12257.002.patch, 
> HDFS-12257.003.patch
>
>
> Found at HIVE-16294. We have a CLI API for listing snapshottable dirs, but no 
> programmatic API. Other snapshot APIs are exposed in HdfsAdmin; I think we 
> should expose listing there as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-8430) Erasure coding: compute file checksum for striped files (stripe by stripe)

2018-03-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-8430:
-
Target Version/s: 3.2.0  (was: 3.1.0)

> Erasure coding: compute file checksum for striped files (stripe by stripe)
> --
>
> Key: HDFS-8430
> URL: https://issues.apache.org/jira/browse/HDFS-8430
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Affects Versions: HDFS-7285
>Reporter: Walter Su
>Assignee: Kai Zheng
>Priority: Major
> Attachments: HDFS-8430-poc1.patch
>
>
> HADOOP-3981 introduced a distributed file checksum algorithm designed for 
> replicated blocks.
> {{DFSClient.getFileChecksum()}} needs some updates so it can also work for 
> striped block groups.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11397) TestThrottledAsyncChecker#testCancellation timed out

2018-03-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-11397:
--
Target Version/s: 3.2.0  (was: 3.1.0)

> TestThrottledAsyncChecker#testCancellation timed out
> 
>
> Key: HDFS-11397
> URL: https://issues.apache.org/jira/browse/HDFS-11397
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, test
>Affects Versions: 3.0.0-alpha4
>Reporter: John Zhuge
>Assignee: Manjunath Anand
>Priority: Minor
> Attachments: HDFS-11397-V01.patch
>
>
> {noformat}
> Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 61.153 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.datanode.checker.TestThrottledAsyncChecker
> testCancellation(org.apache.hadoop.hdfs.server.datanode.checker.TestThrottledAsyncChecker)
>   Time elapsed: 60.033 sec  <<< ERROR!
> java.lang.Exception: test timed out after 6 milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:191)
>   at 
> org.apache.hadoop.hdfs.server.datanode.checker.TestThrottledAsyncChecker.testCancellation(TestThrottledAsyncChecker.java:114)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11885) createEncryptionZone should not block on initializing EDEK cache

2018-03-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-11885:
--
Target Version/s: 2.8.3, 2.9.1, 3.2.0  (was: 2.8.3, 3.1.0, 2.9.1)

> createEncryptionZone should not block on initializing EDEK cache
> 
>
> Key: HDFS-11885
> URL: https://issues.apache.org/jira/browse/HDFS-11885
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption
>Affects Versions: 2.6.5
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Major
> Attachments: HDFS-11885.001.patch, HDFS-11885.002.patch, 
> HDFS-11885.003.patch, HDFS-11885.004.patch
>
>
> When creating an encryption zone, we call {{ensureKeyIsInitialized}}, which 
> calls {{provider.warmUpEncryptedKeys(keyName)}}. This is a blocking call, 
> which attempts to fill the key cache up to the low watermark.
> If the KMS is down or slow, this can take a very long time, and cause the 
> createZone RPC to fail with a timeout.
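> A minimal sketch of a non-blocking warm-up (the executor is an assumption for 
> illustration, not necessarily the committed approach):
> {code:java}
> // Warm the EDEK cache in the background so the createZone RPC
> // does not block on a slow or unavailable KMS.
> ExecutorService warmUpExecutor = Executors.newSingleThreadExecutor();
> warmUpExecutor.submit(() -> {
>   try {
>     provider.warmUpEncryptedKeys(keyName);
>   } catch (IOException e) {
>     LOG.warn("EDEK warm-up failed for key " + keyName, e);
>   }
> });
> {code}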



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11503) Integrate Chocolate Cloud RS coder implementation

2018-03-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-11503:
--
Target Version/s: 3.2.0  (was: 3.1.0)

> Integrate Chocolate Cloud RS coder implementation
> -
>
> Key: HDFS-11503
> URL: https://issues.apache.org/jira/browse/HDFS-11503
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: erasure-coding
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Assignee: Marcell Feher
>Priority: Major
> Attachments: HDFS-11503.patch
>
>
> Quote from Marcell on HDFS-7285:
> First of all let me introduce ourselves: we are Chocolate Cloud from Denmark, 
> we use erasure coding to improve storage solutions. We already have 
> Reed-Solomon and Random Linear Network Coding backends for Liberasurecode, 
> and now we are at the final stage of developing our RS plugin to HDFS-EC. The 
> performance of our plugin is similar to ISA-L's, in some configurations we 
> are better, in others we are worse (our initial speed comparison charts can 
> be found here: https://www.chocolate-cloud.cc/Plugins/HDFS-EC/hdfs.html).
> We would like our plugin to become officially supported in Hadoop 3.0. We can 
> already provide a preliminary version of our (native) library and a patch 
> with the necessary glue code for the next alpha release.
> I'd like to know your thoughts about whether it's possible and how it could 
> be achieved.
> P.S: I'm happy to share more details if there's interest



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13289) RBF: TestConnectionManager#testCleanup() test case need correction

2018-03-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13289:
--
Target Version/s: 2.10.0, 2.9.1, 3.0.2, 3.2.0  (was: 3.1.0, 2.10.0, 2.9.1, 
3.0.2, 3.2.0)

> RBF: TestConnectionManager#testCleanup() test case need correction
> --
>
> Key: HDFS-13289
> URL: https://issues.apache.org/jira/browse/HDFS-13289
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Dibyendu Karmakar
>Assignee: Dibyendu Karmakar
>Priority: Minor
>
> In TestConnectionManager#testCleanup() 
>  
> {code:java}
> // Make sure the number of connections doesn't go below minSize
> ConnectionPool pool3 = new ConnectionPool(
> conf, TEST_NN_ADDRESS, TEST_USER3, 2, 10);
> addConnectionsToPool(pool3, 10, 0);
> poolMap.put(new ConnectionPoolId(TEST_USER2, TEST_NN_ADDRESS), pool3);
> connManager.cleanup(pool3);
> checkPoolConnections(TEST_USER3, 2, 0);
> {code}
> this part needs correction.
> Here the new ConnectionPoolId is created with TEST_USER2, but 
> checkPoolConnections is called with TEST_USER3. 
> The checkPoolConnections method only validates numOfConns and 
> numOfActiveConns when 
> {code:java}
> if (e.getKey().getUgi() == ugi)
> {code}
> holds. In this case the *if* condition returns *false* for TEST_USER3, so the 
> test case passes regardless of the values passed to checkPoolConnections.
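> A sketch of the corrected lines, keeping the surrounding test constants:
> {code:java}
> ConnectionPool pool3 = new ConnectionPool(
>     conf, TEST_NN_ADDRESS, TEST_USER3, 2, 10);
> addConnectionsToPool(pool3, 10, 0);
> // Register the pool under TEST_USER3 so the assertion below
> // actually finds and validates it.
> poolMap.put(new ConnectionPoolId(TEST_USER3, TEST_NN_ADDRESS), pool3);
> connManager.cleanup(pool3);
> checkPoolConnections(TEST_USER3, 2, 0);
> {code}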



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12811) A cheaper, faster, less memory intensive Hadoop fsck

2018-03-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-12811:
--
Target Version/s: 3.2.0  (was: 3.1.0)

> A cheaper, faster, less memory intensive Hadoop fsck
> 
>
> Key: HDFS-12811
> URL: https://issues.apache.org/jira/browse/HDFS-12811
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>Priority: Major
>
> A cheaper, faster, less memory-intensive approach is to eliminate the 
> traversal by directly scanning the inode table.
> A side effect is that paths would be scanned and displayed in a random order. 
> A new option, e.g. "-direct" or "-fast", would likely be required to avoid 
> compatibility issues.
> PS: This enhancement is valid only if the path is root directory {{/}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13051) dead lock occurs when rolleditlog rpc call happen and editPendingQ is full

2018-03-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13051:
--
Target Version/s: 2.10.0, 2.9.1, 2.8.4, 2.7.6, 3.0.2  (was: 3.1.0, 2.10.0, 
2.9.1, 2.8.4, 2.7.6, 3.0.2)

> dead lock occurs when rolleditlog rpc call happen and editPendingQ is full
> --
>
> Key: HDFS-13051
> URL: https://issues.apache.org/jira/browse/HDFS-13051
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.5
>Reporter: zhangwei
>Assignee: Daryn Sharp
>Priority: Major
>  Labels: AsyncEditlog, deadlock
> Attachments: HDFS-13112.patch, deadlock.patch
>
>
> When doing rolleditlog, the handler acquires the FS write lock, then the 
> FSEditLogAsync lock object, and writes 3 edits (the second one overrides the 
> logEdit method and returns true).
> In the extreme case where FSEditLogAsync's logSync is very slow and 
> editPendingQ (default size 4096) is full, the IPC thread cannot offer the 
> edit object into editPendingQ while doing rolleditlog: it blocks on the 
> editPendingQ.put method without releasing the FSEditLogAsync object lock. 
> The edit.logEdit method in the FSEditLogAsync.run thread can then never 
> acquire the FSEditLogAsync object lock, so the two threads deadlock.
> The stack trace looks like the following:
> "Thread[Thread-44528,5,main]" #130093 daemon prio=5 os_prio=0 
> tid=0x02377000 nid=0x13fda waiting on condition [0x7fb3297de000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x7fbd3cb96f58> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>  at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logCancelDelegationToken(FSEditLog.java:1008)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.logExpireDelegationToken(FSNamesystem.java:7635)
>  at 
> org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:395)
>  - locked <0x7fbd3cbae500> (a java.lang.Object)
>  at 
> org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:62)
>  at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.removeExpiredToken(AbstractDelegationTokenSecretManager.java:604)
>  at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.access$400(AbstractDelegationTokenSecretManager.java:54)
>  at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover.run(AbstractDelegationTokenSecretManager.java:656)
>  at java.lang.Thread.run(Thread.java:745)
> "FSEditLogAsync" #130072 daemon prio=5 os_prio=0 tid=0x0715b800 
> nid=0x13fbf waiting for monitor entry [0x7fb32c51a000]
>  java.lang.Thread.State: BLOCKED (on object monitor)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.doEditTransaction(FSEditLog.java:443)
>  - waiting to lock <*0x7fbcbc131000*> (a 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync$Edit.logEdit(FSEditLogAsync.java:233)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:177)
>  at java.lang.Thread.run(Thread.java:745)
> "IPC Server handler 47 on 53310" #337 daemon prio=5 os_prio=0 
> tid=0x7fe659d46000 nid=0x4c62 waiting on condition [0x7fb32fe52000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x7fbd3cb96f58> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>  at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1251)
>  - locked <*0x7fbcbc131000*> (a 
> org.apache.hadoop.hdfs.server.n

[jira] [Updated] (HDFS-13291) RBF: Implement available space based OrderResolver

2018-03-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13291:
--
Target Version/s: 2.9.0, 3.2.0  (was: 2.9.0, 3.1.0)

> RBF: Implement available space based OrderResolver
> --
>
> Key: HDFS-13291
> URL: https://issues.apache.org/jira/browse/HDFS-13291
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Major
> Attachments: HDFS-13291.001.patch, HDFS-13291.002.patch
>
>
> Implement an available-space-based OrderResolver; this type of resolver will 
> help balance the data across the subclusters. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13314) NameNode should optionally exit if it detects FsImage corruption

2018-03-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13314:
--
Target Version/s: 2.10.0, 3.2.0  (was: 3.1.0, 2.10.0)

> NameNode should optionally exit if it detects FsImage corruption
> 
>
> Key: HDFS-13314
> URL: https://issues.apache.org/jira/browse/HDFS-13314
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Major
> Attachments: HDFS-13314.01.patch, HDFS-13314.02.patch
>
>
> The NameNode should optionally exit after writing an FsImage if it detects 
> the following kinds of corruptions:
> # INodeReference pointing to non-existent INode
> # Duplicate entries in snapshot deleted diff list.
> This behavior is controlled via an undocumented configuration setting, and 
> disabled by default.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13174) hdfs mover -p /path times out after 20 min

2018-03-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13174:
--
Target Version/s: 2.9.1, 3.0.1, 2.8.4, 2.7.6  (was: 3.1.0, 2.9.1, 3.0.1, 
2.8.4, 2.7.6)

> hdfs mover -p /path times out after 20 min
> --
>
> Key: HDFS-13174
> URL: https://issues.apache.org/jira/browse/HDFS-13174
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.8.0, 2.7.4, 3.0.0-alpha2
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Major
>
> HDFS-11015 introduced an iteration timeout in the Dispatcher.Source class 
> that is checked while dispatching the moves performed by the Balancer and the 
> Mover. This timeout is hardwired to 20 minutes.
> The Balancer works in iterations; even if an iteration times out, the 
> Balancer keeps running and performs another iteration, failing only if no 
> moves happened in the last few iterations.
> The Mover, on the other hand, has no iterations, so if moving a path takes 
> more than 20 minutes, the Mover stops with the following exception reported 
> to the console (lines might differ as this exception came from a CDH5.12.1 
> installation):
> java.io.IOException: Block move timed out
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:382)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:328)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2500(Dispatcher.java:186)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:956)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13252) Code refactoring: Remove Diff.ListType

2018-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13252:
--
Fix Version/s: 3.1.0

> Code refactoring: Remove Diff.ListType
> --
>
> Key: HDFS-13252
> URL: https://issues.apache.org/jira/browse/HDFS-13252
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Fix For: 3.1.0, 3.2.0
>
> Attachments: h13252_20170308.patch, h13252_20170309.patch
>
>
> In Diff, there are only two lists, created and deleted.  It is easier to 
> trace the code if the methods have the list type in the method name, instead 
> of passing a ListType parameter.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13227) Add a method to calculate cumulative diff over multiple snapshots in DirectoryDiffList

2018-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13227:
--
Fix Version/s: 3.1.0

> Add a method to  calculate cumulative diff over multiple snapshots in 
> DirectoryDiffList
> ---
>
> Key: HDFS-13227
> URL: https://issues.apache.org/jira/browse/HDFS-13227
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Minor
> Fix For: 3.1.0, 3.2.0
>
> Attachments: HDFS-13227.001.patch
>
>
> This Jira proposes to add a method in DirectoryDiffList which returns the 
> minimal list of diffs that need to be combined to get the cumulative diff 
> between two given snapshots. The same method will be used while constructing 
> the childrenList for a directory: 
> DirectoryWithSnapshotFeature#getChildrenList and 
> DirectoryWithSnapshotFeature#computeDiffBetweenSnapshots will both use it to 
> get the cumulative diff. With a snapshotSkipList providing the minimal set of 
> diffs to combine, the overall computation will be faster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13130) Log object instance get incorrectly in SlowDiskTracker

2018-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13130:
--
Fix Version/s: 3.1.0

> Log object instance get incorrectly in SlowDiskTracker
> --
>
> Key: HDFS-13130
> URL: https://issues.apache.org/jira/browse/HDFS-13130
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jianfei Jiang
>Assignee: Jianfei Jiang
>Priority: Minor
> Fix For: 3.1.0, 3.2.0
>
> Attachments: HDFS-13130.patch
>
>
> In the class org.apache.hadoop.hdfs.server.blockmanagement.*SlowDiskTracker*, 
> the LOG incorrectly targets *SlowPeerTracker*.class:
> {code:java}
> public class SlowDiskTracker {
>  public static final Logger LOG =
>  LoggerFactory.getLogger(SlowPeerTracker.class);{code}
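> The fix is a one-line change (sketch):
> {code:java}
> public static final Logger LOG =
>     LoggerFactory.getLogger(SlowDiskTracker.class);
> {code}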
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13246) FileInputStream redundant closes in readReplicasFromCache

2018-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13246:
--
Fix Version/s: 3.1.0

> FileInputStream redundant closes in readReplicasFromCache 
> --
>
> Key: HDFS-13246
> URL: https://issues.apache.org/jira/browse/HDFS-13246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: liaoyuxiangqin
>Assignee: liaoyuxiangqin
>Priority: Minor
> Fix For: 3.1.0, 3.2.0
>
> Attachments: HDFS-13246.001.patch
>
>
> While reading readReplicasFromCache() of the BlockPoolSlice class in the 
> datanode, I found that the following code closes the FileInputStream 
> redundantly. IOUtils.closeStream(inputStream) in the finally block already 
> guarantees the inputStream is closed correctly, so the explicit 
> inputStream.close() can be removed. Thanks.
>  
> {code:java|title=BlockPoolSlice.java|borderStyle=solid}
> FileInputStream inputStream = null;
> try {
>   inputStream = fileIoProvider.getFileInputStream(volume, replicaFile);
>   BlockListAsLongs blocksList =
>   BlockListAsLongs.readFrom(inputStream, maxDataLength);
>   if (blocksList == null) {
> return false;
>   }
>   for (BlockReportReplica replica : blocksList) {
> switch (replica.getState()) {
> case FINALIZED:
>   addReplicaToReplicasMap(replica, tmpReplicaMap, 
> lazyWriteReplicaMap, true);
>   break;
> case RUR:
> case RBW:
> case RWR:
>   addReplicaToReplicasMap(replica, tmpReplicaMap, 
> lazyWriteReplicaMap, false);
>   break;
> default:
>   break;
> }
>   }
>   inputStream.close();
>   // Now it is safe to add the replica into volumeMap
>   // In case of any exception during parsing this cache file, fall back
>   // to scan all the files on disk.
>   for (Iterator<ReplicaInfo> iter =
>   tmpReplicaMap.replicas(bpid).iterator(); iter.hasNext(); ) {
> ReplicaInfo info = iter.next();
> // We use a lightweight GSet to store replicaInfo, we need to remove
> // it from one GSet before adding to another.
> iter.remove();
> volumeMap.add(bpid, info);
>   }
>   LOG.info("Successfully read replica from cache file : "
>   + replicaFile.getPath());
>   return true;
> } catch (Exception e) {
>   // Any exception we need to revert back to read from disk
>   // Log the error and return false
>   LOG.info("Exception occurred while reading the replicas cache file: "
>   + replicaFile.getPath(), e );
>   return false;
> }
> finally {
>   if (!fileIoProvider.delete(volume, replicaFile)) {
> LOG.info("Failed to delete replica cache file: " +
> replicaFile.getPath());
>   }
>   // close the inputStream
>   IOUtils.closeStream(inputStream);
> }
> {code}
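> For reference, the same single-close guarantee could also be expressed with 
> try-with-resources, which rules out the redundancy by construction (a sketch 
> assuming the same fileIoProvider API as above):
> {code:java}
> try (FileInputStream inputStream =
>     fileIoProvider.getFileInputStream(volume, replicaFile)) {
>   BlockListAsLongs blocksList =
>       BlockListAsLongs.readFrom(inputStream, maxDataLength);
>   // process blocksList as above; no explicit close() is needed
> }
> {code}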



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13211) Fix a bug in DirectoryDiffList.getMinListForRange

2018-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13211:
--
Fix Version/s: 3.1.0

> Fix a bug in DirectoryDiffList.getMinListForRange
> -
>
> Key: HDFS-13211
> URL: https://issues.apache.org/jira/browse/HDFS-13211
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.1.0, 3.2.0
>
> Attachments: HDFS-13211.001.patch, HDFS-13211.002.patch
>
>
> HDFS-13102 implements the DiffList interface for storing Directory Diffs 
> using SkipList.
> This Jira proposes to refactor the unit tests for HDFS-13102.
> We also found a bug in DirectoryDiffList.getMinListForRange through the new 
> tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12780) Fix spelling mistake in DistCpUtils.java

2018-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-12780:
--
Fix Version/s: 3.1.0

> Fix spelling mistake in DistCpUtils.java
> 
>
> Key: HDFS-12780
> URL: https://issues.apache.org/jira/browse/HDFS-12780
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0-beta1
>Reporter: Jianfei Jiang
>Assignee: Jianfei Jiang
>Priority: Trivial
>  Labels: patch
> Fix For: 3.1.0, 3.2.0
>
> Attachments: HDFS-12780.patch
>
>
> We found a spelling mistake in DistCpUtils.java.  "* If checksums's can't be 
> retrieved," should be " * If checksums can't be retrieved,"



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13171) Handle Deletion of nodes in SnasphotSkipList

2018-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13171:
--
Fix Version/s: 3.1.0

> Handle Deletion of nodes in SnasphotSkipList
> 
>
> Key: HDFS-13171
> URL: https://issues.apache.org/jira/browse/HDFS-13171
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.1.0, 3.2.0
>
> Attachments: HDFS-13171.000.patch, HDFS-13171.001.patch, 
> HDFS-13171.002.patch, HDFS-13171.003.patch
>
>
> This Jira will handle deletion of skipListNodes from DirectoryDiffList. If a 
> node has multiple levels, the list needs to be rebalanced; if the node has a 
> single level, no rebalancing is required.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13202) Fix the outdated javadocs in HAUtil

2018-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13202:
--
Fix Version/s: 3.1.0

> Fix the outdated javadocs in HAUtil
> ---
>
> Key: HDFS-13202
> URL: https://issues.apache.org/jira/browse/HDFS-13202
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Trivial
> Fix For: 3.1.0, 3.2.0
>
> Attachments: HDFS-13202.000.patch
>
>
> There are a few outdated javadocs in {{HAUtil}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13223) Reduce DiffListBySkipList memory usage

2018-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13223:
--
Fix Version/s: 3.1.0

> Reduce DiffListBySkipList memory usage
> --
>
> Key: HDFS-13223
> URL: https://issues.apache.org/jira/browse/HDFS-13223
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.1.0, 3.2.0
>
> Attachments: HDFS-13223.001.patch, HDFS-13223.002.patch, 
> HDFS-13223.003.patch, HDFS-13223.004.patch, HDFS-13223.004_commit.patch
>
>
> There are several ways to reduce memory footprint of DiffListBySkipList.
> - Move maxSkipLevels and skipInterval to DirectoryDiffListFactory.
> - Use an array for skipDiffList instead of List.
> - Do not store the level 0 element in skipDiffList.
> - Do not create new ChildrenDiff for the same value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13168) XmlImageVisitor - Prefer Array over LinkedList

2018-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13168:
--
Fix Version/s: 3.1.0

> XmlImageVisitor - Prefer Array over LinkedList
> --
>
> Key: HDFS-13168
> URL: https://issues.apache.org/jira/browse/HDFS-13168
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Fix For: 3.1.0, 3.2.0
>
> Attachments: HDFS-13168.1.patch, HDFS-13168.2.patch
>
>
> {{ArrayDeque}}
> {quote}This class is likely to be faster than Stack when used as a stack, and 
> faster than LinkedList when used as a queue.{quote}
> .. not to mention less memory fragmentation (single backing array v.s. many 
> ArrayList nodes).
> https://docs.oracle.com/javase/8/docs/api/java/util/ArrayDeque.html
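> A minimal before/after sketch:
> {code:java}
> // Before: LinkedList as a stack allocates one node object per element.
> Deque<String> linked = new LinkedList<>();
> // After: ArrayDeque offers the same Deque API over a single backing array.
> Deque<String> array = new ArrayDeque<>();
> array.push("inode");
> String top = array.peek();
> {code}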



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-1686) Federation: Add more Balancer tests with federation setting

2018-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-1686:
-
Fix Version/s: 3.1.0

> Federation: Add more Balancer tests with federation setting
> ---
>
> Key: HDFS-1686
> URL: https://issues.apache.org/jira/browse/HDFS-1686
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover, test
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Bharat Viswanadham
>Priority: Minor
> Fix For: 3.1.0, 3.2.0
>
> Attachments: 4358946.patch, HDFS-1686.00.patch, HDFS-1686.01.patch, 
> HDFS-1686.02.patch, h1686_20110303.patch
>
>
> A test with 3 Namenodes and 4 Datanodes in startup, and then adding 2 new 
> Datanodes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13102) Implement SnapshotSkipList class to store Multi level DirectoryDiffs

2018-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13102:
--
Fix Version/s: 3.1.0

> Implement SnapshotSkipList class to store Multi level DirectoryDiffs
> 
>
> Key: HDFS-13102
> URL: https://issues.apache.org/jira/browse/HDFS-13102
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.1.0, 3.2.0
>
> Attachments: HDFS-13102.001.patch, HDFS-13102.002.patch, 
> HDFS-13102.003.patch, HDFS-13102.004.patch, HDFS-13102.005.patch, 
> HDFS-13102.006.patch, HDFS-13102.007.patch, HDFS-13102.008.patch, 
> HDFS-13102.009.patch, HDFS-13102.009_committed.patch
>
>
> HDFS-11225 explains an issue where deletion of older snapshots can take a 
> very long time when the number of snapshot diffs is large for directories. 
> For any directory under a snapshot, constructing the children list requires 
> combining all the diffs from that particular snapshot to the last 
> snapshotDiff record and reverse-applying them to the directory's current 
> children list on the live fs. This can take significant time if the number 
> of snapshot diffs is large and the changes per diff are substantial.
> This Jira proposes to store the directory diffs in a SnapshotSkipList, where 
> we store multi-level DirectoryDiffs. At each level, the DirectoryDiff will be 
> the cumulative diff of k snapshot diffs, where k is the level of the node in 
> the list. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13193) Various Improvements for BlockTokenSecretManager

2018-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13193:
--
Fix Version/s: 3.1.0

> Various Improvements for BlockTokenSecretManager
> 
>
> Key: HDFS-13193
> URL: https://issues.apache.org/jira/browse/HDFS-13193
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Fix For: 3.1.0, 3.2.0
>
> Attachments: HDFS-13193.1.patch
>
>
> Various improvements for class 
> {{org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.java}}
> # Remove superfluous {{toString}} calls
> # Fix some checkstyle warnings
> # Re-implement an O(N^2) method with HashMultiSet - also improves 
> readability
> # Increase code re-use with Apache Commons Library



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13222) Update getBlocks method to take minBlockSize in RPC calls

2018-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13222:
--
Fix Version/s: 3.1.0

> Update getBlocks method to take minBlockSize in RPC calls
> -
>
> Key: HDFS-13222
> URL: https://issues.apache.org/jira/browse/HDFS-13222
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Fix For: 3.1.0, 3.2.0
>
> Attachments: HDFS-13222.00.patch, HDFS-13222.01.patch, 
> HDFS-13222.02.patch
>
>
>  
> Making getBlocks use the balancer parameter was done in HDFS-9412.
> As [~szetszwo] suggested, pass the Balancer's configured value from the 
> Balancer to the NN via getBlocks in each RPC.
>  
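> A sketch of the extended protocol method (the parameter position is an 
> assumption):
> {code:java}
> // NamenodeProtocol: let the NN filter out blocks smaller than the
> // Balancer's configured minimum instead of shipping them over and
> // discarding them on the Balancer side.
> BlocksWithLocations getBlocks(DatanodeInfo datanode, long size,
>     long minBlockSize) throws IOException;
> {code}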



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13173) Replace ArrayList with DirectoryDiffList(SnapshotSkipList) to store DirectoryDiffs

2018-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13173:
--
Fix Version/s: 3.1.0

> Replace ArrayList with DirectoryDiffList(SnapshotSkipList) to store 
> DirectoryDiffs
> --
>
> Key: HDFS-13173
> URL: https://issues.apache.org/jira/browse/HDFS-13173
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.1.0, 3.2.0
>
> Attachments: HDFS-13173.001.patch, HDFS-13173.002.patch, 
> HDFS-13173.003.patch
>
>
> This Jira will replace the existing ArrayList with DirectoryDiffList to store 
> directory diffs for snapshots, based on the config value of maxSkipLevels. If 
> the configured value is greater than 0, SnapshotSkipList will be used to 
> store DirectoryDiffs; otherwise ArrayList will be used. The config value 
> defaults to 0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13235) DiskBalancer: Update Documentation to add newly added options

2018-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13235:
--
Fix Version/s: 3.1.0

> DiskBalancer: Update Documentation to add newly added options
> -
>
> Key: HDFS-13235
> URL: https://issues.apache.org/jira/browse/HDFS-13235
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: diskbalancer, documentation
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Fix For: 3.1.0, 3.2.0
>
> Attachments: HDFS-13235.00.patch, HDFS-13235.01.patch
>
>
> HDFS-13181 added dfs.disk.balancer.plan.valid.interval
> HDFS-13178 added skipDateCheck Option



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13188) Disk Balancer: Support multiple block pools during block move

2018-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13188:
--
Fix Version/s: 3.1.0

> Disk Balancer: Support multiple block pools during block move
> -
>
> Key: HDFS-13188
> URL: https://issues.apache.org/jira/browse/HDFS-13188
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: diskbalancer
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Fix For: 3.1.0, 3.2.0
>
> Attachments: HDFS-13188.01.patch, HDFS-13188.02.patch, 
> HDFS-13188.03.patch, HDFS-13188.04.patch, HDFS-13188.05.patch
>
>
> During plan execution:
> *Federated setup:*
> When multiple block pools are present, balancing only copies blocks from the 
> first block pool to the destination disk.
> We want to distribute the blocks from all block pools on the source disk to 
> the destination disk during balancing.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13159) TestTruncateQuotaUpdate fails in trunk

2018-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13159:
--
Fix Version/s: 3.1.0

> TestTruncateQuotaUpdate fails in trunk
> --
>
> Key: HDFS-13159
> URL: https://issues.apache.org/jira/browse/HDFS-13159
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Arpit Agarwal
>Assignee: Nanda kumar
>Priority: Major
> Fix For: 3.1.0, 3.2.0
>
> Attachments: HDFS-13159.000.patch, HDFS-13159.001.patch
>
>
> Details in comment below.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13210) Fix the typo in MiniDFSCluster class

2018-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13210:
--
Fix Version/s: 3.1.0

> Fix the typo in MiniDFSCluster class 
> -
>
> Key: HDFS-13210
> URL: https://issues.apache.org/jira/browse/HDFS-13210
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Yiqun Lin
>Assignee: fang zhenyi
>Priority: Trivial
> Fix For: 3.1.0, 3.2.0
>
> Attachments: HDFS-13210.001.patch, HDFS-13210.002.patch, 
> HDFS-13210.003.patch
>
>
> There is a typo {{SimilatedFSDataset}} in {{MiniDFSCluster#injectBlocks}}.
>  In line2748 and line2769:
> {code:java}
> public void injectBlocks(int dataNodeIndex,
>   Iterable<Block> blocksToInject, String bpid) throws IOException {
> if (dataNodeIndex < 0 || dataNodeIndex > dataNodes.size()) {
>   throw new IndexOutOfBoundsException();
> }
> final DataNode dn = dataNodes.get(dataNodeIndex).datanode;
> final FsDatasetSpi<?> dataSet = DataNodeTestUtils.getFSDataset(dn);
> if (!(dataSet instanceof SimulatedFSDataset)) {
>   throw new IOException("injectBlocks is valid only for 
> SimilatedFSDataset");
> }
> ...
> }
> {code}
> {{SimilatedFSDataset}} should be {{SimulatedFSDataset}}.
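> The fix is the corrected message string (sketch):
> {code:java}
> throw new IOException(
>     "injectBlocks is valid only for SimulatedFSDataset");
> {code}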



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13192) Change the code order in getFileEncryptionInfo to avoid unnecessary call of assignment

2018-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13192:
--
Fix Version/s: 3.1.0

> Change the code order in getFileEncryptionInfo to avoid unnecessary call of 
> assignment
> --
>
> Key: HDFS-13192
> URL: https://issues.apache.org/jira/browse/HDFS-13192
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: encryption
>Affects Versions: 3.1.0
>Reporter: LiXin Ge
>Assignee: LiXin Ge
>Priority: Minor
> Fix For: 3.1.0, 3.2.0
>
> Attachments: HDFS-13192.001.patch, HDFS-13192.002.patch, 
> HDFS-13192.003.patch, org.apache.hadoop.hdfs.TestEncryptionZones-output.txt
>
>
> The assignment of {{version}}, {{suite}} and {{keyName}} should happen 
> lazily, right before they are used, in case {{fileXAttr}} is *null*.
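> A minimal sketch of the proposed ordering (local names are assumed for 
> illustration):
> {code:java}
> if (fileXAttr == null) {
>   // No encryption info for this file; skip the assignments entirely.
>   return null;
> }
> // Assign only after the null check, right before use.
> final CryptoProtocolVersion version = encryptionZone.getVersion();
> final CipherSuite suite = encryptionZone.getSuite();
> final String keyName = encryptionZone.getKeyName();
> {code}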



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13178) Disk Balancer: Add skipDateCheck option to DiskBalancer Execute command

2018-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13178:
--
Fix Version/s: 3.1.0

> Disk Balancer: Add skipDateCheck option to DiskBalancer Execute command
> ---
>
> Key: HDFS-13178
> URL: https://issues.apache.org/jira/browse/HDFS-13178
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: diskbalancer
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Fix For: 3.1.0, 3.2.0
>
> Attachments: HDFS-13178.00.patch, HDFS-13178.01.patch, 
> HDFS-13178.02.patch
>
>
>  
> Add a force option to the DiskBalancer execute command, which skips the date 
> check and force-executes the plan.
> This is one of the TODOs for the diskbalancer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13257) Code cleanup: INode never throws QuotaExceededException

2018-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13257:
--
Fix Version/s: 3.1.0

> Code cleanup: INode never throws QuotaExceededException
> ---
>
> Key: HDFS-13257
> URL: https://issues.apache.org/jira/browse/HDFS-13257
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Fix For: 3.1.0, 3.2.0
>
> Attachments: h13257_20170309.patch, h13257_20170309b.patch
>
>
> The quota verification logic is changed in a way that INode never throws 
> QuotaExceededException.  The {{verify}} parameter is always false in 
> addSpaceConsumed(..) and addSpaceConsumed2Parent(..).
> This provides an opportunity for some code cleanup.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13167) DatanodeAdminManager Improvements

2018-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13167:
--
Fix Version/s: 3.1.0

> DatanodeAdminManager Improvements
> -
>
> Key: HDFS-13167
> URL: https://issues.apache.org/jira/browse/HDFS-13167
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Fix For: 3.1.0, 3.2.0
>
> Attachments: HDFS-13167.1.patch, HDFS-13167.2.patch, 
> HDFS-13167.3.patch
>
>
> # Use Collection type Set instead of List for tracking nodes
> # Fix logging statements that are erroneously appending variables instead of 
> using parameters
> # Miscellaneous small improvements
> As an example, the {{node}} variable is being appended to the string instead 
> of being passed as an argument to the {{trace}} method for variable 
> substitution.
> {code}
> LOG.trace("stopDecommission: Node {} in {}, nothing to do." +
>   node, node.getAdminState());
> {code}
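> The corrected call passes both values as substitution arguments (sketch):
> {code:java}
> LOG.trace("stopDecommission: Node {} in {}, nothing to do.",
>     node, node.getAdminState());
> {code}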



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-336) dfsadmin -report should report number of blocks from datanode

2018-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-336:

Fix Version/s: 3.1.0

> dfsadmin -report should report number of blocks from datanode
> -
>
> Key: HDFS-336
> URL: https://issues.apache.org/jira/browse/HDFS-336
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lohit Vijayarenu
>Assignee: Bharat Viswanadham
>Priority: Minor
>  Labels: newbie
> Fix For: 3.1.0, 3.2.0
>
> Attachments: HDFS-336.00.patch, HDFS-336.01.patch, HDFS-336.02.patch
>
>
> _hadoop dfsadmin -report_ seems to omit the number of blocks on each 
> datanode. The number of blocks hosted by a datanode is useful information 
> that should be included in the report. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10803) TestBalancerWithMultipleNameNodes#testBalancing2OutOf3Blockpools fails intermittently due to no free space available

2018-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-10803:
--
Fix Version/s: 3.1.0

> TestBalancerWithMultipleNameNodes#testBalancing2OutOf3Blockpools fails 
> intermittently due to no free space available
> 
>
> Key: HDFS-10803
> URL: https://issues.apache.org/jira/browse/HDFS-10803
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 3.2.0
>
> Attachments: HDFS-10803.001.patch
>
>
> The test {{TestBalancerWithMultipleNameNodes#testBalancing2OutOf3Blockpools}} 
> fails intermittently. The stack trace 
> (https://builds.apache.org/job/PreCommit-HDFS-Build/16534/testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancerWithMultipleNameNodes/testBalancing2OutOf3Blockpools/):
> {code}
> java.io.IOException: Creating block, no free space available
>   at 
> org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset$BInfo.(SimulatedFSDataset.java:151)
>   at 
> org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.injectBlocks(SimulatedFSDataset.java:580)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.injectBlocks(MiniDFSCluster.java:2679)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.unevenDistribution(TestBalancerWithMultipleNameNodes.java:405)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testBalancing2OutOf3Blockpools(TestBalancerWithMultipleNameNodes.java:516)
> {code}
> The error message means that the datanode's capacity has been used up and 
> there is no space left to create a new file block. 
> Looking into the code, the main reason seems to be that the {{capacities}} 
> for the cluster are not correctly constructed at the second cluster startup, 
> before blocks are redistributed in the test.
> The related code:
> {code}
>   // Here we do redistribute blocks nNameNodes times for each node,
>   // we need to adjust the capacities. Otherwise it will cause the no 
>   // free space errors sometimes.
>   final MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
>   .nnTopology(MiniDFSNNTopology.simpleFederatedTopology(nNameNodes))
>   .numDataNodes(nDataNodes)
>   .racks(racks)
>   .simulatedCapacities(newCapacities)
>   .format(false)
>   .build();
>   LOG.info("UNEVEN 11");
> ...
> for(int n = 0; n < nNameNodes; n++) {
>   // redistribute blocks
>   final Block[][] blocksDN = TestBalancer.distributeBlocks(
>   blocks[n], s.replication, distributionPerNN);
> 
>   for(int d = 0; d < blocksDN.length; d++)
> cluster.injectBlocks(n, d, Arrays.asList(blocksDN[d]));
>   LOG.info("UNEVEN 13: n=" + n);
> }
> {code}
> As a result, the totalUsed value is increased by 
> {{nNameNodes*usedSpacePerNN}} rather than {{usedSpacePerNN}}.
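> A minimal sketch of the kind of adjustment the comment in the code above asks 
> for (variable names are hypothetical): since blocks are injected once per 
> NameNode, each datanode's simulated capacity has to absorb nNameNodes shares 
> of the per-NN used space, not just one.
> {code}
> /** Sketch only (hypothetical names): scale simulated capacities so that
>  *  injecting blocks once per NameNode cannot exhaust the datanodes. */
> static long[] adjustCapacities(long[] capacities, int nNameNodes,
>     long usedSpacePerNN) {
>   long[] adjusted = new long[capacities.length];
>   for (int i = 0; i < capacities.length; i++) {
>     // used space grows by nNameNodes * usedSpacePerNN, not usedSpacePerNN
>     adjusted[i] = capacities[i] + (nNameNodes - 1) * usedSpacePerNN;
>   }
>   return adjusted;
> }
> {code}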



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13142) Define and Implement a DiffList Interface to store and manage SnapshotDiffs

2018-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-13142:
--
Fix Version/s: 3.1.0

> Define and Implement a DiffList Interface to store and manage SnapshotDiffs
> ---
>
> Key: HDFS-13142
> URL: https://issues.apache.org/jira/browse/HDFS-13142
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.1.0, 3.2.0
>
> Attachments: HDFS-13142.001.patch, HDFS-13142.002.patch, 
> HDFS-13142.003.patch
>
>
> The InodeDiffList class contains a generic List to store snapshotDiffs. The 
> generic List interface is bulky; to store and manage snapshotDiffs, we need 
> only a few specific methods. 
> This Jira proposes to define a new interface, DiffList, which will be used to 
> store and manage snapshotDiffs.
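> A hypothetical sketch of what such an interface could look like (method names 
> are illustrative, not the committed API):
> {code}
> import java.util.Iterator;
> 
> /** Only the few operations snapshot diff management actually needs. */
> interface DiffList<T> extends Iterable<T> {
>   T get(int index);          // random access to a stored diff
>   boolean isEmpty();
>   int size();
>   T remove(int index);       // drop a diff when a snapshot is deleted
>   boolean addLast(T diff);   // append the newest diff
>   void addFirst(T diff);
>   Iterator<T> iterator();
> }
> {code}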



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12999) When reaching the end of the block group, it may not be necessary to flush all the data packets (flushAllInternals) twice.

2018-03-07 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16390704#comment-16390704
 ] 

Wangda Tan commented on HDFS-12999:
---

[~figo], removed the fix version; it should only be set by a committer when the 
patch is committed.

> When reaching the end of the block group, it may not be necessary to flush 
> all the data packets (flushAllInternals) twice.
> ---
>
> Key: HDFS-12999
> URL: https://issues.apache.org/jira/browse/HDFS-12999
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, hdfs-client
>Affects Versions: 3.0.0-beta1, 3.1.0
>Reporter: lufei
>Assignee: lufei
>Priority: Major
> Attachments: HDFS-12999.001.patch, HDFS-12999.002.patch
>
>
> To simplify the process, there is no need to flush all the data packets 
> (flushAllInternals) twice when reaching the end of the block group.
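> A rough illustration of the idea (purely hypothetical structure, not the 
> attached patch): remember whether the current block group's packets were 
> already flushed, so the flush happens at most once.
> {code}
> class FlushOnceExample {
>   private boolean flushedAtBlockGroupEnd;
> 
>   void endBlockGroup() {
>     if (!flushedAtBlockGroupEnd) {
>       flushAllInternals();            // flush queued packets once
>       flushedAtBlockGroupEnd = true;  // skip the redundant second flush
>     }
>   }
> 
>   void startNextBlockGroup() {
>     flushedAtBlockGroupEnd = false;   // re-arm for the next block group
>   }
> 
>   private void flushAllInternals() { /* flush queued data packets ... */ }
> }
> {code}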



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12999) When reaching the end of the block group, it may not be necessary to flush all the data packets (flushAllInternals) twice.

2018-03-07 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-12999:
--
Fix Version/s: (was: 3.1.0)

> When reaching the end of the block group, it may not be necessary to flush 
> all the data packets (flushAllInternals) twice.
> ---
>
> Key: HDFS-12999
> URL: https://issues.apache.org/jira/browse/HDFS-12999
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, hdfs-client
>Affects Versions: 3.0.0-beta1, 3.1.0
>Reporter: lufei
>Assignee: lufei
>Priority: Major
> Attachments: HDFS-12999.001.patch, HDFS-12999.002.patch
>
>
> To simplify the process, there is no need to flush all the data packets 
> (flushAllInternals) twice when reaching the end of the block group.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-8893) DNs with failed volumes stop serving during rolling upgrade

2018-02-15 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-8893:
-
Target Version/s: 3.2.0  (was: 3.1.0)

> DNs with failed volumes stop serving during rolling upgrade
> ---
>
> Key: HDFS-8893
> URL: https://issues.apache.org/jira/browse/HDFS-8893
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Rushabh S Shah
>Assignee: Daryn Sharp
>Priority: Critical
>
> When a rolling upgrade starts, all DNs try to write a rolling_upgrade marker 
> to each of their volumes. If one of the volumes is bad, this will fail. When 
> this failure happens, the DN does not update the key it received from the NN.
> Unfortunately we had one failed volume on all 3 of the datanodes that were 
> holding the replicas.
> Keys expire after 20 hours, so at about 20 hours into the rolling upgrade, the 
> DNs with failed volumes will stop serving clients.
> Here is the stack trace on the datanode side:
> {noformat}
> 2015-08-11 07:32:28,827 [DataNode: heartbeating to 8020] WARN 
> datanode.DataNode: IOException in offerService
> java.io.IOException: Read-only file system
> at java.io.UnixFileSystem.createFileExclusively(Native Method)
> at java.io.File.createNewFile(File.java:947)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.setRollingUpgradeMarkers(BlockPoolSliceStorage.java:721)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.setRollingUpgradeMarker(DataStorage.java:173)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.setRollingUpgradeMarker(FsDatasetImpl.java:2357)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.signalRollingUpgrade(BPOfferService.java:480)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.handleRollingUpgradeStatus(BPServiceActor.java:626)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:677)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:833)
> at java.lang.Thread.run(Thread.java:722)
> {noformat}
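> A minimal sketch of one possible mitigation (names and shapes are assumptions, 
> not the actual patch): write the marker per volume and tolerate failures 
> instead of aborting, so the DN can still refresh its keys from the NN.
> {code}
> import java.io.File;
> import java.io.IOException;
> import java.util.List;
> 
> class MarkerExample {
>   /** Sketch only: skip failed volumes rather than failing the whole DN. */
>   static void setRollingUpgradeMarkers(List<File> volumes) {
>     for (File vol : volumes) {
>       try {
>         File marker = new File(vol, "rolling_upgrade");
>         marker.createNewFile(); // throws IOException on a read-only volume
>       } catch (IOException e) {
>         // One bad volume should not stop the rolling-upgrade handling.
>         System.err.println("Skipping failed volume " + vol + ": " + e);
>       }
>     }
>   }
> }
> {code}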



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12452) TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs

2018-02-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16365087#comment-16365087
 ] 

Wangda Tan commented on HDFS-12452:
---

[~xyao], if this patch can be finished this week, please commit it to branch-3.1 
as well. Thanks.

> TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs
> --
>
> Key: HDFS-12452
> URL: https://issues.apache.org/jira/browse/HDFS-12452
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Arpit Agarwal
>Assignee: Xiaoyu Yao
>Priority: Critical
>  Labels: flaky-test
> Attachments: HDFS-12452.001.patch, HDFS-12452.002.patch
>
>
> TestDataNodeVolumeFailureReporting#testSuccessiveVolumeFailures fails 
> frequently in Jenkins runs but it passes locally on my dev machine.
> e.g. 
> https://builds.apache.org/job/PreCommit-HDFS-Build/21134/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailureReporting/testSuccessiveVolumeFailures/
> {code}
> Error Message
> test timed out after 120000 milliseconds
> Stacktrace
> java.lang.Exception: test timed out after 120000 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:761)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testSuccessiveVolumeFailures(TestDataNodeVolumeFailureReporting.java:189)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11310) Reduce the performance impact of the balancer (trunk port)

2018-02-13 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-11310:
--
Target Version/s: 3.2.0  (was: 3.1.0)

> Reduce the performance impact of the balancer (trunk port)
> --
>
> Key: HDFS-11310
> URL: https://issues.apache.org/jira/browse/HDFS-11310
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Daryn Sharp
>Priority: Critical
>
> HDFS-7967 introduced a highly performant balancer getBlocks() query that 
> scales to large/dense clusters.  The simple design implementation depends on 
> the triplets data structure.  HDFS-9260 removed the triplets, which 
> fundamentally changed the implementation.  Either that patch must be reverted 
> or the getBlocks() patch needs to be reimplemented.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12049) Recommissioning live nodes stalls the NN

2018-02-13 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-12049:
--
Target Version/s: 3.2.0  (was: 3.1.0)

> Recommissioning live nodes stalls the NN
> 
>
> Key: HDFS-12049
> URL: https://issues.apache.org/jira/browse/HDFS-12049
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Priority: Critical
>
> A node refresh will recommission included nodes that are alive and in 
> decommissioning or decommissioned state.  The recommission will scan all 
> blocks on the node, find over-replicated blocks, choose an excess replica, 
> and queue an invalidation.
> The process is expensive and worsened by the overhead of storage types (even 
> when not in use).  It can be especially devastating because the write lock is 
> held for the entire node refresh.  _Recommissioning 67 nodes with ~500k 
> blocks/node stalled rpc services for over 4 mins._
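> A minimal sketch of one way to bound the lock hold time (names are 
> hypothetical; this is not the actual fix): process the nodes in batches and 
> release the write lock between batches.
> {code}
> import java.util.List;
> import java.util.concurrent.locks.ReentrantReadWriteLock;
> 
> class RecommissionExample {
>   /** Sketch only: batch the work so RPC handlers are not starved for
>    *  the entire refresh. */
>   static void recommission(List<String> nodes,
>       ReentrantReadWriteLock fsLock) {
>     final int batchSize = 4;
>     for (int i = 0; i < nodes.size(); i += batchSize) {
>       int end = Math.min(i + batchSize, nodes.size());
>       fsLock.writeLock().lock();
>       try {
>         for (String node : nodes.subList(i, end)) {
>           // scan blocks, pick excess replicas, queue invalidations ...
>         }
>       } finally {
>         fsLock.writeLock().unlock(); // let other operations proceed
>       }
>     }
>   }
> }
> {code}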



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12452) TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs

2018-02-13 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-12452:
--
Target Version/s: 3.2.0  (was: 3.1.0)

> TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs
> --
>
> Key: HDFS-12452
> URL: https://issues.apache.org/jira/browse/HDFS-12452
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Arpit Agarwal
>Priority: Critical
>  Labels: flaky-test
>
> TestDataNodeVolumeFailureReporting#testSuccessiveVolumeFailures fails 
> frequently in Jenkins runs but it passes locally on my dev machine.
> e.g. 
> https://builds.apache.org/job/PreCommit-HDFS-Build/21134/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailureReporting/testSuccessiveVolumeFailures/
> {code}
> Error Message
> test timed out after 120000 milliseconds
> Stacktrace
> java.lang.Exception: test timed out after 120000 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:761)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testSuccessiveVolumeFailures(TestDataNodeVolumeFailureReporting.java:189)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12548) HDFS Jenkins build is unstable on branch-2

2018-02-13 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-12548:
--
Target Version/s: 3.2.0

> HDFS Jenkins build is unstable on branch-2
> --
>
> Key: HDFS-12548
> URL: https://issues.apache.org/jira/browse/HDFS-12548
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.9.0
>Reporter: Rushabh S Shah
>Priority: Critical
>
> Feel free to move the ticket to another project (e.g. infra).
> Recently I attached a branch-2 patch while working on one jira 
> [HDFS-12386|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180676&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180676]
> There were at least 100 failed and timed-out tests. I am sure they are not 
> related to my patch.
> I also came across another jira that was just a javadoc-related change, and 
> there were around 100 failed tests.
> Below are the details for the pre-commit runs that failed on branch-2:
> 1 [HDFS-12386 attempt 
> 1|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180069&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180069]
> {noformat}
> Ran on slave: asf912.gq1.ygridcore.net/H12
> Failed with following error message:
> Build timed out (after 300 minutes). Marking the build as aborted.
> Build was aborted
> Performing Post build task...
> {noformat}
> 2. [HDFS-12386 attempt 
> 2|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180676&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180676]
> {noformat}
> Ran on slave: asf900.gq1.ygridcore.net
> Failed with following error message:
> FATAL: command execution failed
> Command close created at
>   at hudson.remoting.Command.(Command.java:60)
>   at hudson.remoting.Channel$CloseCommand.(Channel.java:1123)
>   at hudson.remoting.Channel$CloseCommand.(Channel.java:1121)
>   at hudson.remoting.Channel.close(Channel.java:1281)
>   at hudson.remoting.Channel.close(Channel.java:1263)
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1128)
> Caused: hudson.remoting.Channel$OrderlyShutdown
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1129)
>   at hudson.remoting.Channel$1.handle(Channel.java:527)
>   at 
> hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:83)
> Caused: java.io.IOException: Backing channel 'H0' is disconnected.
>   at 
> hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:192)
>   at 
> hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:257)
>   at com.sun.proxy.$Proxy125.isAlive(Unknown Source)
>   at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1043)
>   at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1035)
>   at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:155)
>   at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:109)
>   at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
>   at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
>   at 
> hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:735)
>   at hudson.model.Build$BuildExecution.build(Build.java:206)
>   at hudson.model.Build$BuildExecution.doRun(Build.java:163)
>   at 
> hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:490)
>   at hudson.model.Run.execute(Run.java:1735)
>   at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
>   at hudson.model.ResourceController.execute(ResourceController.java:97)
>   at hudson.model.Executor.run(Executor.java:405)
> {noformat}
> 3. [HDFS-12531 attempt 
> 1|https://issues.apache.org/jira/browse/HDFS-12531?focusedCommentId=16176493&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16176493]
> {noformat}
> Ran on slave:  asf911.gq1.ygridcore.net
> Failed with following error message:
> FATAL: command execution failed
> Command close created at
>   at hudson.remoting.Command.(Command.java:60)
>   at hudson.remoting.Channel$CloseCommand.(Channel.java:1123)
>   at hudson.remoting.Channel$CloseCommand.(Channel.java:1121)
>   at hudson.remoting.Channel.close(Channel.java:1281)
>   at hudson.remoting.Channel.close(Channel.java:1263)
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1128)
> Caused: hudson.remoting.Channel$OrderlyShutdown
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1129)
>   at hudson.remoting.Channel$1.handle(Channel.java:527)
>   at 
> hudson.remoting.SynchronousCommandTrans

[jira] [Updated] (HDFS-12548) HDFS Jenkins build is unstable on branch-2

2018-02-13 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-12548:
--
Target Version/s:   (was: 3.1.0)

> HDFS Jenkins build is unstable on branch-2
> --
>
> Key: HDFS-12548
> URL: https://issues.apache.org/jira/browse/HDFS-12548
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.9.0
>Reporter: Rushabh S Shah
>Priority: Critical
>
> Feel free to move the ticket to another project (e.g. infra).
> Recently I attached a branch-2 patch while working on one jira 
> [HDFS-12386|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180676&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180676]
> There were at least 100 failed and timed-out tests. I am sure they are not 
> related to my patch.
> I also came across another jira that was just a javadoc-related change, and 
> there were around 100 failed tests.
> Below are the details for the pre-commit runs that failed on branch-2:
> 1 [HDFS-12386 attempt 
> 1|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180069&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180069]
> {noformat}
> Ran on slave: asf912.gq1.ygridcore.net/H12
> Failed with following error message:
> Build timed out (after 300 minutes). Marking the build as aborted.
> Build was aborted
> Performing Post build task...
> {noformat}
> 2. [HDFS-12386 attempt 
> 2|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180676&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180676]
> {noformat}
> Ran on slave: asf900.gq1.ygridcore.net
> Failed with following error message:
> FATAL: command execution failed
> Command close created at
>   at hudson.remoting.Command.(Command.java:60)
>   at hudson.remoting.Channel$CloseCommand.(Channel.java:1123)
>   at hudson.remoting.Channel$CloseCommand.(Channel.java:1121)
>   at hudson.remoting.Channel.close(Channel.java:1281)
>   at hudson.remoting.Channel.close(Channel.java:1263)
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1128)
> Caused: hudson.remoting.Channel$OrderlyShutdown
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1129)
>   at hudson.remoting.Channel$1.handle(Channel.java:527)
>   at 
> hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:83)
> Caused: java.io.IOException: Backing channel 'H0' is disconnected.
>   at 
> hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:192)
>   at 
> hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:257)
>   at com.sun.proxy.$Proxy125.isAlive(Unknown Source)
>   at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1043)
>   at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1035)
>   at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:155)
>   at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:109)
>   at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
>   at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
>   at 
> hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:735)
>   at hudson.model.Build$BuildExecution.build(Build.java:206)
>   at hudson.model.Build$BuildExecution.doRun(Build.java:163)
>   at 
> hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:490)
>   at hudson.model.Run.execute(Run.java:1735)
>   at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
>   at hudson.model.ResourceController.execute(ResourceController.java:97)
>   at hudson.model.Executor.run(Executor.java:405)
> {noformat}
> 3. [HDFS-12531 attempt 
> 1|https://issues.apache.org/jira/browse/HDFS-12531?focusedCommentId=16176493&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16176493]
> {noformat}
> Ran on slave:  asf911.gq1.ygridcore.net
> Failed with following error message:
> FATAL: command execution failed
> Command close created at
>   at hudson.remoting.Command.(Command.java:60)
>   at hudson.remoting.Channel$CloseCommand.(Channel.java:1123)
>   at hudson.remoting.Channel$CloseCommand.(Channel.java:1121)
>   at hudson.remoting.Channel.close(Channel.java:1281)
>   at hudson.remoting.Channel.close(Channel.java:1263)
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1128)
> Caused: hudson.remoting.Channel$OrderlyShutdown
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1129)
>   at hudson.remoting.Channel$1.handle(Channel.java:527)
>   at 
> hudson.remoting.SynchronousCom

[jira] [Commented] (HDFS-12548) HDFS Jenkins build is unstable on branch-2

2018-02-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353291#comment-16353291
 ] 

Wangda Tan commented on HDFS-12548:
---

We plan to start the merge vote for 3.1.0 on Feb 18; please let me know if you 
plan to finish this by Feb 18, or we will need to move it to 3.2.0.

> HDFS Jenkins build is unstable on branch-2
> --
>
> Key: HDFS-12548
> URL: https://issues.apache.org/jira/browse/HDFS-12548
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.9.0
>Reporter: Rushabh S Shah
>Priority: Critical
>
> Feel free to move the ticket to another project (e.g. infra).
> Recently I attached a branch-2 patch while working on one jira 
> [HDFS-12386|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180676&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180676]
> There were at least 100 failed and timed-out tests. I am sure they are not 
> related to my patch.
> I also came across another jira that was just a javadoc-related change, and 
> there were around 100 failed tests.
> Below are the details for the pre-commit runs that failed on branch-2:
> 1 [HDFS-12386 attempt 
> 1|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180069&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180069]
> {noformat}
> Ran on slave: asf912.gq1.ygridcore.net/H12
> Failed with following error message:
> Build timed out (after 300 minutes). Marking the build as aborted.
> Build was aborted
> Performing Post build task...
> {noformat}
> 2. [HDFS-12386 attempt 
> 2|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180676&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180676]
> {noformat}
> Ran on slave: asf900.gq1.ygridcore.net
> Failed with following error message:
> FATAL: command execution failed
> Command close created at
>   at hudson.remoting.Command.(Command.java:60)
>   at hudson.remoting.Channel$CloseCommand.(Channel.java:1123)
>   at hudson.remoting.Channel$CloseCommand.(Channel.java:1121)
>   at hudson.remoting.Channel.close(Channel.java:1281)
>   at hudson.remoting.Channel.close(Channel.java:1263)
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1128)
> Caused: hudson.remoting.Channel$OrderlyShutdown
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1129)
>   at hudson.remoting.Channel$1.handle(Channel.java:527)
>   at 
> hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:83)
> Caused: java.io.IOException: Backing channel 'H0' is disconnected.
>   at 
> hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:192)
>   at 
> hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:257)
>   at com.sun.proxy.$Proxy125.isAlive(Unknown Source)
>   at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1043)
>   at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1035)
>   at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:155)
>   at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:109)
>   at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
>   at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
>   at 
> hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:735)
>   at hudson.model.Build$BuildExecution.build(Build.java:206)
>   at hudson.model.Build$BuildExecution.doRun(Build.java:163)
>   at 
> hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:490)
>   at hudson.model.Run.execute(Run.java:1735)
>   at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
>   at hudson.model.ResourceController.execute(ResourceController.java:97)
>   at hudson.model.Executor.run(Executor.java:405)
> {noformat}
> 3. [HDFS-12531 attempt 
> 1|https://issues.apache.org/jira/browse/HDFS-12531?focusedCommentId=16176493&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16176493]
> {noformat}
> Ran on slave:  asf911.gq1.ygridcore.net
> Failed with following error message:
> FATAL: command execution failed
> Command close created at
>   at hudson.remoting.Command.(Command.java:60)
>   at hudson.remoting.Channel$CloseCommand.(Channel.java:1123)
>   at hudson.remoting.Channel$CloseCommand.(Channel.java:1121)
>   at hudson.remoting.Channel.close(Channel.java:1281)
>   at hudson.remoting.Channel.close(Channel.java:1263)
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1128)
> Caused: hudson.remoting.Channel$OrderlyShutdown
>   at hudson.remot

[jira] [Commented] (HDFS-12049) Recommissioning live nodes stalls the NN

2018-02-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353288#comment-16353288
 ] 

Wangda Tan commented on HDFS-12049:
---

We plan to start the merge vote for 3.1.0 on Feb 18; please let me know if you 
plan to finish this by Feb 18, or we will need to move it to 3.2.0.

> Recommissioning live nodes stalls the NN
> 
>
> Key: HDFS-12049
> URL: https://issues.apache.org/jira/browse/HDFS-12049
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Priority: Critical
>
> A node refresh will recommission included nodes that are alive and in 
> decommissioning or decommissioned state.  The recommission will scan all 
> blocks on the node, find over-replicated blocks, choose an excess replica, 
> and queue an invalidation.
> The process is expensive and worsened by the overhead of storage types (even 
> when not in use).  It can be especially devastating because the write lock is 
> held for the entire node refresh.  _Recommissioning 67 nodes with ~500k 
> blocks/node stalled rpc services for over 4 mins._



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11310) Reduce the performance impact of the balancer (trunk port)

2018-02-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353286#comment-16353286
 ] 

Wangda Tan commented on HDFS-11310:
---

Thanks [~daryn] for reporting this issue. We plan to start the merge vote for 
3.1.0 on Feb 18; please let me know if you plan to finish this by Feb 18, or we 
will need to move it out.

> Reduce the performance impact of the balancer (trunk port)
> --
>
> Key: HDFS-11310
> URL: https://issues.apache.org/jira/browse/HDFS-11310
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Daryn Sharp
>Priority: Critical
>
> HDFS-7967 introduced a highly performant balancer getBlocks() query that 
> scales to large/dense clusters.  The simple design implementation depends on 
> the triplets data structure.  HDFS-9260 removed the triplets, which 
> fundamentally changed the implementation.  Either that patch must be reverted 
> or the getBlocks() patch needs to be reimplemented.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12452) TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs

2018-02-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353289#comment-16353289
 ] 

Wangda Tan commented on HDFS-12452:
---

We plan to start the merge vote for 3.1.0 on Feb 18; please let me know if you 
plan to finish this by Feb 18, or we will need to move it to 3.2.0.

> TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs
> --
>
> Key: HDFS-12452
> URL: https://issues.apache.org/jira/browse/HDFS-12452
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Arpit Agarwal
>Priority: Critical
>  Labels: flaky-test
>
> TestDataNodeVolumeFailureReporting#testSuccessiveVolumeFailures fails 
> frequently in Jenkins runs but it passes locally on my dev machine.
> e.g. 
> https://builds.apache.org/job/PreCommit-HDFS-Build/21134/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailureReporting/testSuccessiveVolumeFailures/
> {code}
> Error Message
> test timed out after 120000 milliseconds
> Stacktrace
> java.lang.Exception: test timed out after 120000 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:761)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testSuccessiveVolumeFailures(TestDataNodeVolumeFailureReporting.java:189)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11225) NameNode crashed because deleteSnapshot held FSNamesystem lock too long

2018-02-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353284#comment-16353284
 ] 

Wangda Tan commented on HDFS-11225:
---

[~manojg]/[~shashikant], any progress here? We plan to start the merge vote for 
3.1.0 on Feb 18; please let me know if you plan to finish this by Feb 18, or we 
will need to move it out.

> NameNode crashed because deleteSnapshot held FSNamesystem lock too long
> ---
>
> Key: HDFS-11225
> URL: https://issues.apache.org/jira/browse/HDFS-11225
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
> Environment: CDH5.8.2, HA
>Reporter: Wei-Chiu Chuang
>Assignee: Manoj Govindassamy
>Priority: Critical
>  Labels: high-availability
> Attachments: Snaphot_Deletion_Design_Proposal.pdf
>
>
> The deleteSnapshot operation is synchronous. In certain situations this 
> operation may hold the FSNamesystem lock for too long, bringing almost every 
> NameNode operation to a halt.
> We have observed one incident where it took so long that ZKFC believed the 
> NameNode was down. All other IPC threads were waiting to acquire the 
> FSNamesystem lock. This specific deleteSnapshot took ~70 seconds. ZKFC has a 
> connection timeout of 45 seconds by default; if all IPC threads wait for the 
> FSNamesystem lock and cannot accept new incoming connections, ZKFC times out, 
> advances the epoch, and the NameNode therefore loses its active role and then 
> fails.
> Relevant log:
> {noformat}
> Thread 154 (IPC Server handler 86 on 8020):
>   State: RUNNABLE
>   Blocked count: 2753455
>   Waited count: 89201773
>   Stack:
> 
> org.apache.hadoop.hdfs.server.namenode.INode$BlocksMapUpdateInfo.addDeleteBlock(INode.java:879)
> 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.destroyAndCollectBlocks(INodeFile.java:508)
> 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.destroyAndCollectBlocks(INodeDirectory.java:763)
> 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.destroyAndCollectBlocks(INodeDirectory.java:763)
> 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.destroyAndCollectBlocks(INodeDirectory.java:763)
> 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.destroyAndCollectBlocks(INodeDirectory.java:763)
> 
> org.apache.hadoop.hdfs.server.namenode.INodeReference.destroyAndCollectBlocks(INodeReference.java:339)
> 
> org.apache.hadoop.hdfs.server.namenode.INodeReference$WithName.destroyAndCollectBlocks(INodeReference.java:606)
> 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$ChildrenDiff.destroyDeletedList(DirectoryWithSnapshotFeature.java:119)
> 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$ChildrenDiff.access$400(DirectoryWithSnapshotFeature.java:61)
> 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.destroyDiffAndCollectBlocks(DirectoryWithSnapshotFeature.java:319)
> 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.destroyDiffAndCollectBlocks(DirectoryWithSnapshotFeature.java:167)
> 
> org.apache.hadoop.hdfs.server.namenode.snapshot.AbstractINodeDiffList.deleteSnapshotDiff(AbstractINodeDiffList.java:83)
> 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:745)
> 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:776)
> 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:747)
> 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:747)
> 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:776)
> 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:747)
> 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:789)
> {noformat}
> After ZKFC determined the NameNode was down and advanced the epoch, the NN 
> finished deleting the snapshot and sent the edit to the journal nodes, but it 
> was rejected because the epoch had been updated. See the following stack trace:
> {noformat}
> 10.0.16.21:8485: IPC's epoch 17 is less than the last promised epoch 18
> at 
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:429)
> at 
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:457)
> at 
> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:352)
> at 
> org.apache.hadoop.hdfs.qjournal.server.Journal

[jira] [Commented] (HDFS-8893) DNs with failed volumes stop serving during rolling upgrade

2018-02-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353282#comment-16353282
 ] 

Wangda Tan commented on HDFS-8893:
--

[~daryn] / [~shahrs87], any progress here? Since this issue has existed for a 
long time, I will move it to 3.2.0 on Feb 18 if there are no objections.

> DNs with failed volumes stop serving during rolling upgrade
> ---
>
> Key: HDFS-8893
> URL: https://issues.apache.org/jira/browse/HDFS-8893
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Rushabh S Shah
>Assignee: Daryn Sharp
>Priority: Critical
>
> When a rolling upgrade starts, all DNs try to write a rolling_upgrade marker 
> to each of their volumes. If one of the volumes is bad, this will fail. When 
> this failure happens, the DN does not update the key it received from the NN.
> Unfortunately we had one failed volume on all 3 of the datanodes that were 
> holding the replicas.
> Keys expire after 20 hours, so at about 20 hours into the rolling upgrade, the 
> DNs with failed volumes will stop serving clients.
> Here is the stack trace on the datanode side:
> {noformat}
> 2015-08-11 07:32:28,827 [DataNode: heartbeating to 8020] WARN 
> datanode.DataNode: IOException in offerService
> java.io.IOException: Read-only file system
> at java.io.UnixFileSystem.createFileExclusively(Native Method)
> at java.io.File.createNewFile(File.java:947)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.setRollingUpgradeMarkers(BlockPoolSliceStorage.java:721)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.setRollingUpgradeMarker(DataStorage.java:173)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.setRollingUpgradeMarker(FsDatasetImpl.java:2357)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.signalRollingUpgrade(BPOfferService.java:480)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.handleRollingUpgradeStatus(BPServiceActor.java:626)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:677)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:833)
> at java.lang.Thread.run(Thread.java:722)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11163) Mover should move the file blocks to default storage once policy is unset

2017-03-03 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-11163:
--
Attachment: temp-YARN-6278.HDFS-11163.patch

> Mover should move the file blocks to default storage once policy is unset
> -
>
> Key: HDFS-11163
> URL: https://issues.apache.org/jira/browse/HDFS-11163
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.8.0
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
> Attachments: HDFS-11163-001.patch, HDFS-11163-002.patch, 
> HDFS-11163-003.patch, HDFS-11163-004.patch, temp-YARN-6278.HDFS-11163.patch
>
>
> HDFS-9534 added a new API in FileSystem to unset the storage policy. Once the 
> policy is unset, blocks should move back to the default storage policy.
> Currently the Mover does not move file blocks that have an unspecified 
> storage policy ID:
> {code}
>   // currently we ignore files with unspecified storage policy
>   if (policyId == HdfsConstants.BLOCK_STORAGE_POLICY_ID_UNSPECIFIED) {
> return;
>   }
> {code}
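> A minimal sketch of the intended behavior (the constants below are 
> illustrative stand-ins, not the exact fix): fall back to the default policy 
> instead of returning, so unset files move back to default storage.
> {code}
> // Sketch only (values mirror HDFS defaults: UNSPECIFIED=0, HOT=7).
> static byte effectivePolicyId(byte policyId) {
>   final byte UNSPECIFIED = 0; // BLOCK_STORAGE_POLICY_ID_UNSPECIFIED
>   final byte DEFAULT_HOT = 7; // id of the default (HOT) policy
>   return policyId == UNSPECIFIED ? DEFAULT_HOT : policyId;
> }
> // ... the Mover would then schedule moves using effectivePolicyId(policyId)
> // instead of skipping the file.
> {code}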



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10692) Point JDiff base version for HDFS from 2.6.0 to 2.7.2

2016-08-18 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-10692:
--
Summary: Point JDiff base version for HDFS from 2.6.0 to 2.7.2  (was: Point 
JDiff base version for HDFS from 2.6.0 to 2.7.3)

> Point JDiff base version for HDFS from 2.6.0 to 2.7.2
> -
>
> Key: HDFS-10692
> URL: https://issues.apache.org/jira/browse/HDFS-10692
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wangda Tan
>Priority: Blocker
> Attachments: 3.0.0-alpha1-jdiff-hdfs.zip, HDFS-10692.1.patch, 
> HDFS-10692.2.patch
>
>
> JDiff is currently pointed at 2.6.0; we need to upgrade it to the latest 
> stable release (2.7.3).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10692) Point JDiff base version for HDFS from 2.6.0 to 2.7.3

2016-08-18 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-10692:
--
Attachment: HDFS-10692.2.patch

Attached ver.2 patch; downgraded 2.7.3 to 2.7.2 since 2.7.3 has not been released yet.

> Point JDiff base version for HDFS from 2.6.0 to 2.7.3
> -
>
> Key: HDFS-10692
> URL: https://issues.apache.org/jira/browse/HDFS-10692
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wangda Tan
>Priority: Blocker
> Attachments: 3.0.0-alpha1-jdiff-hdfs.zip, HDFS-10692.1.patch, 
> HDFS-10692.2.patch
>
>
> JDiff is currently pointed at 2.6.0; we need to upgrade it to the latest 
> stable release (2.7.3).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10692) Point JDiff base version for HDFS from 2.6.0 to 2.7.2

2016-08-18 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-10692:
--
Description: Now JDiff is pointed to 2.6.0, we need to upgrade it to the 
latest stable release (2.7.2)  (was: Now JDiff is pointed to 2.6.0, we need to 
upgrade it to the latest stable release (2.7.3))

> Point JDiff base version for HDFS from 2.6.0 to 2.7.2
> -
>
> Key: HDFS-10692
> URL: https://issues.apache.org/jira/browse/HDFS-10692
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wangda Tan
>Priority: Blocker
> Attachments: 3.0.0-alpha1-jdiff-hdfs.zip, HDFS-10692.1.patch, 
> HDFS-10692.2.patch
>
>
> JDiff is currently pointed at 2.6.0; we need to upgrade it to the latest 
> stable release (2.7.2).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10692) Point JDiff base version for HDFS from 2.6.0 to 2.7.3

2016-08-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427160#comment-15427160
 ] 

Wangda Tan commented on HDFS-10692:
---

This needs some common fixes from HADOOP-13428; added a link.

> Point JDiff base version for HDFS from 2.6.0 to 2.7.3
> -
>
> Key: HDFS-10692
> URL: https://issues.apache.org/jira/browse/HDFS-10692
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wangda Tan
>Priority: Blocker
> Attachments: 3.0.0-alpha1-jdiff-hdfs.zip, HDFS-10692.1.patch
>
>
> JDiff is currently pointed at 2.6.0; we need to upgrade it to the latest 
> stable release (2.7.3).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10692) Point JDiff base version for HDFS from 2.6.0 to 2.7.3

2016-07-26 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-10692:
--
Attachment: 3.0.0-alpha1-jdiff-hdfs.zip

Attached the generated JDiff file for review as well.

> Point JDiff base version for HDFS from 2.6.0 to 2.7.3
> -
>
> Key: HDFS-10692
> URL: https://issues.apache.org/jira/browse/HDFS-10692
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wangda Tan
>Priority: Blocker
> Attachments: 3.0.0-alpha1-jdiff-hdfs.zip, HDFS-10692.1.patch
>
>
> JDiff is currently pointed at 2.6.0; we need to upgrade it to the latest 
> stable release (2.7.3).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10692) Point JDiff base version for HDFS from 2.6.0 to 2.7.3

2016-07-26 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-10692:
--
Attachment: HDFS-10692.1.patch

Attached ver.1 patch; this is on top of HADOOP-13428.

> Point JDiff base version for HDFS from 2.6.0 to 2.7.3
> -
>
> Key: HDFS-10692
> URL: https://issues.apache.org/jira/browse/HDFS-10692
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wangda Tan
>Priority: Blocker
> Attachments: HDFS-10692.1.patch
>
>
> JDiff is currently pointed at 2.6.0; we need to upgrade it to the latest 
> stable release (2.7.3).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10692) Point JDiff base version for HDFS from 2.6.0 to 2.7.3

2016-07-26 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HDFS-10692:
--
Priority: Blocker  (was: Major)

> Point JDiff base version for HDFS from 2.6.0 to 2.7.3
> -
>
> Key: HDFS-10692
> URL: https://issues.apache.org/jira/browse/HDFS-10692
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wangda Tan
>Priority: Blocker
>
> JDiff is currently pointed at 2.6.0; we need to upgrade it to the latest 
> stable release (2.7.3).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-10692) Point JDiff base version for HDFS from 2.6.0 to 2.7.3

2016-07-26 Thread Wangda Tan (JIRA)
Wangda Tan created HDFS-10692:
-

 Summary: Point JDiff base version for HDFS from 2.6.0 to 2.7.3
 Key: HDFS-10692
 URL: https://issues.apache.org/jira/browse/HDFS-10692
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Wangda Tan


JDiff is currently pointed at 2.6.0; we need to upgrade it to the latest stable 
release (2.7.3).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10458) getFileEncryptionInfo should return quickly for non-encrypted cluster

2016-06-06 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317900#comment-15317900
 ] 

Wangda Tan commented on HDFS-10458:
---

Thanks, [~zhz]!

> getFileEncryptionInfo should return quickly for non-encrypted cluster
> -
>
> Key: HDFS-10458
> URL: https://issues.apache.org/jira/browse/HDFS-10458
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, namenode
>Affects Versions: 2.6.0
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Fix For: 2.8.0, 2.7.3, 3.0.0-alpha1
>
> Attachments: HDFS-10458-branch-2.00.patch, 
> HDFS-10458-branch-2.6.00.patch, HDFS-10458-branch-2.7.00.patch, 
> HDFS-10458.00.patch, HDFS-10458.03.patch, HDFS-10458.04.patch, 
> HDFS-10458.05.patch, HDFSA-10458.01.patch, HDFSA-10458.02.patch
>
>
> {{FSDirectory#getFileEncryptionInfo}} always acquires the {{readLock}} and 
> checks whether the path belongs to an EZ. On a busy system with potentially 
> many listing operations, this could cause lock contention.
> I think we should add a call, {{EncryptionZoneManager#hasEncryptionZone()}}, 
> that returns whether the system has any EZs. If there are no EZs at all, 
> {{getFileEncryptionInfo}} should return null without taking the {{readLock}}.
> If {{hasEncryptionZone}} is only used in the above scenario, maybe it doesn't 
> need a {{readLock}} itself: if the system doesn't have any EZs when 
> {{getFileEncryptionInfo}} is called on a path, the path cannot be encrypted.
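> A rough sketch of the proposed fast path, with method shapes assumed from the 
> description above (the real FSDirectory signatures may differ):
> {code}
> FileEncryptionInfo getFileEncryptionInfo(INodesInPath iip) {
>   // Fast path: no encryption zones exist, so no path can be encrypted;
>   // skip taking the readLock entirely.
>   if (!ezManager.hasEncryptionZone()) {
>     return null;
>   }
>   readLock();
>   try {
>     return getFileEncryptionInfoLocked(iip); // hypothetical helper
>   } finally {
>     readUnlock();
>   }
> }
> {code}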



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10458) getFileEncryptionInfo should return quickly for non-encrypted cluster

2016-06-06 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317881#comment-15317881
 ] 

Wangda Tan commented on HDFS-10458:
---

[~zhz], perhaps this is caused by a different JDK version, etc. I didn't 
investigate further. This is the Jenkins run with the patch: 
https://builds.apache.org/job/HADOOP2_Release_Artifacts_Builder/72/console. 

> getFileEncryptionInfo should return quickly for non-encrypted cluster
> -
>
> Key: HDFS-10458
> URL: https://issues.apache.org/jira/browse/HDFS-10458
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, namenode
>Affects Versions: 2.6.0
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Fix For: 2.8.0, 2.7.3, 3.0.0-alpha1
>
> Attachments: HDFS-10458-branch-2.6.00.patch, 
> HDFS-10458-branch-2.7.00.patch, HDFS-10458.00.patch, HDFS-10458.03.patch, 
> HDFS-10458.04.patch, HDFS-10458.05.patch, HDFSA-10458.01.patch, 
> HDFSA-10458.02.patch
>
>
> {{FSDirectory#getFileEncryptionInfo}} always acquires the {{readLock}} and 
> checks whether the path belongs to an EZ. On a busy system with potentially 
> many listing operations, this could cause lock contention.
> I think we should add a call, {{EncryptionZoneManager#hasEncryptionZone()}}, 
> that returns whether the system has any EZs. If there are no EZs at all, 
> {{getFileEncryptionInfo}} should return null without taking the {{readLock}}.
> If {{hasEncryptionZone}} is only used in the above scenario, maybe it doesn't 
> need a {{readLock}} itself: if the system doesn't have any EZs when 
> {{getFileEncryptionInfo}} is called on a path, the path cannot be encrypted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10458) getFileEncryptionInfo should return quickly for non-encrypted cluster

2016-06-06 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317872#comment-15317872
 ] 

Wangda Tan commented on HDFS-10458:
---

branch-2 has the same issue; reverting the patch from branch-2 as well.

> getFileEncryptionInfo should return quickly for non-encrypted cluster
> -
>
> Key: HDFS-10458
> URL: https://issues.apache.org/jira/browse/HDFS-10458
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, namenode
>Affects Versions: 2.6.0
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Fix For: 2.8.0, 2.7.3, 3.0.0-alpha1
>
> Attachments: HDFS-10458-branch-2.6.00.patch, 
> HDFS-10458-branch-2.7.00.patch, HDFS-10458.00.patch, HDFS-10458.03.patch, 
> HDFS-10458.04.patch, HDFS-10458.05.patch, HDFSA-10458.01.patch, 
> HDFSA-10458.02.patch
>
>
> {{FSDirectory#getFileEncryptionInfo}} always acquires the {{readLock}} and 
> checks whether the path belongs to an EZ. On a busy system with potentially 
> many listing operations, this could cause lock contention.
> I think we should add a call, {{EncryptionZoneManager#hasEncryptionZone()}}, 
> that returns whether the system has any EZs. If there are no EZs at all, 
> {{getFileEncryptionInfo}} should return null without taking the {{readLock}}.
> If {{hasEncryptionZone}} is only used in the above scenario, maybe it doesn't 
> need a {{readLock}} itself: if the system doesn't have any EZs when 
> {{getFileEncryptionInfo}} is called on a path, the path cannot be encrypted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


