[jira] [Commented] (HDFS-10276) Different results for exist call for file.ext/name

2016-04-17 Thread Yuanbo Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245214#comment-15245214
 ] 

Yuanbo Liu commented on HDFS-10276:
---

Hi, Kevin
Sorry for taking so long to reply to you; I was quite busy last week.
It seems that you used the username {{kevincox}} to operate on files in the 
root directory of the distributed file system. I think the distributed file 
system works as expected, since you may not have permission to operate on 
those files. It's better to compare the two file systems under the same 
conditions: for instance, make sure the username {{kevincox}} has no 
read/write/execute access on both systems, and then operate on the files. I've 
tested different scenarios and the {{exists}} method did not throw any error. 
So could you please provide the steps that lead to 
{{org.apache.hadoop.security.AccessControlException: Permission denied: ..., 
access=EXECUTE, ...}}?
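
For reference, a minimal sketch of the existence check being discussed (the 
path and configuration are illustrative):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Checks a path "under" an existing file. LocalFileSystem returns false here;
// the report says DistributedFileSystem may instead throw
// AccessControlException (access=EXECUTE) when the caller lacks permission.
public class ExistsCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);  // uses fs.defaultFS from the config
    System.out.println(fs.exists(new Path("/file/whatever")));
  }
}
{code}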

> Different results for exist call for file.ext/name
> --
>
> Key: HDFS-10276
> URL: https://issues.apache.org/jira/browse/HDFS-10276
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kevin Cox
>Assignee: Yuanbo Liu
>
> Given a file {{/file}}, an existence check for the path {{/file/whatever}} 
> will give different responses for different implementations of FileSystem.
> LocalFileSystem will return false while DistributedFileSystem will throw 
> {{org.apache.hadoop.security.AccessControlException: Permission denied: ..., 
> access=EXECUTE, ...}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8449) Add tasks count metrics to datanode for ECWorker

2016-04-17 Thread Li Bo (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245211#comment-15245211
 ] 

Li Bo commented on HDFS-8449:
-

Hi [~drankye], could you help me review the newly updated patch?
Thanks

> Add tasks count metrics to datanode for ECWorker
> 
>
> Key: HDFS-8449
> URL: https://issues.apache.org/jira/browse/HDFS-8449
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Li Bo
>Assignee: Li Bo
> Attachments: HDFS-8449-000.patch, HDFS-8449-001.patch, 
> HDFS-8449-002.patch, HDFS-8449-003.patch, HDFS-8449-004.patch
>
>
> This sub-task tries to record the EC recovery tasks that a datanode has done, 
> including total tasks, failed tasks, and successful tasks.
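
For context, a rough sketch of what such counters could look like with the 
metrics2 API (the class and counter names here are hypothetical; the actual 
names and wiring are defined by the attached patches):
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;

// Hypothetical per-task EC recovery counters on the DataNode.
@Metrics(about = "ECWorker metrics", context = "dfs")
class ECWorkerMetrics {
  @Metric("Total EC reconstruction tasks")
  MutableCounterLong ecReconstructionTasks;
  @Metric("Failed EC reconstruction tasks")
  MutableCounterLong ecFailedReconstructionTasks;

  static ECWorkerMetrics create() {
    // Registering with the metrics system instantiates the @Metric fields.
    return DefaultMetricsSystem.instance().register(
        "ECWorkerMetrics", null, new ECWorkerMetrics());
  }

  void taskDone(boolean succeeded) {
    ecReconstructionTasks.incr();
    if (!succeeded) {
      ecFailedReconstructionTasks.incr();
    }
  }
}
{code}
The successful count then falls out as total minus failed, or it can get its 
own counter.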



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8449) Add tasks count metrics to datanode for ECWorker

2016-04-17 Thread Li Bo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Bo updated HDFS-8449:

Attachment: HDFS-8449-004.patch

> Add tasks count metrics to datanode for ECWorker
> 
>
> Key: HDFS-8449
> URL: https://issues.apache.org/jira/browse/HDFS-8449
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Li Bo
>Assignee: Li Bo
> Attachments: HDFS-8449-000.patch, HDFS-8449-001.patch, 
> HDFS-8449-002.patch, HDFS-8449-003.patch, HDFS-8449-004.patch
>
>
> This sub-task tries to record the EC recovery tasks that a datanode has done, 
> including total tasks, failed tasks, and successful tasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode

2016-04-17 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245203#comment-15245203
 ] 

Zhe Zhang commented on HDFS-7859:
-

Thanks Kai and Rakesh for reactivating the discussion.

I agree that at this stage there isn't a clear need for custom policies.

The main motivation for persisting EC policies in NN is probably downgrade. 
Assuming that we never remove any existing built-in policies from 
{{ErasureCodingPolicyManager}}, we won't have issues with upgrade. But the 
chance of adding an EC policy in a 3.x release is nontrivial.

So I don't think this is a 3.0 blocker, but it would be nice to have for the 
3.0 release.

> Erasure Coding: Persist erasure coding policies in NameNode
> ---
>
> Key: HDFS-7859
> URL: https://issues.apache.org/jira/browse/HDFS-7859
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Xinwei Qin 
>  Labels: BB2015-05-TBR
> Attachments: HDFS-7859-HDFS-7285.002.patch, 
> HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch, 
> HDFS-7859.001.patch, HDFS-7859.002.patch
>
>
> In a meetup discussion with [~zhz] and [~jingzhao], it was suggested that we 
> persist EC schemas in the NameNode centrally and reliably, so that EC zones 
> can reference them by name efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10302) BlockPlacementPolicyDefault should use default replication considerload value

2016-04-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245164#comment-15245164
 ] 

Hadoop QA commented on HDFS-10302:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
6s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
57s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 17s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 55s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 51s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 15s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 53s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 100m 50s 
{color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 94m 31s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
24s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 222m 53s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | 
hadoop.hdfs.server.datanode.TestDataNodeUUID |
|   | hadoop.hdfs.TestDFSUpgradeFromImage |
|   | hadoop.hdfs.server.blockmanagement.TestBlockManager |
|   | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead |
|   | hadoop.hdfs.server.datanode.TestDataNodeLifeline |
|   | hadoop.hdfs.TestFileAppend |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
|   | hadoop.fs.TestSymlinkHdfsFileContext |
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
| JDK v1.7.0_95 Failed jun

[jira] [Commented] (HDFS-10291) TestShortCircuitLocalRead failing

2016-04-17 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245154#comment-15245154
 ] 

Walter Su commented on HDFS-10291:
--

+1.

> TestShortCircuitLocalRead failing
> -
>
> Key: HDFS-10291
> URL: https://issues.apache.org/jira/browse/HDFS-10291
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HDFS-10291-001.patch
>
>
> {{TestShortCircuitLocalRead}} is failing because the read length is 
> considered to be off the end of the buffer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5453) Support fine grain locking in FSNamesystem

2016-04-17 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245144#comment-15245144
 ] 

Vinayakumar B commented on HDFS-5453:
-

Hi [~daryn], is there any update on this?
Looking forward to your results and patch. 
Happy to review and help.
Thanks.

> Support fine grain locking in FSNamesystem
> --
>
> Key: HDFS-5453
> URL: https://issues.apache.org/jira/browse/HDFS-5453
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: namenode
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: async_simulation.xlsx
>
>
> The namesystem currently uses a coarse-grained lock to control access. This 
> prevents concurrent writers in different branches of the tree, and prevents 
> readers from accessing branches that writers aren't using.
> Features that introduce latency to namesystem operations, such as cold 
> storage of inodes, will need fine-grained locking to avoid degrading the 
> entire namesystem's throughput.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8161) Both Namenodes are in standby State

2016-04-17 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245051#comment-15245051
 ] 

Brahma Reddy Battula commented on HDFS-8161:


It happened on physical machines.

> Both Namenodes are in standby State
> ---
>
> Key: HDFS-8161
> URL: https://issues.apache.org/jira/browse/HDFS-8161
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: auto-failover
>Affects Versions: 2.6.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: ACTIVEBreadcumb and StandbyElector.txt
>
>
> Suspected scenario:
> 
> Start the cluster with three nodes.
> Reboot the machine where ZKFC is not running. (Here the active NN's ZKFC 
> should open a session with this ZK.)
> Now the active NN's ZKFC session expires and it tries to re-establish the 
> connection with another ZK. By that time the standby NN's ZKFC will try to 
> fence the old active, create the active breadcrumb, and make the SNN active.
> But immediately it is fenced back to standby state. (Here is the doubt.)
> Hence both will be in standby state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10302) BlockPlacementPolicyDefault should use default replication considerload value

2016-04-17 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10302:
-
Issue Type: Improvement  (was: Bug)

> BlockPlacementPolicyDefault should use default replication considerload value
> -
>
> Key: HDFS-10302
> URL: https://issues.apache.org/jira/browse/HDFS-10302
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
>Priority: Trivial
> Attachments: HDFS-10302.001.patch
>
>
> Currently, the method {{BlockPlacementPolicyDefault#initialize}} just uses 
> the literal {{true}} as the replication considerLoad default value rather 
> than using the existing constant 
> {{DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT}}.
> {code}
>   @Override
>   public void initialize(Configuration conf, FSClusterStats stats,
>                          NetworkTopology clusterMap,
>                          Host2NodesMap host2datanodeMap) {
>     this.considerLoad = conf.getBoolean(
>         DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY, true);
>     this.considerLoadFactor = conf.getDouble(
>         DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_FACTOR,
>         DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_FACTOR_DEFAULT);
>     this.stats = stats;
>     this.clusterMap = clusterMap;
>     this.host2datanodeMap = host2datanodeMap;
>     this.heartbeatInterval = conf.getLong(
>         DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY,
>         DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_DEFAULT) * 1000;
>     this.tolerateHeartbeatMultiplier = conf.getInt(
>         DFSConfigKeys.DFS_NAMENODE_TOLERATE_HEARTBEAT_MULTIPLIER_KEY,
>         DFSConfigKeys.DFS_NAMENODE_TOLERATE_HEARTBEAT_MULTIPLIER_DEFAULT);
>     this.staleInterval = conf.getLong(
>         DFSConfigKeys.DFS_NAMENODE_STALE_DATANODE_INTERVAL_KEY,
>         DFSConfigKeys.DFS_NAMENODE_STALE_DATANODE_INTERVAL_DEFAULT);
>     this.preferLocalNode = conf.getBoolean(
>         DFSConfigKeys.
>             DFS_NAMENODE_BLOCKPLACEMENTPOLICY_DEFAULT_PREFER_LOCAL_NODE_KEY,
>         DFSConfigKeys.
>             DFS_NAMENODE_BLOCKPLACEMENTPOLICY_DEFAULT_PREFER_LOCAL_NODE_DEFAULT);
>   }
> {code}
> As a result, the constant {{DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT}} 
> is not used anywhere.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9412) getBlocks occupies FSLock and takes too long to complete

2016-04-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245049#comment-15245049
 ] 

Hudson commented on HDFS-9412:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9625 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9625/])
HDFS-9412. getBlocks occupies FSLock and takes too long to complete. 
(waltersu4549: rev 67523ffcf491f4f2db5335899c00a174d0caaa9b)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestGetBlocks.java


> getBlocks occupies FSLock and takes too long to complete
> 
>
> Key: HDFS-9412
> URL: https://issues.apache.org/jira/browse/HDFS-9412
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: He Tianyi
>Assignee: He Tianyi
> Fix For: 2.8.0
>
> Attachments: HDFS-9412..patch, HDFS-9412.0001.patch, 
> HDFS-9412.0002.patch
>
>
> {{getBlocks}} in {{NameNodeRpcServer}} acquires a read lock and then may take 
> a long time to complete (possibly several seconds if the number of blocks is 
> too large).
> During this period, other threads attempting to acquire the write lock will 
> wait. In an extreme case, RPC handlers are occupied by one reader thread 
> calling {{getBlocks}} while all other threads wait for the write lock, so the 
> RPC server appears hung. Unfortunately, this tends to happen in heavily 
> loaded clusters, since read operations come and go fast (they do not need to 
> wait), leaving write operations waiting.
> It looks like we can optimize this the way DN block reports were optimized in 
> the past: split the operation into smaller sub-operations, and let other 
> threads do their work between each sub-operation. The whole result is still 
> returned at once, though (one difference from DN block reports).
> I am not sure whether this will work. Any better idea?
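
The chunking idea could look roughly like the sketch below (illustrative only, 
not the committed patch; a plain {{ReentrantReadWriteLock}} stands in for the 
namesystem lock):
{code}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

// Release and re-acquire the read lock between fixed-size batches so waiting
// writers can make progress, while the whole result is still returned at once.
class ChunkedScan<B, R> {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);

  List<R> scan(Iterator<B> blocks, Function<B, R> convert) {
    List<R> results = new ArrayList<>();
    while (blocks.hasNext()) {
      lock.readLock().lock();
      try {
        for (int i = 0; i < 1000 && blocks.hasNext(); i++) {  // one batch
          results.add(convert.apply(blocks.next()));
        }
      } finally {
        lock.readLock().unlock();
      }
      // Writers may acquire the write lock here, between batches.
    }
    return results;
  }
}
{code}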



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10302) BlockPlacementPolicyDefault should use default replication considerload value

2016-04-17 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10302:
-
Attachment: HDFS-10302.001.patch

> BlockPlacementPolicyDefault should use default replication considerload value
> -
>
> Key: HDFS-10302
> URL: https://issues.apache.org/jira/browse/HDFS-10302
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
>Priority: Trivial
> Attachments: HDFS-10302.001.patch
>
>
> Currently, the method {{BlockPlacementPolicyDefault#initialize}} just uses 
> the literal {{true}} as the replication considerLoad default value rather 
> than using the existing constant 
> {{DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT}}.
> {code}
>   @Override
>   public void initialize(Configuration conf, FSClusterStats stats,
>                          NetworkTopology clusterMap,
>                          Host2NodesMap host2datanodeMap) {
>     this.considerLoad = conf.getBoolean(
>         DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY, true);
>     this.considerLoadFactor = conf.getDouble(
>         DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_FACTOR,
>         DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_FACTOR_DEFAULT);
>     this.stats = stats;
>     this.clusterMap = clusterMap;
>     this.host2datanodeMap = host2datanodeMap;
>     this.heartbeatInterval = conf.getLong(
>         DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY,
>         DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_DEFAULT) * 1000;
>     this.tolerateHeartbeatMultiplier = conf.getInt(
>         DFSConfigKeys.DFS_NAMENODE_TOLERATE_HEARTBEAT_MULTIPLIER_KEY,
>         DFSConfigKeys.DFS_NAMENODE_TOLERATE_HEARTBEAT_MULTIPLIER_DEFAULT);
>     this.staleInterval = conf.getLong(
>         DFSConfigKeys.DFS_NAMENODE_STALE_DATANODE_INTERVAL_KEY,
>         DFSConfigKeys.DFS_NAMENODE_STALE_DATANODE_INTERVAL_DEFAULT);
>     this.preferLocalNode = conf.getBoolean(
>         DFSConfigKeys.
>             DFS_NAMENODE_BLOCKPLACEMENTPOLICY_DEFAULT_PREFER_LOCAL_NODE_KEY,
>         DFSConfigKeys.
>             DFS_NAMENODE_BLOCKPLACEMENTPOLICY_DEFAULT_PREFER_LOCAL_NODE_DEFAULT);
>   }
> {code}
> As a result, the constant {{DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT}} 
> is not used anywhere.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10302) BlockPlacementPolicyDefault should use default replication considerload value

2016-04-17 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10302:
-
Status: Patch Available  (was: Open)

Attached a simple patch for this.
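
The expected change is a one-liner; a sketch (per the description below; the 
actual patch may differ):
{code}
// Before: hard-coded literal default
this.considerLoad = conf.getBoolean(
    DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY, true);

// After: use the existing constant so the default is defined in one place
this.considerLoad = conf.getBoolean(
    DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY,
    DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT);
{code}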

> BlockPlacementPolicyDefault should use default replication considerload value
> -
>
> Key: HDFS-10302
> URL: https://issues.apache.org/jira/browse/HDFS-10302
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
>Priority: Trivial
>
> Currently, the method {{BlockPlacementPolicyDefault#initialize}} just uses 
> the literal {{true}} as the replication considerLoad default value rather 
> than using the existing constant 
> {{DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT}}.
> {code}
>   @Override
>   public void initialize(Configuration conf, FSClusterStats stats,
>                          NetworkTopology clusterMap,
>                          Host2NodesMap host2datanodeMap) {
>     this.considerLoad = conf.getBoolean(
>         DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY, true);
>     this.considerLoadFactor = conf.getDouble(
>         DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_FACTOR,
>         DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_FACTOR_DEFAULT);
>     this.stats = stats;
>     this.clusterMap = clusterMap;
>     this.host2datanodeMap = host2datanodeMap;
>     this.heartbeatInterval = conf.getLong(
>         DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY,
>         DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_DEFAULT) * 1000;
>     this.tolerateHeartbeatMultiplier = conf.getInt(
>         DFSConfigKeys.DFS_NAMENODE_TOLERATE_HEARTBEAT_MULTIPLIER_KEY,
>         DFSConfigKeys.DFS_NAMENODE_TOLERATE_HEARTBEAT_MULTIPLIER_DEFAULT);
>     this.staleInterval = conf.getLong(
>         DFSConfigKeys.DFS_NAMENODE_STALE_DATANODE_INTERVAL_KEY,
>         DFSConfigKeys.DFS_NAMENODE_STALE_DATANODE_INTERVAL_DEFAULT);
>     this.preferLocalNode = conf.getBoolean(
>         DFSConfigKeys.
>             DFS_NAMENODE_BLOCKPLACEMENTPOLICY_DEFAULT_PREFER_LOCAL_NODE_KEY,
>         DFSConfigKeys.
>             DFS_NAMENODE_BLOCKPLACEMENTPOLICY_DEFAULT_PREFER_LOCAL_NODE_DEFAULT);
>   }
> {code}
> As a result, the constant {{DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT}} 
> is not used anywhere.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-10302) BlockPlacementPolicyDefault should use default replication considerload value

2016-04-17 Thread Lin Yiqun (JIRA)
Lin Yiqun created HDFS-10302:


 Summary: BlockPlacementPolicyDefault should use default 
replication considerload value
 Key: HDFS-10302
 URL: https://issues.apache.org/jira/browse/HDFS-10302
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.1
Reporter: Lin Yiqun
Assignee: Lin Yiqun
Priority: Trivial


Currently, the method {{BlockPlacementPolicyDefault#initialize}} just uses the 
literal {{true}} as the replication considerLoad default value rather than 
using the existing constant {{DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT}}.
{code}
  @Override
  public void initialize(Configuration conf, FSClusterStats stats,
                         NetworkTopology clusterMap,
                         Host2NodesMap host2datanodeMap) {
    this.considerLoad = conf.getBoolean(
        DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY, true);
    this.considerLoadFactor = conf.getDouble(
        DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_FACTOR,
        DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_FACTOR_DEFAULT);
    this.stats = stats;
    this.clusterMap = clusterMap;
    this.host2datanodeMap = host2datanodeMap;
    this.heartbeatInterval = conf.getLong(
        DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY,
        DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_DEFAULT) * 1000;
    this.tolerateHeartbeatMultiplier = conf.getInt(
        DFSConfigKeys.DFS_NAMENODE_TOLERATE_HEARTBEAT_MULTIPLIER_KEY,
        DFSConfigKeys.DFS_NAMENODE_TOLERATE_HEARTBEAT_MULTIPLIER_DEFAULT);
    this.staleInterval = conf.getLong(
        DFSConfigKeys.DFS_NAMENODE_STALE_DATANODE_INTERVAL_KEY,
        DFSConfigKeys.DFS_NAMENODE_STALE_DATANODE_INTERVAL_DEFAULT);
    this.preferLocalNode = conf.getBoolean(
        DFSConfigKeys.
            DFS_NAMENODE_BLOCKPLACEMENTPOLICY_DEFAULT_PREFER_LOCAL_NODE_KEY,
        DFSConfigKeys.
            DFS_NAMENODE_BLOCKPLACEMENTPOLICY_DEFAULT_PREFER_LOCAL_NODE_DEFAULT);
  }
{code}

As a result, the constant {{DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT}} is 
not used anywhere.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9412) getBlocks occupies FSLock and takes too long to complete

2016-04-17 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9412:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

Committed to trunk, branch-2, and branch-2.8. Thanks [~He Tianyi] for the contribution!

> getBlocks occupies FSLock and takes too long to complete
> 
>
> Key: HDFS-9412
> URL: https://issues.apache.org/jira/browse/HDFS-9412
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: He Tianyi
>Assignee: He Tianyi
> Fix For: 2.8.0
>
> Attachments: HDFS-9412..patch, HDFS-9412.0001.patch, 
> HDFS-9412.0002.patch
>
>
> {{getBlocks}} in {{NameNodeRpcServer}} acquires a read lock and then may take 
> a long time to complete (possibly several seconds if the number of blocks is 
> too large).
> During this period, other threads attempting to acquire the write lock will 
> wait. In an extreme case, RPC handlers are occupied by one reader thread 
> calling {{getBlocks}} while all other threads wait for the write lock, so the 
> RPC server appears hung. Unfortunately, this tends to happen in heavily 
> loaded clusters, since read operations come and go fast (they do not need to 
> wait), leaving write operations waiting.
> It looks like we can optimize this the way DN block reports were optimized in 
> the past: split the operation into smaller sub-operations, and let other 
> threads do their work between each sub-operation. The whole result is still 
> returned at once, though (one difference from DN block reports).
> I am not sure whether this will work. Any better idea?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages

2016-04-17 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-10301:
---
Attachment: zombieStorageLogs.rtf

> Blocks removed by thousands due to falsely detected zombie storages
> ---
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Priority: Critical
> Attachments: zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out sending a block report; it 
> then sends the block report again. The NameNode, while processing these two 
> reports at the same time, can interleave processing of storages from 
> different reports. This screws up the blockReportId field, which makes the 
> NameNode think that some storages are zombies. Replicas from zombie storages 
> are immediately removed, causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages

2016-04-17 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244890#comment-15244890
 ] 

Konstantin Shvachko commented on HDFS-10301:


My DN has the following six storages:
{code}
DS-019298c0-aab9-45b4-8b62-95d6809380ff:NORMAL:kkk.sss.22.105
DS-0ea95238-d9ba-4f62-ae18-fdb9333465ce:NORMAL:kkk.sss.22.105
DS-191fc04b-90be-42c9-b6fb-fdd1517bf4c7:NORMAL:kkk.sss.22.105
DS-4a2e91c7-cdf0-408b-83a6-286c3534d673:NORMAL:kkk.sss.22.105
DS-5b2941f7-2b52-45a8-b135-dcbe488cc65b:NORMAL:kkk.sss.22.105
DS-6849f605-fd83-462d-97c3-cb6949383f7e:NORMAL:kkk.sss.22.105
{code}
Here are the logs for its block reports. All throw the same exception, but I 
pasted it only once.
{code}
2016-04-12 22:31:58,931 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Unsuccessfully sent block report 0x283d25423fb64d,  containing 6 storage 
report(s), of which we sent 0. The reports had 81565 total blocks and used 0 
RPC(s). This took 19 msec to generate and 60078 msecs for RPC and NN 
processing. Got back no commands.
2016-04-12 22:31:58,931 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
IOException in offerService
java.net.SocketTimeoutException: Call From 
dn-hcl1264.my.cluster.com/kkk.sss.22.105 to namenode-ha1.my.cluster.com:9000 
failed on socket timeout exception: java.net.SocketTimeoutException: 6 
millis timeout while waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/kkk.sss.22.105:10101 
remote=namenode-ha1.my.cluster.com/10.150.1.56:9000]; For more details see:  
http://wiki.apache.org/hadoop/SocketTimeout
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:750)
at org.apache.hadoop.ipc.Client.call(Client.java:1473)
at org.apache.hadoop.ipc.Client.call(Client.java:1400)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy12.blockReport(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReport(DatanodeProtocolClientSideTranslatorPB.java:178)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:494)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:732)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:872)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketTimeoutException: 6 millis timeout while waiting 
for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/kkk.sss.22.105:10101 
remote=namenode-ha1.my.cluster.com/10.150.1.56:9000]
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at 
org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:514)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at 
org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1072)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:967)

2016-04-12 22:32:59,179 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Unsuccessfully sent block report 0x283d334a100bde,  containing 6 storage 
report(s), of which we sent 0. The reports had 81565 total blocks and used 0 
RPC(s). This took 17 msec to generate and 60066 msecs for RPC and NN 
processing. Got back no commands.
2016-04-12 22:33:59,311 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Unsuccessfully sent block report 0x283d414ae386b2,  containing 6 storage 
report(s), of which we sent 0. The reports had 81565 total blocks and used 0 
RPC(s). This took 16 msec to generate and 60055 msecs for RPC and NN 
processing. Got back no commands.
2016-04-12 22:34:59,409 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Unsuccessfully sent block report 0x283d4f4a605732,  containing 6 storage 
report(s), of which we sent 0. The reports had 81565 total b

[jira] [Commented] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages

2016-04-17 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244876#comment-15244876
 ] 

Konstantin Shvachko commented on HDFS-10301:


More details.
# My DataNode has 6 storages. It sends a block report and times out, then it 
sends the same block report five more times with different blockReportIds.
# The NameNode starts executing all six reports around the same time and 
interleaves them, that is, it processes the first storage of BR2 before it 
processes the last storage of BR1. (Color-coded logs are coming.)
# While processing storages from BR2, the NameNode changes the 
lastBlockReportId field to the id of BR2. This messes with the processing of 
storages from BR1 that have not been processed yet. Namely, those storages are 
considered zombies, and all replicas are removed from them along with the 
storages themselves.
# The storage is then reconstructed by the NameNode when it receives a 
heartbeat from the DataNode, but it is marked as "stale", and the replicas 
will not be restored until the next block report, which in my case is a few 
hours later.
# I noticed missing blocks because several DataNodes exhibited the same 
behavior and all replicas of the same block were lost.
# The replicas eventually reappeared (several hours later), because DataNodes 
do not physically remove the replicas and report them in the next block report.

The behavior was introduced by HDFS-7960 as part of the hot-swap feature. I did 
not do a hot-swap, and did not fail over the NameNode.
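
A tiny, self-contained simulation of that interleaving (hypothetical and 
simplified; the real bookkeeping lives in the NameNode's BlockManager) shows 
how storages still owned by the first report get flagged:
{code}
import java.util.ArrayList;
import java.util.List;

public class ZombieStorageRace {
  public static void main(String[] args) {
    final int storages = 6;
    final long br1 = 1L, br2 = 2L;  // original report and its retry
    long[] lastBlockReportId = new long[storages];

    // BR1 stamps storages 0..2, then the retry BR2 interleaves and stamps all
    // six, then BR1 resumes and stamps storages 3..5 with its own id.
    for (int s = 0; s < 3; s++) lastBlockReportId[s] = br1;
    for (int s = 0; s < storages; s++) lastBlockReportId[s] = br2;
    for (int s = 3; s < storages; s++) lastBlockReportId[s] = br1;

    // Zombie check after BR2 finishes: storages stamped with a different id
    // look like zombies even though the DataNode still has their replicas.
    List<Integer> zombies = new ArrayList<>();
    for (int s = 0; s < storages; s++) {
      if (lastBlockReportId[s] != br2) {
        zombies.add(s);
      }
    }
    System.out.println("Falsely detected zombie storages: " + zombies);
  }
}
{code}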

> Blocks removed by thousands due to falsely detected zombie storages
> ---
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Priority: Critical
>
> When the NameNode is busy, a DataNode can time out sending a block report; it 
> then sends the block report again. The NameNode, while processing these two 
> reports at the same time, can interleave processing of storages from 
> different reports. This screws up the blockReportId field, which makes the 
> NameNode think that some storages are zombies. Replicas from zombie storages 
> are immediately removed, causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages

2016-04-17 Thread Konstantin Shvachko (JIRA)
Konstantin Shvachko created HDFS-10301:
--

 Summary: Blocks removed by thousands due to falsely detected 
zombie storages
 Key: HDFS-10301
 URL: https://issues.apache.org/jira/browse/HDFS-10301
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.6.1
Reporter: Konstantin Shvachko
Priority: Critical


When the NameNode is busy, a DataNode can time out sending a block report; it 
then sends the block report again. The NameNode, while processing these two 
reports at the same time, can interleave processing of storages from different 
reports. This screws up the blockReportId field, which makes the NameNode 
think that some storages are zombies. Replicas from zombie storages are 
immediately removed, causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10298) Document the usage of distcp -diff option

2016-04-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244681#comment-15244681
 ] 

Hadoop QA commented on HDFS-10298:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
4s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 8m 34s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12799136/HDFS-10298.1.patch |
| JIRA Issue | HDFS-10298 |
| Optional Tests |  asflicense  mvnsite  |
| uname | Linux a8798a36aa50 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / fdc46bf |
| modules | C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15181/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> Document the usage of distcp -diff option
> -
>
> Key: HDFS-10298
> URL: https://issues.apache.org/jira/browse/HDFS-10298
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp, documentation
>Affects Versions: 2.8.0
>Reporter: Akira AJISAKA
>Assignee: Takashi Ohnishi
> Attachments: HDFS-10298.1.patch
>
>
> The distcp {{-diff}} option is currently documented as "Use snapshot diff 
> report to identify the difference between source and target.", but its usage 
> is not documented.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10298) Document the usage of distcp -diff option

2016-04-17 Thread Takashi Ohnishi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takashi Ohnishi updated HDFS-10298:
---
Attachment: HDFS-10298.1.patch

> Document the usage of distcp -diff option
> -
>
> Key: HDFS-10298
> URL: https://issues.apache.org/jira/browse/HDFS-10298
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp, documentation
>Affects Versions: 2.8.0
>Reporter: Akira AJISAKA
>Assignee: Takashi Ohnishi
> Attachments: HDFS-10298.1.patch
>
>
> The distcp {{-diff}} option is currently documented as "Use snapshot diff 
> report to identify the difference between source and target.", but its usage 
> is not documented.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10298) Document the usage of distcp -diff option

2016-04-17 Thread Takashi Ohnishi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244677#comment-15244677
 ] 

Takashi Ohnishi commented on HDFS-10298:


Hi!

I have created a patch for this.
The attached patch adds a description of the arguments to the -diff option and 
the preconditions for this feature, which are mentioned in the top comments of 
DistCpSync.java.
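
For example, the documented workflow looks roughly like this (paths and 
snapshot names are illustrative; per DistCpSync, both sides must be HDFS, the 
target must already hold a snapshot identical to the source's {{s1}}, and the 
target must be unchanged since then):
{code}
# Both directories must be snapshottable: hdfs dfsadmin -allowSnapshot <dir>
# Initial copy, then snapshot both sides at the same point in time.
hadoop distcp -update /src hdfs://nn2/dst
hdfs dfs -createSnapshot /src s1
hdfs dfs -createSnapshot hdfs://nn2/dst s1

# After further changes on the source: snapshot again, sync only the diff.
hdfs dfs -createSnapshot /src s2
hadoop distcp -update -diff s1 s2 /src hdfs://nn2/dst
{code}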

> Document the usage of distcp -diff option
> -
>
> Key: HDFS-10298
> URL: https://issues.apache.org/jira/browse/HDFS-10298
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp, documentation
>Affects Versions: 2.8.0
>Reporter: Akira AJISAKA
> Attachments: HDFS-10298.1.patch
>
>
> The distcp {{-diff}} option is currently documented as "Use snapshot diff 
> report to identify the difference between source and target.", but its usage 
> is not documented.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10298) Document the usage of distcp -diff option

2016-04-17 Thread Takashi Ohnishi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takashi Ohnishi updated HDFS-10298:
---
Status: Patch Available  (was: Open)

> Document the usage of distcp -diff option
> -
>
> Key: HDFS-10298
> URL: https://issues.apache.org/jira/browse/HDFS-10298
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp, documentation
>Affects Versions: 2.8.0
>Reporter: Akira AJISAKA
>Assignee: Takashi Ohnishi
> Attachments: HDFS-10298.1.patch
>
>
> The distcp {{-diff}} option is currently documented as "Use snapshot diff 
> report to identify the difference between source and target.", but its usage 
> is not documented.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-10298) Document the usage of distcp -diff option

2016-04-17 Thread Takashi Ohnishi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takashi Ohnishi reassigned HDFS-10298:
--

Assignee: Takashi Ohnishi

> Document the usage of distcp -diff option
> -
>
> Key: HDFS-10298
> URL: https://issues.apache.org/jira/browse/HDFS-10298
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp, documentation
>Affects Versions: 2.8.0
>Reporter: Akira AJISAKA
>Assignee: Takashi Ohnishi
> Attachments: HDFS-10298.1.patch
>
>
> Distcp -diff options is currently documented as "Use snapshot diff report to 
> identify the difference between source and target.", but the usage is not 
> documented.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9412) getBlocks occupies FSLock and takes too long to complete

2016-04-17 Thread He Tianyi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244578#comment-15244578
 ] 

He Tianyi commented on HDFS-9412:
-

Hi [~walter.k.su], would you commit this patch when you get a chance?

> getBlocks occupies FSLock and takes too long to complete
> 
>
> Key: HDFS-9412
> URL: https://issues.apache.org/jira/browse/HDFS-9412
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: He Tianyi
>Assignee: He Tianyi
> Attachments: HDFS-9412..patch, HDFS-9412.0001.patch, 
> HDFS-9412.0002.patch
>
>
> {{getBlocks}} in {{NameNodeRpcServer}} acquires a read lock and then may take 
> a long time to complete (possibly several seconds if the number of blocks is 
> too large).
> During this period, other threads attempting to acquire the write lock will 
> wait. In an extreme case, RPC handlers are occupied by one reader thread 
> calling {{getBlocks}} while all other threads wait for the write lock, so the 
> RPC server appears hung. Unfortunately, this tends to happen in heavily 
> loaded clusters, since read operations come and go fast (they do not need to 
> wait), leaving write operations waiting.
> It looks like we can optimize this the way DN block reports were optimized in 
> the past: split the operation into smaller sub-operations, and let other 
> threads do their work between each sub-operation. The whole result is still 
> returned at once, though (one difference from DN block reports).
> I am not sure whether this will work. Any better idea?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)