[jira] [Commented] (HDFS-4138) BackupNode startup fails due to uninitialized edit log
[ https://issues.apache.org/jira/browse/HDFS-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490684#comment-13490684 ]

Kihwal Lee commented on HDFS-4138:

According to HDFS-4114, it looks like the BackupNode is not going away any time soon. I am attaching a minimal patch to get rid of the exception from metrics initialization. For branch-2, we also need HDFS-3625.

BackupNode startup fails due to uninitialized edit log

Key: HDFS-4138
URL: https://issues.apache.org/jira/browse/HDFS-4138
Project: Hadoop HDFS
Issue Type: Bug
Components: ha, name-node
Affects Versions: 2.0.3-alpha
Reporter: Kihwal Lee
Assignee: Kihwal Lee

It was noticed via a TestBackupNode.testCheckpointNode failure. When a backup node starts, it tries to enter the active state and start the common services. It fails to start the services and exits, and the exit is caught by the exit util.
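The patch itself lives as an attachment and is not inlined in this thread. As a rough illustration only, a "get rid of the exception from metrics initialization" fix typically guards a metrics getter against state that is not yet initialized; every name below is a hypothetical stand-in, not the contents of hdfs-4138.patch.

{code}
// Hypothetical sketch (NOT the attached hdfs-4138.patch): report a default
// metric value instead of throwing while the BackupNode is still starting
// up and its edit log has not been initialized yet.
class EditLogMetricsSketch {
  /** Stand-in for the real FSEditLog; remains null until initialization. */
  static class EditLogStub {
    long getLastWrittenTxId() { return 42; }
    long getCurSegmentTxId() { return 40; }
  }

  private volatile EditLogStub editLog; // assigned later during startup

  long getTransactionsSinceLastLogRoll() {
    EditLogStub log = editLog; // single volatile read for thread safety
    return log == null ? 0 : log.getLastWrittenTxId() - log.getCurSegmentTxId();
  }
}
{code}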
[jira] [Updated] (HDFS-4138) BackupNode startup fails due to uninitialized edit log
[ https://issues.apache.org/jira/browse/HDFS-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated HDFS-4138:

Target Version/s: 3.0.0, 2.0.3-alpha
Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-4138) BackupNode startup fails due to uninitialized edit log
[ https://issues.apache.org/jira/browse/HDFS-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated HDFS-4138:

Attachment: hdfs-4138.patch
[jira] [Commented] (HDFS-4138) BackupNode startup fails due to uninitialized edit log
[ https://issues.apache.org/jira/browse/HDFS-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490739#comment-13490739 ]

Hadoop QA commented on HDFS-4138:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12552112/hdfs-4138.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3443//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3443//console

This message is automatically generated.
[jira] [Commented] (HDFS-4142) hadoop fs -mv command creates nested directory instead of overwriting when a same named directory as source already exists
[ https://issues.apache.org/jira/browse/HDFS-4142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490785#comment-13490785 ]

Colin Patrick McCabe commented on HDFS-4142:

The UNIX {{mv}} command seems to prompt in this scenario unless {{-f}} is specified. If we change {{hadoop fs -mv}} to overwrite in the same way, we should prompt as well.

hadoop fs -mv command creates nested directory instead of overwriting when a same named directory as source already exists

Key: HDFS-4142
URL: https://issues.apache.org/jira/browse/HDFS-4142
Project: Hadoop HDFS
Issue Type: Bug
Environment: hadoop 0.23.4
Reporter: Arup Malakar
Attachments: RenameTest.java

Using the hadoop CLI, when I try to move a directory into another directory that already contains a directory with the same name as the source, nested directories are created instead of the existing one being overwritten. This is counterintuitive, since it is not the behavior of the unix mv command. Here is an example to explain the bug:

{code}
~ $ hadoop fs -lsr /tmp/root
lsr: DEPRECATED: Please use 'ls -R' instead.
drwx------   - malakar hdfs    0 2012-11-01 23:30 /tmp/root/parent
drwx------   - malakar hdfs    0 2012-11-01 23:30 /tmp/root/parent/child
-rw-------   3 malakar hdfs 9950 2012-11-01 23:30 /tmp/root/parent/child/passwd
drwx------   - malakar hdfs    0 2012-11-01 23:31 /tmp/root/parent2
drwx------   - malakar hdfs    0 2012-11-01 23:31 /tmp/root/parent2/child
~ $ hadoop fs -mv /tmp/root/parent/child /tmp/root/parent2
~ $ hadoop fs -lsr /tmp/root
lsr: DEPRECATED: Please use 'ls -R' instead.
drwx------   - malakar hdfs    0 2012-11-01 23:32 /tmp/root/parent
drwx------   - malakar hdfs    0 2012-11-01 23:31 /tmp/root/parent2
drwx------   - malakar hdfs    0 2012-11-01 23:32 /tmp/root/parent2/child
drwx------   - malakar hdfs    0 2012-11-01 23:30 /tmp/root/parent2/child/child
-rw-------   3 malakar hdfs 9950 2012-11-01 23:30 /tmp/root/parent2/child/child/passwd
{code}

The same operation seems to fail when using the [FileSystem|http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#rename(org.apache.hadoop.fs.Path, org.apache.hadoop.fs.Path)] rename API, though. Using the Java API:

{code}
~ $ hadoop jar test.jar RenameTest
Before:
drwx------   - malakar hdfs 0 2012-11-02 00:23 /tmp/renametest/parent
drwx------   - malakar hdfs 0 2012-11-02 00:23 /tmp/renametest/parent/child
-rw-------   3 malakar hdfs 0 2012-11-02 00:23 /tmp/renametest/parent/child/file
drwx------   - malakar hdfs 0 2012-11-02 00:23 /tmp/renametest/targetparent
About to move: /tmp/renametest/parent/child to: /tmp/renametest/targetparent
After moving: /tmp/renametest/parent/child to /tmp/renametest/targetparent
drwx------   - malakar hdfs 0 2012-11-02 00:23 /tmp/renametest/parent
drwx------   - malakar hdfs 0 2012-11-02 00:23 /tmp/renametest/targetparent
drwx------   - malakar hdfs 0 2012-11-02 00:23 /tmp/renametest/targetparent/child
-rw-------   3 malakar hdfs 0 2012-11-02 00:23 /tmp/renametest/targetparent/child/file
Before:
drwx------   - malakar hdfs 0 2012-11-02 00:23 /tmp/renametest/parent
drwx------   - malakar hdfs 0 2012-11-02 00:23 /tmp/renametest/parent/child
-rw-------   3 malakar hdfs 0 2012-11-02 00:23 /tmp/renametest/parent/child/file
drwx------   - malakar hdfs 0 2012-11-02 00:23 /tmp/renametest/targetparent
drwx------   - malakar hdfs 0 2012-11-02 00:23 /tmp/renametest/targetparent/child
About to move: /tmp/renametest/parent/child to: /tmp/renametest/targetparent
Could not rename directory: /tmp/renametest/parent/child to /tmp/renametest/targetparent
{code}
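The attached {{RenameTest.java}} is not reproduced in this thread. A minimal program along the same lines might look like the sketch below (class and path names are illustrative); it exercises the {{FileSystem.rename}} behavior shown above, which returns false instead of overwriting when the destination already contains a same-named directory.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical reproduction, not the actual RenameTest.java attachment.
public class RenameRepro {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path src = new Path("/tmp/renametest/parent/child");
    Path dst = new Path("/tmp/renametest/targetparent");
    fs.mkdirs(src);
    fs.mkdirs(new Path(dst, "child")); // same-named directory already exists
    // rename returns false here instead of overwriting the existing child.
    if (!fs.rename(src, dst)) {
      System.err.println("Could not rename " + src + " to " + dst);
    }
  }
}
{code}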
[jira] [Commented] (HDFS-4114) Deprecate the BackupNode and CheckpointNode in 2.0
[ https://issues.apache.org/jira/browse/HDFS-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490789#comment-13490789 ]

Suresh Srinivas commented on HDFS-4114:

In federation, we wanted to use the CheckpointNode for sharing a single checkpointer across all the namenodes in a cluster, triggered from crontab. I have not had time to explore this further. Looking at the discussion so far, I understand removing/deprecating it if no one plans to use or work on the BackupNode. Given that Konstantin has indicated interest in pursuing it, I would leave it as it is right now. Konstantin, it would be great if you could share your plans and timelines. Meanwhile, for the next few months, if any issues crop up, I volunteer to spend my time fixing them.

Deprecate the BackupNode and CheckpointNode in 2.0

Key: HDFS-4114
URL: https://issues.apache.org/jira/browse/HDFS-4114
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Eli Collins
Assignee: Eli Collins

Per the thread on hdfs-dev@ (http://s.apache.org/tMT) let's remove the BackupNode and CheckpointNode.
[jira] [Commented] (HDFS-4151) Passing INodesInPath instead of INode[] in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490798#comment-13490798 ]

Suresh Srinivas commented on HDFS-4151:

Nicholas, if this is for trunk, can you please mark the Affects Version(s) as 3.0.0?

Passing INodesInPath instead of INode[] in FSDirectory

Key: HDFS-4151
URL: https://issues.apache.org/jira/browse/HDFS-4151
Project: Hadoop HDFS
Issue Type: Improvement
Components: name-node
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
Priority: Minor
Attachments: h4151_20121104.patch

Currently, many methods in FSDirectory pass INode[] as a parameter. It is better to pass INodesInPath so that we can add more path information later on. This is especially useful in the Snapshot implementation.
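For readers unfamiliar with the refactoring, its shape is roughly the following (simplified stand-in types, not the real INodesInPath class): wrapping the resolved INode[] in a small object means extra path context, such as snapshot information, can be added later without changing every FSDirectory method signature again.

{code}
// Illustrative sketch only; the real INodesInPath has more state and logic.
class INodesInPathSketch {
  interface INode {} // stand-in for org.apache.hadoop.hdfs.server.namenode.INode

  private final INode[] inodes; // inodes resolved along the path, root first
  // Snapshot-related fields can be added here later without touching callers.

  INodesInPathSketch(INode[] inodes) {
    this.inodes = inodes;
  }

  /** The inode the path finally resolves to (may be null for a missing file). */
  INode getLastINode() {
    return inodes[inodes.length - 1];
  }
}
{code}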
[jira] [Commented] (HDFS-4151) Passing INodesInPath instead of INode[] in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490813#comment-13490813 ]

Suresh Srinivas commented on HDFS-4151:

Minor comment - in FSNamesystem#allocateBlock(), the javadoc @param needs to be updated from inodes to inodesInPath. Otherwise the patch looks good.
[jira] [Updated] (HDFS-1331) dfs -test should work like /bin/test
[ https://issues.apache.org/jira/browse/HDFS-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Isaacson updated HDFS-1331:

Attachment: hdfs1331-3.txt

dfs -test should work like /bin/test

Key: HDFS-1331
URL: https://issues.apache.org/jira/browse/HDFS-1331
Project: Hadoop HDFS
Issue Type: Bug
Components: tools
Affects Versions: 0.20.2, 3.0.0, 2.0.2-alpha
Reporter: Allen Wittenauer
Assignee: Andy Isaacson
Priority: Minor
Attachments: hdfs1331-2.txt, hdfs1331-3.txt, hdfs1331.txt, hdfs1331-with-hadoop8994.txt

hadoop dfs -test doesn't act like its shell equivalent, making it difficult to actually use if you are used to the real test command:

hadoop:
$ hadoop dfs -test -d /nonexist; echo $?
test: File does not exist: /nonexist
255

shell:
$ test -d /nonexist; echo $?
1

a) Why is it spitting out a message? Even so, why is it saying file instead of directory when I used -d?
b) Why is the return code 255? I realize this is documented as '0' if true. But the docs basically say the value is undefined if it isn't.
c) Where is -f?
d) Why is empty -z instead of -s? Was it a misunderstanding of the man page?
[jira] [Commented] (HDFS-1331) dfs -test should work like /bin/test
[ https://issues.apache.org/jira/browse/HDFS-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490831#comment-13490831 ]

Andy Isaacson commented on HDFS-1331:

bq. follow the pattern for the usage of other commands

There's a lot of inconsistency, but I figure the {{-ls}} usage message is a good example to follow. I cleaned up the rest of {{Test.java}}'s usage message too while in there.
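To make the desired semantics concrete, here is a hedged sketch (not the attached patch) of how a /bin/test-like {{-test}} could behave: exit 1 for a false condition, print nothing, and support {{-f}} alongside {{-d}} and {{-e}}.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical sketch of /bin/test-like semantics; usage: <-d|-e|-f> <path>
public class TestLikeBinTest {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path(args[1]);
    boolean ok;
    if (!fs.exists(p)) {
      ok = false; // print no message, unlike the old -test behavior
    } else {
      FileStatus st = fs.getFileStatus(p);
      ok = args[0].equals("-e")
          || (args[0].equals("-d") && st.isDirectory())
          || (args[0].equals("-f") && st.isFile());
    }
    System.exit(ok ? 0 : 1); // 1 for false, like /bin/test, instead of 255
  }
}
{code}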
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490841#comment-13490841 ]

Tsz Wo (Nicholas), SZE commented on HDFS-3979:

Thanks for the update, Lars. +1 patch looks good. I will commit it if there are no more comments.

Fix hsync and hflush semantics.

Key: HDFS-3979
URL: https://issues.apache.org/jira/browse/HDFS-3979
Project: Hadoop HDFS
Issue Type: Bug
Components: data-node, hdfs client
Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt, hdfs-3979-v3.txt, hdfs-3979-v4.txt

See the discussion in HDFS-744. The actual sync/flush operation in BlockReceiver is not on a synchronous path from the DFSClient, hence it is possible that a DN loses data that it has already acknowledged as persisted to a client. Edit: Spelling.
[jira] [Commented] (HDFS-4138) BackupNode startup fails due to uninitialized edit log
[ https://issues.apache.org/jira/browse/HDFS-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490865#comment-13490865 ]

Konstantin Shvachko commented on HDFS-4138:

Kihwal, this is not deterministic, right? I managed to reproduce it once, so I have the logs. Looking. Is there a Jenkins build with the failure?
[jira] [Commented] (HDFS-4151) Passing INodesInPath instead of INode[] in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490874#comment-13490874 ]

Suresh Srinivas commented on HDFS-4151:

+1 with the above comment addressed.
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490887#comment-13490887 ]

Luke Lu commented on HDFS-3979:

bq. I think it will decrease the performance for non-sync write.

It'll be nice if we can show/quantify the decrease in performance for non-sync writes. It may not be wise to introduce complexity and make hflush less robust if this is a non-issue.

bq. The existing tests: TestFiPipelines and TestFiHFlush do not cover the other scenarios you worry about?

It seems that TestFiHFlush doesn't cover the failure scenarios. All the test cases are positive assertions (the pipeline can recover in spite of disk error exceptions), which seems not very useful given that the ack is done before the disk error exceptions are triggered. A new TestFiHSync seems necessary, especially for the new patch, where the ack code path diverged from hflush. Basically, I want to make sure that hsync is guaranteed to get an error if the pipeline cannot be recovered (e.g., because the required datanodes have run out of disk space). Anyway, I'm fine with filing another jira for these hflush/hsync improvements.
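For context, the two client-side calls whose semantics this issue pins down are used as follows; this is a plain usage sketch against the public API (file path is illustrative), not part of the patch.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HflushHsyncExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    try (FSDataOutputStream out = fs.create(new Path("/tmp/hsync-demo"))) {
      out.writeBytes("row-1\n");
      out.hflush(); // data visible to new readers, but may sit in DN memory
      out.writeBytes("row-2\n");
      out.hsync();  // datanodes must persist the data to disk before acking
    }
  }
}
{code}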
[jira] [Updated] (HDFS-4151) Passing INodesInPath instead of INode[] in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HDFS-4151:

Attachment: h4151_20121105.patch

h4151_20121105.patch: fixes the javadoc and some bugs in the patch that caused the test failure.
[jira] [Commented] (HDFS-1331) dfs -test should work like /bin/test
[ https://issues.apache.org/jira/browse/HDFS-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490913#comment-13490913 ]

Hadoop QA commented on HDFS-1331:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12552139/hdfs1331-3.txt against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.cli.TestCLI
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3444//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3444//console

This message is automatically generated.
[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490929#comment-13490929 ]

Eli Collins commented on HDFS-2802:

Hey Jagane,

With regard to your first comment, the goal is definitely not to have dueling proposals; the doc ATM posted only attempts to flesh out things in the first proposal that were not covered or were left as future work. Namely, (a) snapshot creation time and memory usage should be O(1), and (b) consistency semantics sufficient to implement HBase snapshots. We think these are fundamental requirements that should be addressed in the initial design rather than fixed up later, which is why we started fleshing out a design that satisfies them, not to push a separate proposal. If Suresh and Nicholas now agree (which I think they do; correct me if I'm wrong) that we should come up with a design that handles these requirements first, rather than going with the original approach and modifying it later, then I think we should merge these two design documents, which is what ATM was attempting to do.

Support for RW/RO snapshots in HDFS

Key: HDFS-2802
URL: https://issues.apache.org/jira/browse/HDFS-2802
Project: Hadoop HDFS
Issue Type: New Feature
Components: data-node, name-node
Reporter: Hari Mankude
Assignee: Hari Mankude
Attachments: HDFS-2802.20121101.patch, HDFS-2802-meeting-minutes-121101.txt, HDFSSnapshotsDesign.pdf, snap.patch, snapshot-design.pdf, snapshot-design.tex, snapshot-one-pager.pdf, Snapshots20121018.pdf, Snapshots20121030.pdf

Snapshots are point-in-time images of parts of the filesystem or the entire filesystem. Snapshots can be a read-only or a read-write point-in-time copy of the filesystem. There are several use cases for snapshots in HDFS. I will post a detailed write-up soon with more information.
[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490938#comment-13490938 ]

Tsz Wo (Nicholas), SZE commented on HDFS-2802:

bq. ... (a) snapshot creation time and memory usage should be O(1), ...

Eli, the memory usage in ATM's proposal is not O(1), since it adds tags to every INode; therefore, it is O(N), where N is the number of INodes. What is worse, the additional memory is required even if the snapshot feature is not used. You are right that ATM's design attempts to solve such problems, but I believe the attempt fails.
[jira] [Updated] (HDFS-4150) Update inode in blocksMap when deleting original/snapshot file
[ https://issues.apache.org/jira/browse/HDFS-4150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-4150:

Attachment: HDFS-4150.000.patch

Initial patch without tests. Will add/run testcases for it.

Update inode in blocksMap when deleting original/snapshot file

Key: HDFS-4150
URL: https://issues.apache.org/jira/browse/HDFS-4150
Project: Hadoop HDFS
Issue Type: Sub-task
Components: data-node, name-node
Reporter: Jing Zhao
Assignee: Jing Zhao
Attachments: HDFS-4150.000.patch

When deleting a file/directory, instead of directly removing all the corresponding blocks, we should update the inodes in the blocksMap if there are snapshots for them.
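As a rough illustration of the intent (hypothetical stand-in types, not the attached patch): on delete, blocks are dropped outright only when no snapshot still references the file; otherwise the blocksMap entry is repointed at the inode copy that the snapshot keeps alive.

{code}
import java.util.List;

// Hedged sketch of the delete path described above; all names are stand-ins.
class SnapshotAwareDeleteSketch {
  interface Block {}
  interface INode {}
  interface FileINode extends INode {
    boolean isInSnapshot();   // is the file still referenced by a snapshot?
    INode snapshotCopy();     // the inode copy owned by the snapshot
    List<Block> blocks();
  }
  interface BlocksMap {
    void updateINode(Block b, INode newOwner);
    void removeBlock(Block b);
  }

  void unprotectedDelete(FileINode file, BlocksMap blocksMap) {
    for (Block b : file.blocks()) {
      if (file.isInSnapshot()) {
        blocksMap.updateINode(b, file.snapshotCopy()); // keep blocks alive
      } else {
        blocksMap.removeBlock(b); // no snapshot references: safe to drop
      }
    }
  }
}
{code}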
[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490951#comment-13490951 ]

Suresh Srinivas commented on HDFS-2802:

bq. It would be useful to allow users to be able to set directories in their own home dir as 'snapshottable'.

Jagane, this is really a good idea. It makes the administration simpler. I will add it to the next iteration of the design doc.

bq. We think these are fundamental requirements that should be addressed in the initial design vs fixed up later;

Eli, please read the comments. While this is how the discussions started, I disagree that in this case we could not have incrementally improved it. Anyway, discussing this is a moot point now, because we reorganized our work, addressed the issues brought up, and the efficient implementation is now almost done.

bq. If Suresh and Nicholas now agree, which I think they do, correct me if I'm wrong, that we should come up with a design that handles these requirements first, rather than go with the original approach and modify it later then I think we should merge these two design documents, which is what ATM was attempting to do.

I am not sure you are on top of all the discussions and design updates. The design has already addressed the concerns here. Please also read why this design is superior [here|https://issues.apache.org/jira/browse/HDFS-2802?focusedCommentId=13487447&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13487447].
[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490960#comment-13490960 ]

Aaron T. Myers commented on HDFS-2802:

bq. Eli, the memory usage in ATM's proposal is not O(1) since it adds tags for every INode, therefore, it is O(N), where N is the number of INodes...

I believe the distinction here is how much memory is used _at snapshot creation time_. Both of the current proposals are constant in terms of time/space at snapshot creation time, but you're right that the most recent proposal I posted would add some overhead per INode. In any case, I agree that the solution described in the most recent document posted by Suresh will be more space efficient than the latest proposal I posted. No disagreement there. This is what I was referring to when I said, "This [the design posted by Suresh] has the advantage of saving NN memory space for files/directories which have never had snapshots created of them."
[jira] [Commented] (HDFS-4151) Passing INodesInPath instead of INode[] in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490972#comment-13490972 ]

Hadoop QA commented on HDFS-4151:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12552156/h4151_20121105.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3445//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3445//console

This message is automatically generated.
[jira] [Commented] (HDFS-4114) Deprecate the BackupNode and CheckpointNode in 2.0
[ https://issues.apache.org/jira/browse/HDFS-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491000#comment-13491000 ]

Eli Collins commented on HDFS-4114:

Konstantin, are you actually interested in pursuing this? The BackupNode hasn't been touched in years.
[jira] [Updated] (HDFS-4151) Passing INodesInPath instead of INode[] in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HDFS-4151:

Resolution: Fixed
Fix Version/s: 3.0.0
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

I have committed this.
[jira] [Updated] (HDFS-4150) Update inode in blocksMap when deleting original/snapshot file
[ https://issues.apache.org/jira/browse/HDFS-4150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-4150:

Attachment: HDFS-4150.001.patch

Patch with testcase.
[jira] [Commented] (HDFS-4151) Passing INodesInPath instead of INode[] in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491044#comment-13491044 ]

Hudson commented on HDFS-4151:

Integrated in Hadoop-trunk-Commit #2957 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/2957/])

HDFS-4151. Change the methods in FSDirectory to pass INodesInPath instead of INode[] as a parameter. (Revision 1406006)

Result = SUCCESS
szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1406006
Files:
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeDirectory.java
[jira] [Commented] (HDFS-4046) ChecksumTypeProto use NULL as enum value which is illegal in C/C++
[ https://issues.apache.org/jira/browse/HDFS-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491052#comment-13491052 ]

Hudson commented on HDFS-4046:

Integrated in Hadoop-trunk-Commit #2958 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/2958/])

HDFS-4046. Adding the missed file in revision 1406011 (Revision 1406012)
HDFS-4046. Rename ChecksumTypeProto enum NULL since it is illegal in C/C++. Contributed by Binglin Chang. (Revision 1406011)

Result = SUCCESS
suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1406012
Files:
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestHdfsProtoUtil.java

suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1406011
Files:
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsProtoUtil.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferProtoUtil.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto

ChecksumTypeProto use NULL as enum value which is illegal in C/C++

Key: HDFS-4046
URL: https://issues.apache.org/jira/browse/HDFS-4046
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
Priority: Minor
Fix For: 3.0.0, 2.0.3-alpha
Attachments: HDFS-4046-ChecksumType-NULL-and-TestAuditLogs-bug.patch, HDFS-4046-ChecksumType-NULL.patch, HDFS-4096-ChecksumTypeProto-NULL.patch

I tried to write a native hdfs client using the protobuf-based protocol. When I generated C++ code from hdfs.proto, the generated file could not compile, because NULL is an already-defined macro. I am thinking of two solutions:

1. Refactor all DataChecksum.Type.NULL references to NONE, which should be fine for all languages, but this may break compatibility.
2. Only change the protobuf definition ChecksumTypeProto.NULL to NONE, and use the enum integer value (DataChecksum.Type.id) to convert between ChecksumTypeProto and DataChecksum.Type, making sure the enum integer values match (currently they already match).

I can make a patch for solution 2.
[jira] [Updated] (HDFS-4046) ChecksumTypeProto use NULL as enum value which is illegal in C/C++
[ https://issues.apache.org/jira/browse/HDFS-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas updated HDFS-4046:

Resolution: Fixed
Fix Version/s: 2.0.3-alpha, 3.0.0
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

I committed the patch to trunk and branch-2. Thank you Binglin.
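The wire-compatible trick in "solution 2" of the description can be sketched as follows (simplified stand-in enums, not the actual PBHelper code): convert by shared integer id instead of by name, so the protobuf side can rename NULL (a macro in C/C++) without changing the encoded values.

{code}
// Hedged sketch of id-based enum conversion; all types are stand-ins.
class ChecksumTypeMappingSketch {
  enum JavaType {
    NULL(0), CRC32(1), CRC32C(2);
    final int id;
    JavaType(int id) { this.id = id; }
  }

  enum ProtoType {
    CHECKSUM_NONE(0), CHECKSUM_CRC32(1), CHECKSUM_CRC32C(2); // names renameable
    final int id;
    ProtoType(int id) { this.id = id; }
  }

  // Both directions rely only on the shared ids, never on the enum names,
  // so renaming a protobuf value does not break the conversion.
  static ProtoType toProto(JavaType t) { return ProtoType.values()[t.id]; }
  static JavaType fromProto(ProtoType t) { return JavaType.values()[t.id]; }
}
{code}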
[jira] [Updated] (HDFS-4046) ChecksumTypeProto use NULL as enum value which is illegal in C/C++
[ https://issues.apache.org/jira/browse/HDFS-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas updated HDFS-4046:

Component/s: name-node, data-node
Affects Version/s: 2.0.0-alpha
[jira] [Commented] (HDFS-4114) Deprecate the BackupNode and CheckpointNode in 2.0
[ https://issues.apache.org/jira/browse/HDFS-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491058#comment-13491058 ]

Konstantin Boudnik commented on HDFS-4114:

Also, from a pure precedent standpoint, post-factum deprecation (i.e., after a release has happened) is a bad idea. I am in full agreement with Konstantin here that the past practice should stand and a deprecated feature has to be kept around for at least one major release.
[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491060#comment-13491060 ]

Aaron T. Myers commented on HDFS-2802:

bq. It would be useful to allow users to be able to set directories in their own home dir as 'snapshottable'.

Not suggesting this is a bad idea, but if we do decide to go with this, then I think we should relax the proposed restriction that renames of files/directories must stay within the subtree of a snapshottable root, as described in part 1a of the "Operations Supported on Snapshots" section of the most recent design document posted by Suresh. I think users might be surprised to find that, because they've marked two directories in their home dir as snapshottable, they cannot then rename files/dirs between those directories. (I realize the document already says this can be revisited - I'm just pointing out that this might be a good reason to revisit it.)
[jira] [Commented] (HDFS-4114) Deprecate the BackupNode and CheckpointNode in 2.0
[ https://issues.apache.org/jira/browse/HDFS-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491061#comment-13491061 ]

Todd Lipcon commented on HDFS-4114:

bq. In federation, we wanted to use CheckpointNode for sharing a single checkpointer for all the namenodes in a cluster, triggered based on crontab

My question is why that doesn't apply to the SecondaryNameNode just as well? The 2NN and the CheckpointNode offer identical functionality, except that one has been battle-tested whereas the other hasn't been run in production AFAIK.
[jira] [Updated] (HDFS-4150) Update inode in blocksMap when deleting original/snapshot file
[ https://issues.apache.org/jira/browse/HDFS-4150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-4150:

Attachment: HDFS-4150.002.patch

Updated based on changes in HDFS-4151.
[jira] [Commented] (HDFS-4114) Deprecate the BackupNode and CheckpointNode in 2.0
[ https://issues.apache.org/jira/browse/HDFS-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491078#comment-13491078 ]

Suresh Srinivas commented on HDFS-4114:

bq. My question is why that doesn't apply to the SecondaryNameNode just as well?

At least at the time when I was considering it, the SecondaryNameNode was a long-running daemon and the Checkpointer was shaping up to be a utility.
[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491083#comment-13491083 ]

Jagane Sundar commented on HDFS-2802:

Adding the capability for a user to set his/her own directory as 'snapshottable' is a good enhancement for the future. It should not hold up progress right now.
[jira] [Commented] (HDFS-4046) ChecksumTypeProto use NULL as enum value which is illegal in C/C++
[ https://issues.apache.org/jira/browse/HDFS-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491091#comment-13491091 ]

Hudson commented on HDFS-4046:

Integrated in Hadoop-trunk-Commit #2959 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/2959/])

Add HDFS-4046 to Release 2.0.3 section (Revision 1406019)

Result = SUCCESS
suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1406019
Files:
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Updated] (HDFS-4147) Deletion of snapshottable dir with snapshots should fail
[ https://issues.apache.org/jira/browse/HDFS-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4147: Attachment: HDFS-snapshot-check-rename.patch The same check should also be applied to rename when the destination directory is snapshottable and has snapshots (since the destination directory needs to be deleted). Deletion of snapshottable dir with snapshots should fail Key: HDFS-4147 URL: https://issues.apache.org/jira/browse/HDFS-4147 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Reporter: Jing Zhao Assignee: Jing Zhao Fix For: Snapshot (HDFS-2802) Attachments: HDFS-4147.001.patch, HDFS-4147.002.patch, HDFS-snapshot-check-rename.patch Deletion of snapshottable dir with snapshots should fail. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
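As a rough, self-contained illustration of the check described above (the types and names here are hypothetical, not the attached patch): renaming over an existing directory implies deleting it, so the destination must pass the same snapshot check as an explicit delete.
{code}
// Toy model: a snapshottable directory that still has snapshots may not
// be deleted, and rename-over-existing implies deleting the destination.
class DirState {
  final String path;
  final boolean snapshottable;
  final int snapshotCount;
  DirState(String path, boolean snapshottable, int snapshotCount) {
    this.path = path;
    this.snapshottable = snapshottable;
    this.snapshotCount = snapshotCount;
  }
}

class RenameCheckDemo {
  static void validateRenameOverwrite(DirState dst) {
    if (dst.snapshottable && dst.snapshotCount > 0) {
      throw new IllegalStateException("rename destination " + dst.path
          + " has " + dst.snapshotCount + " snapshot(s); delete them first");
    }
  }

  public static void main(String[] args) {
    validateRenameOverwrite(new DirState("/user/a", true, 2)); // throws
  }
}
{code}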
[jira] [Updated] (HDFS-1331) dfs -test should work like /bin/test
[ https://issues.apache.org/jira/browse/HDFS-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Isaacson updated HDFS-1331: Attachment: hdfs1331-4.txt dfs -test should work like /bin/test Key: HDFS-1331 URL: https://issues.apache.org/jira/browse/HDFS-1331 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 0.20.2, 3.0.0, 2.0.2-alpha Reporter: Allen Wittenauer Assignee: Andy Isaacson Priority: Minor Attachments: hdfs1331-2.txt, hdfs1331-3.txt, hdfs1331-4.txt, hdfs1331.txt, hdfs1331-with-hadoop8994.txt hadoop dfs -test doesn't act like its shell equivalent, making it difficult to actually use if you are used to the real test command:
{noformat}
hadoop:
$ hadoop dfs -test -d /nonexist; echo $?
test: File does not exist: /nonexist
255

shell:
$ test -d /nonexist; echo $?
1
{noformat}
a) Why is it spitting out a message? Even so, why does it say file instead of directory when I used -d? b) Why is the return code 255? I realize this is documented as '0' if true, but the docs basically say the value is undefined if it isn't. c) Where is -f? d) Why is empty -z instead of -s? Was it a misunderstanding of the man page? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4123) HDFS logs should print stack trace and add a comment where it is intentionally not printed
[ https://issues.apache.org/jira/browse/HDFS-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-4123: -- Labels: newbie (was: ) HDFS logs should print stack trace and add a comment where it is intentionally not printed -- Key: HDFS-4123 URL: https://issues.apache.org/jira/browse/HDFS-4123 Project: Hadoop HDFS Issue Type: Improvement Reporter: Suresh Srinivas Priority: Minor Labels: newbie Review the code and change the logs to print the stack trace by changing the code from: {noformat} LOG.info(message + exception) {noformat} to: {noformat} LOG.info(message, exception) {noformat} Where printing the exception stack trace is intentionally avoided, add a comment to indicate that intent, so that no one changes it to print the full stack trace. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
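For context, a minimal self-contained sketch of why the two calls above differ, using the commons-logging API that HDFS logging is built on: concatenating the exception into the message only logs its toString(), while passing it as the second argument logs the full stack trace.
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class LogStackTraceDemo {
  private static final Log LOG = LogFactory.getLog(LogStackTraceDemo.class);

  public static void main(String[] args) {
    Exception e = new IllegalStateException("disk full");
    // String concatenation: only e.toString() is logged; the trace is lost.
    LOG.info("operation failed: " + e);
    // Throwable overload: the message plus the full stack trace is logged.
    LOG.info("operation failed", e);
  }
}
{code}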
[jira] [Commented] (HDFS-4130) The reading for editlog at NN starting using bkjm is not efficient
[ https://issues.apache.org/jira/browse/HDFS-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491147#comment-13491147 ] Han Xiao commented on HDFS-4130: Hi Uma, There are thousands of them. The default configuration will not purge them until there are more than 100,000 extra edits (EXTRA_EDITS, configured by dfs.namenode.num.extra.edits.retained). If there is no operation, each edit log will contain only two txids, the start and end of the segment. This means that with the default, the redundant files/ledgers would not be cleaned until there are more than 50,000 edit-log files/ledgers. We can set the value smaller. In any case, the way bkjm currently reads ledgers from BookKeeper is inefficient, and the change works better. The reading for editlog at NN starting using bkjm is not efficient --- Key: HDFS-4130 URL: https://issues.apache.org/jira/browse/HDFS-4130 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, performance Affects Versions: 2.0.2-alpha Reporter: Han Xiao Attachments: HDFS-4130.patch Now, the method BookKeeperJournalManager.selectInputStreams is written like this:
{code}
while (true) {
  EditLogInputStream elis;
  try {
    elis = getInputStream(fromTxId, inProgressOk);
  } catch (IOException e) {
    LOG.error(e);
    return;
  }
  if (elis == null) {
    return;
  }
  streams.add(elis);
  if (elis.getLastTxId() == HdfsConstants.INVALID_TXID) {
    return;
  }
  fromTxId = elis.getLastTxId() + 1;
}
{code}
Each EditLogInputStream is obtained from getInputStream(), which reads the ledgers from ZooKeeper on every call. This becomes very costly when the number of ledgers is large, and re-reading the ledgers from ZooKeeper is not necessary on every call to getInputStream(). The log showing the time lost here is as follows:
{noformat}
2012-10-30 16:44:52,995 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times
2012-10-30 16:49:24,643 INFO hidden.bkjournal.org.apache.bookkeeper.proto.PerChannelBookieClient: Successfully connected to bookie: /167.52.1.121:318
{noformat}
The stack of the process while blocked between those two log lines looks like:
{noformat}
"main" prio=10 tid=0x4011f000 nid=0x39ba in Object.wait() [0x7fca020fe000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:485)
	at hidden.bkjournal.org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1253)
	- locked 0x0006fb8495a8 (a hidden.bkjournal.org.apache.zookeeper.ClientCnxn$Packet)
	at hidden.bkjournal.org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1129)
	at org.apache.hadoop.contrib.bkjournal.utils.RetryableZookeeper.getData(RetryableZookeeper.java:501)
	at hidden.bkjournal.org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1160)
	at org.apache.hadoop.contrib.bkjournal.EditLogLedgerMetadata.read(EditLogLedgerMetadata.java:113)
	at org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager.getLedgerList(BookKeeperJournalManager.java:725)
	at org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager.getInputStream(BookKeeperJournalManager.java:442)
	at org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager.selectInputStreams(BookKeeperJournalManager.java:480)
{noformat}
Between different times, the diff of the stacks is:
{noformat}
diff stack stack2
1c1
< 2012-10-30 16:44:53
---
> 2012-10-30 16:46:17
106c106
< - locked 0x0006fb8495a8 (a hidden.bkjournal.org.apache.zookeeper.ClientCnxn$Packet)
---
> - locked 0x0006fae58468 (a hidden.bkjournal.org.apache.zookeeper.ClientCnxn$Packet)
{noformat}
In our environment, the waiting time could even reach tens of minutes.
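A minimal sketch of the direction being discussed, written as a drop-in replacement for the loop shown above. It assumes the ledger metadata is fetched once via the existing getLedgerList() (visible in the stack trace) and the streams are then built from the cached list; openStreamForLedger() is a hypothetical helper for illustration, not the attached patch:
{code}
// Sketch: one ZooKeeper pass to list the ledgers, then iterate the cached
// list instead of re-reading ZooKeeper inside every getInputStream() call.
List<EditLogLedgerMetadata> ledgers = getLedgerList(inProgressOk);
for (EditLogLedgerMetadata ledger : ledgers) {
  boolean inProgress = ledger.getLastTxId() == HdfsConstants.INVALID_TXID;
  if (!inProgress && ledger.getLastTxId() < fromTxId) {
    continue; // finalized segment ends before the requested range; skip it
  }
  streams.add(openStreamForLedger(ledger, fromTxId)); // hypothetical helper
  if (inProgress) {
    break; // nothing can follow an in-progress segment
  }
  fromTxId = ledger.getLastTxId() + 1;
}
{code}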
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4150) Update inode in blocksMap when deleting original/snapshot file
[ https://issues.apache.org/jira/browse/HDFS-4150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491156#comment-13491156 ] Tsz Wo (Nicholas), SZE commented on HDFS-4150: -- - We should change the parameter of collectSubtreeBlocksAndClear(..) to a new class instead of Map<Block, BlockDeletionInfo> so the block replacement info can be iterated before the block deletion info. Let me create a JIRA to do it in trunk first. - The INode replacement update may be delayed. Then, the old inode may be used in replication. We need to handle the replication computation. - The new BlockDeletionInfo object below should be created outside the loop since it is the same for all the blocks.
{code}
//INodeFileWithLink.collectBlocksBeyondMaxAndClear(..)
+    // Replace the INode for all the remaining blocks in blocksMap
+    if (m != null) {
+      for (int i = 0; i < n; i++) {
+        BlockDeletionInfo info = new BlockDeletionInfo(this, next);
+        m.put(oldBlocks[i], info);
+      }
+    }
{code}
- BlockDeletionInfo is not a good name since the blocks are not going to be deleted; how about renaming it to INodeReplacementInfo? - There is some duplicated code in FSNamesystem. Please put it in a method. Update inode in blocksMap when deleting original/snapshot file -- Key: HDFS-4150 URL: https://issues.apache.org/jira/browse/HDFS-4150 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node, name-node Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-4150.000.patch, HDFS-4150.001.patch, HDFS-4150.002.patch When deleting a file/directory, instead of directly removing all the corresponding blocks, we should update inodes in blocksMap if there are snapshots for them. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
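To make the first review point concrete, here is a rough, hypothetical sketch of the kind of parameter class being proposed; the name and fields are illustrative assumptions, not the class from the eventual patch:
{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical container passed down the delete path. It separates blocks
// to remove outright from blocks whose inode must be replaced in the
// blocksMap because a snapshot still references them; replacements are
// meant to be iterated before deletions, per the comment above.
class CollectedBlocksInfo<Block, INode> {
  private final List<Block> toDelete = new ArrayList<Block>();
  private final Map<Block, INode> toReplace = new HashMap<Block, INode>();

  void addDeleteBlock(Block b) { toDelete.add(b); }
  void addReplaceBlock(Block b, INode newOwner) { toReplace.put(b, newOwner); }

  Map<Block, INode> getReplacements() { return toReplace; }
  List<Block> getDeletions() { return toDelete; }
}
{code}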
[jira] [Commented] (HDFS-4150) Update inode in blocksMap when deleting original/snapshot file
[ https://issues.apache.org/jira/browse/HDFS-4150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491158#comment-13491158 ] Aaron T. Myers commented on HDFS-4150: -- Hi Jing, in general the patch looks pretty good to me. I just have one little concern. There are two places where this patch contains code like this:
{code}
if (originalBlockInfo != null &&
    toDelete == originalBlockInfo.getBlockCollection()) {
{code}
I'm concerned by the use of == here, instead of .equals or the like. Are we in fact guaranteed that the same actual object reference will be used in both places? (I think this is probably fine as-is, I just want to make sure.) Also, you might want to add a comment above this code saying why replacing the BlockCollection in the blocks map is appropriate in this case, i.e. in the case of a snapshot existing which still references this block. Update inode in blocksMap when deleting original/snapshot file -- Key: HDFS-4150 URL: https://issues.apache.org/jira/browse/HDFS-4150 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node, name-node Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-4150.000.patch, HDFS-4150.001.patch, HDFS-4150.002.patch When deleting a file/directory, instead of directly removing all the corresponding blocks, we should update inodes in blocksMap if there are snapshots for them. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
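For readers following along, a tiny self-contained illustration of the distinction the review raises (reference identity vs. logical equality); this is generic Java, not HDFS code:
{code}
public class IdentityVsEquals {
  public static void main(String[] args) {
    String a = new String("block");
    String b = new String("block");
    System.out.println(a == b);      // false: two distinct objects
    System.out.println(a.equals(b)); // true: same contents
    // The question above is whether the blocksMap is guaranteed to hold
    // the very same BlockCollection reference as the inode being deleted,
    // which is what makes == safe there.
  }
}
{code}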
[jira] [Commented] (HDFS-4123) HDFS logs should print stack trace and add a comment where it is intentionally not printed
[ https://issues.apache.org/jira/browse/HDFS-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491162#comment-13491162 ] Suresh Srinivas commented on HDFS-4123: --- This may not be a newbie issue. It should be done by someone with the background to decide whether a stack trace is intentionally not printed. HDFS logs should print stack trace and add a comment where it is intentionally not printed -- Key: HDFS-4123 URL: https://issues.apache.org/jira/browse/HDFS-4123 Project: Hadoop HDFS Issue Type: Improvement Reporter: Suresh Srinivas Priority: Minor Review the code and change the logs to print the stack trace by changing the code from: {noformat} LOG.info(message + exception) {noformat} to: {noformat} LOG.info(message, exception) {noformat} Where printing the exception stack trace is intentionally avoided, add a comment to indicate that intent, so that no one changes it to print the full stack trace. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4123) HDFS logs should print stack trace and add a comment where it is intentionally not printed
[ https://issues.apache.org/jira/browse/HDFS-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-4123: -- Labels: (was: newbie) HDFS logs should print stack trace and add a comment where it is intentionally not printed -- Key: HDFS-4123 URL: https://issues.apache.org/jira/browse/HDFS-4123 Project: Hadoop HDFS Issue Type: Improvement Reporter: Suresh Srinivas Priority: Minor Review the code and change the logs to print the stack trace by changing the code from: {noformat} LOG.info(message + exception) {noformat} to: {noformat} LOG.info(message, exception) {noformat} Where printing the exception stack trace is intentionally avoided, add a comment to indicate that intent, so that no one changes it to print the full stack trace. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1331) dfs -test should work like /bin/test
[ https://issues.apache.org/jira/browse/HDFS-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491188#comment-13491188 ] Hadoop QA commented on HDFS-1331: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12552199/hdfs1331-4.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3446//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3446//console This message is automatically generated. dfs -test should work like /bin/test Key: HDFS-1331 URL: https://issues.apache.org/jira/browse/HDFS-1331 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 0.20.2, 3.0.0, 2.0.2-alpha Reporter: Allen Wittenauer Assignee: Andy Isaacson Priority: Minor Attachments: hdfs1331-2.txt, hdfs1331-3.txt, hdfs1331-4.txt, hdfs1331.txt, hdfs1331-with-hadoop8994.txt hadoop dfs -test doesn't act like its shell equivalent, making it difficult to actually use if you are used to the real test command:
{noformat}
hadoop:
$ hadoop dfs -test -d /nonexist; echo $?
test: File does not exist: /nonexist
255

shell:
$ test -d /nonexist; echo $?
1
{noformat}
a) Why is it spitting out a message? Even so, why does it say file instead of directory when I used -d? b) Why is the return code 255? I realize this is documented as '0' if true, but the docs basically say the value is undefined if it isn't. c) Where is -f? d) Why is empty -z instead of -s? Was it a misunderstanding of the man page? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data
[ https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-347: -- Attachment: HDFS-347.020.patch This is an updated patch based on a discussion we had in HADOOP-6311. Basically, the current design is to pass file descriptors over a new class named {{DomainSocket}}, which represents UNIX domain sockets. This is accomplished by adding a new message to the {{DataTransferProtocol}}, {{RequestShortCircuitFd}}. The {{DataXceiverServer}} can manage these UNIX domain sockets just as easily as it manages the existing IPv4 sockets, because they implement the same interfaces. One thing I refactored in this patch is {{BlockReaderFactory}}. It formerly contained only static methods; this patch changes it to be a real class with instance methods and instance data. I felt that the {{BlockReaderFactory}} methods were getting too unwieldy because we were passing a tremendous number of parameters, many of which could be considered properties of the factory in a sense. Using instance data also allows the factory to keep a blacklist of which {{DataNodes}} do not support file descriptor passing. It uses this information to avoid making unnecessary requests. This patch also introduces the concept of a format version number for blocks. The idea here is that if we later change the on-disk block format, we want to be able to tell clients that they can't short-circuit access these blocks unless they understand the corresponding version number. (One change we've talked a lot about doing in the past is merging the block data and metadata files.) This makes it possible to have a cluster where some block files are in one format and some in another -- a necessity for doing a real-world transition. The clients are passed the version number, so they can act intelligently -- or simply refuse to read the newer formats if they don't know how. Because this patch depends on the {{DomainSocket}} code, it currently incorporates that code. HADOOP-6311 is the best place to comment about {{DomainSocket}}, since that is what that JIRA is about. DFS read performance suboptimal when client co-located on nodes with data - Key: HDFS-347 URL: https://issues.apache.org/jira/browse/HDFS-347 Project: Hadoop HDFS Issue Type: Improvement Components: data-node, hdfs client, performance Reporter: George Porter Assignee: Colin Patrick McCabe Attachments: all.tsv, BlockReaderLocal1.txt, HADOOP-4801.1.patch, HADOOP-4801.2.patch, HADOOP-4801.3.patch, HDFS-347-016_cleaned.patch, HDFS-347.016.patch, HDFS-347.017.clean.patch, HDFS-347.017.patch, HDFS-347.018.clean.patch, HDFS-347.018.patch2, HDFS-347.019.patch, HDFS-347.020.patch, HDFS-347-branch-20-append.txt, hdfs-347.png, hdfs-347.txt, local-reads-doc One of the major strategies Hadoop uses to get scalable data processing is to move the code to the data. However, putting the DFS client on the same physical node as the data blocks it acts on doesn't improve read performance as much as expected. After looking at Hadoop and O/S traces (via HADOOP-4049), I think the problem is due to the HDFS streaming protocol causing many more read I/O operations (iops) than necessary. Consider the case of a DFSClient fetching a 64 MB disk block from the DataNode process (running in a separate JVM) running on the same machine. The DataNode will satisfy the single disk block request by sending data back to the HDFS client in 64-KB chunks.
In BlockSender.java, this is done in the sendChunk() method, relying on Java's transferTo() method. Depending on the host O/S and JVM implementation, transferTo() is implemented as either a sendfilev() syscall or a pair of mmap() and write(). In either case, each chunk is read from the disk by issuing a separate I/O operation for each chunk. The result is that the single request for a 64-MB block ends up hitting the disk as over a thousand smaller requests of 64 KB each. Since the DFSClient runs in a different JVM and process than the DataNode, shuttling data from the disk to the DFSClient also results in context switches each time network packets get sent (in this case, the 64-KB chunk turns into a large number of 1500-byte packet send operations). Thus we see a large number of context switches for each block send operation. I'd like to get some feedback on the best way to address this, but I think the solution is to provide a mechanism for a DFSClient to directly open data blocks that happen to be on the same machine. It could do this by examining the set of LocatedBlocks returned by the NameNode, marking those that should be resident on the local host. Since the DataNode
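As a rough sketch of the {{BlockReaderFactory}} refactoring described above (the class and method names here are illustrative assumptions, not the actual patch): the factory instance owns shared state such as the blacklist of datanodes that do not support file descriptor passing, so callers stop threading long parameter lists through static methods and failed short-circuit attempts are not repeated.
{code}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a reader factory with instance state.
class ShortCircuitReaderFactory {
  interface BlockReader { /* read methods elided for brevity */ }

  // Datanodes that have already refused fd passing; shared across reads.
  private final Set<InetSocketAddress> fdPassingBlacklist =
      Collections.synchronizedSet(new HashSet<InetSocketAddress>());

  BlockReader create(InetSocketAddress datanode) throws IOException {
    if (!fdPassingBlacklist.contains(datanode)) {
      try {
        return createShortCircuitReader(datanode);
      } catch (UnsupportedOperationException e) {
        // Remember the refusal so we skip straight to TCP next time.
        fdPassingBlacklist.add(datanode);
      }
    }
    return createRemoteReader(datanode);
  }

  private BlockReader createShortCircuitReader(InetSocketAddress dn)
      throws IOException {
    // Stand-in for the fd-passing path over a UNIX domain socket.
    throw new UnsupportedOperationException("illustrative stub");
  }

  private BlockReader createRemoteReader(InetSocketAddress dn) {
    // Stand-in for the ordinary TCP read path.
    return new BlockReader() {};
  }
}
{code}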
[jira] [Created] (HDFS-4152) Add a new class for the parameter in INode.collectSubtreeBlocksAndClear(..)
Tsz Wo (Nicholas), SZE created HDFS-4152: Summary: Add a new class for the parameter in INode.collectSubtreeBlocksAndClear(..) Key: HDFS-4152 URL: https://issues.apache.org/jira/browse/HDFS-4152 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Tsz Wo (Nicholas), SZE Assignee: Jing Zhao Priority: Minor INode.collectSubtreeBlocksAndClear(..) currently uses a list to collect blocks for deletion. It cannot be extended to support other operations like updating the block map. We propose to add a new class to encapsulate the abstraction. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data
[ https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491228#comment-13491228 ] Hadoop QA commented on HDFS-347: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12552211/HDFS-347.020.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 6 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.net.unix.TestDomainSocket {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3447//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/3447//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/3447//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3447//console This message is automatically generated. DFS read performance suboptimal when client co-located on nodes with data - Key: HDFS-347 URL: https://issues.apache.org/jira/browse/HDFS-347 Project: Hadoop HDFS Issue Type: Improvement Components: data-node, hdfs client, performance Reporter: George Porter Assignee: Colin Patrick McCabe Attachments: all.tsv, BlockReaderLocal1.txt, HADOOP-4801.1.patch, HADOOP-4801.2.patch, HADOOP-4801.3.patch, HDFS-347-016_cleaned.patch, HDFS-347.016.patch, HDFS-347.017.clean.patch, HDFS-347.017.patch, HDFS-347.018.clean.patch, HDFS-347.018.patch2, HDFS-347.019.patch, HDFS-347.020.patch, HDFS-347-branch-20-append.txt, hdfs-347.png, hdfs-347.txt, local-reads-doc One of the major strategies Hadoop uses to get scalable data processing is to move the code to the data. However, putting the DFS client on the same physical node as the data blocks it acts on doesn't improve read performance as much as expected. After looking at Hadoop and O/S traces (via HADOOP-4049), I think the problem is due to the HDFS streaming protocol causing many more read I/O operations (iops) than necessary. Consider the case of a DFSClient fetching a 64 MB disk block from the DataNode process (running in a separate JVM) running on the same machine. The DataNode will satisfy the single disk block request by sending data back to the HDFS client in 64-KB chunks. In BlockSender.java, this is done in the sendChunk() method, relying on Java's transferTo() method. Depending on the host O/S and JVM implementation, transferTo() is implemented as either a sendfilev() syscall or a pair of mmap() and write(). In either case, each chunk is read from the disk by issuing a separate I/O operation for each chunk. 
The result is that the single request for a 64-MB block ends up hitting the disk as over a thousand smaller requests of 64 KB each. Since the DFSClient runs in a different JVM and process than the DataNode, shuttling data from the disk to the DFSClient also results in context switches each time network packets get sent (in this case, the 64-KB chunk turns into a large number of 1500-byte packet send operations). Thus we see a large number of context switches for each block send operation. I'd like to get some feedback on the best way to address this, but I think the solution is to provide a mechanism for a DFSClient to directly open data blocks that happen to be on the same machine. It could do this by examining the set of LocatedBlocks returned by the NameNode, marking those that should be resident on the local host. Since the DataNode and DFSClient (probably) share the same Hadoop configuration, the DFSClient should be able to find the files holding the block data, and it could directly open them and send data back to the client. This would avoid the context switches imposed by the network
[jira] [Updated] (HDFS-4152) Add a new class for the parameter in INode.collectSubtreeBlocksAndClear(..)
[ https://issues.apache.org/jira/browse/HDFS-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4152: Attachment: HDFS-4152.001.patch Patch uploaded. Add a new class for the parameter in INode.collectSubtreeBlocksAndClear(..) --- Key: HDFS-4152 URL: https://issues.apache.org/jira/browse/HDFS-4152 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 3.0.0 Reporter: Tsz Wo (Nicholas), SZE Assignee: Jing Zhao Priority: Minor Attachments: HDFS-4152.001.patch INode.collectSubtreeBlocksAndClear(..) currently uses a list to collect blocks for deletion. It cannot be extended to support other operations like updating the block map. We propose to add a new class to encapsulate the abstraction. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4152) Add a new class for the parameter in INode.collectSubtreeBlocksAndClear(..)
[ https://issues.apache.org/jira/browse/HDFS-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4152: Affects Version/s: 3.0.0 Status: Patch Available (was: Open) Add a new class for the parameter in INode.collectSubtreeBlocksAndClear(..) --- Key: HDFS-4152 URL: https://issues.apache.org/jira/browse/HDFS-4152 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 3.0.0 Reporter: Tsz Wo (Nicholas), SZE Assignee: Jing Zhao Priority: Minor Attachments: HDFS-4152.001.patch INode.collectSubtreeBlocksAndClear(..) currently uses a list to collect blocks for deletion. It cannot be extended to support other operations like updating the block map. We propose to add a new class to encapsulate the abstraction. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4150) Update inode in blocksMap when deleting original/snapshot file
[ https://issues.apache.org/jira/browse/HDFS-4150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491260#comment-13491260 ] Jing Zhao commented on HDFS-4150: - Nicholas and Aaron, thanks for the comments! I have uploaded the initial patch for refactoring INode#collectSubtreeBlocksAndClear in HDFS-4152, and will upload a new patch addressing Nicholas's comments after HDFS-4152 goes through. To address Aaron's comments, I will also add more comments explaining why we replace the BlockCollection in the blocksMap. I have also rechecked the code and verified that the same BlockCollection/INode reference is used in both the blocksMap and the FSDirectory tree. Update inode in blocksMap when deleting original/snapshot file -- Key: HDFS-4150 URL: https://issues.apache.org/jira/browse/HDFS-4150 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node, name-node Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-4150.000.patch, HDFS-4150.001.patch, HDFS-4150.002.patch When deleting a file/directory, instead of directly removing all the corresponding blocks, we should update inodes in blocksMap if there are snapshots for them. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira