[jira] [Commented] (HDFS-2505) Add a test to verify getFileChecksum works with ViewFS
[ https://issues.apache.org/jira/browse/HDFS-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146835#comment-13146835 ]

Ravi Prakash commented on HDFS-2505:
------------------------------------

Thanks Jitendra! Unit tests were added as part of HADOOP-7770 for the changes made in ViewFS.java and ViewFileSystem.java. There weren't any changes in HDFS code. Could you please specify what method / interface you'd like tested?

Would this test not make sense because it might break without any changes in HDFS code (if the Hadoop getFileChecksum code changes)? If so, we can close this JIRA as invalid. Or do we WANT that check, so that if Hadoop Common changes things, HDFS tests break and we are notified to fix the issue in HDFS as well?

Add a test to verify getFileChecksum works with ViewFS
------------------------------------------------------

Key: HDFS-2505
URL: https://issues.apache.org/jira/browse/HDFS-2505
Project: Hadoop HDFS
Issue Type: Improvement
Components: test
Affects Versions: 0.23.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical
Labels: test
Fix For: 0.24.0
Attachments: HDFS-2505.patch

Please refer to HADOOP-7770. getFileChecksum was failing on files such as /tmp/someFile, but working fine for /someDir/someFile. The fix is in HADOOP-7770, but we should have a test in HDFS which checks this functionality (this test will fail until HADOOP-7770 is checked in).

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
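The behaviour under test can be pictured as a client-side mount table that resolves a viewfs path to a target filesystem before delegating getFileChecksum. The sketch below is an illustrative stand-in, not the actual ViewFileSystem code: the mount points, class names, and resolution logic are all assumptions. It shows why a delegation bug at one mount point (say, /tmp) could break checksums for /tmp/someFile while /someDir/someFile keeps working, which is exactly why a test should cover several mount points.

```python
# Illustrative sketch (NOT the real ViewFileSystem implementation): a
# client-side mount table resolves a path to (target-fs, remaining-path)
# by longest-prefix match, then delegates getFileChecksum to the target.

class FakeFs:
    """Stand-in for a mounted target filesystem (e.g. one HDFS namespace)."""
    def __init__(self, name):
        self.name = name

    def get_file_checksum(self, path):
        # stand-in for DistributedFileSystem.getFileChecksum
        return "%s:checksum-of:%s" % (self.name, path)

class ViewFsSketch:
    def __init__(self, mount_table):
        # mount_table: {"/tmp": FakeFs(...), "/someDir": FakeFs(...)}
        self.mount_table = mount_table

    def resolve(self, path):
        # longest-prefix match over the mount points
        best = max((m for m in self.mount_table if path.startswith(m)),
                   key=len, default=None)
        if best is None:
            raise IOError("no mount point for " + path)
        return self.mount_table[best], path[len(best):] or "/"

    def get_file_checksum(self, path):
        fs, rest = self.resolve(path)
        return fs.get_file_checksum(rest)
```

A test in this spirit would assert that checksums delegate correctly for paths under every mount point, e.g. both /tmp/someFile and /someDir/someFile.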
[jira] [Commented] (HDFS-2316) webhdfs: a complete FileSystem implementation for accessing HDFS over HTTP
[ https://issues.apache.org/jira/browse/HDFS-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146898#comment-13146898 ]

Milind Bhandarkar commented on HDFS-2316:
-----------------------------------------

Before I get tired of the case-sensitivity arguments, let me ask who you are designing the system for. I suppose it is for folks like me, who have used the URL scheme for more than 18 years now. So, here is my take: anything after host:port/ is case sensitive. (Because after host:port/, I know that it refers to a file system, or a resource that ultimately refers to a file system.) So, please stop arguing, and design it for curmudgeons like me. Even without my reading glasses, I can recognize the difference between capital and small letters. Thank you!

webhdfs: a complete FileSystem implementation for accessing HDFS over HTTP
--------------------------------------------------------------------------

Key: HDFS-2316
URL: https://issues.apache.org/jira/browse/HDFS-2316
Project: Hadoop HDFS
Issue Type: New Feature
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
Attachments: WebHdfsAPI20111020.pdf, WebHdfsAPI2003.pdf

We currently have hftp for accessing HDFS over HTTP. However, hftp is a read-only FileSystem and does not provide write access. In HDFS-2284, we propose webhdfs, a complete FileSystem implementation for accessing HDFS over HTTP. This is the umbrella JIRA for the tasks.
[jira] [Commented] (HDFS-2540) Change WebHdfsFileSystem to two-step create/append
[ https://issues.apache.org/jira/browse/HDFS-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147032#comment-13147032 ]

Hudson commented on HDFS-2540:
------------------------------

Integrated in Hadoop-Mapreduce-0.23-Build #86 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/86/])

svn merge -c 1199396 from trunk for HDFS-2540.

szetszwo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1199403
Files:
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/web/resources/NamenodeWebHdfsMethods.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/JsonUtil.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/resources/HttpOpParam.java

Change WebHdfsFileSystem to two-step create/append
--------------------------------------------------

Key: HDFS-2540
URL: https://issues.apache.org/jira/browse/HDFS-2540
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
Fix For: 0.20.205.1, 0.20.206.0, 0.23.0, 0.24.0, 0.23.1
Attachments: h2540_2007.patch, h2540_2007_0.20s.patch, h2540_2008.patch, h2540_2008_0.20s.patch
[jira] [Commented] (HDFS-2540) Change WebHdfsFileSystem to two-step create/append
[ https://issues.apache.org/jira/browse/HDFS-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147036#comment-13147036 ]

Hudson commented on HDFS-2540:
------------------------------

Integrated in Hadoop-Mapreduce-trunk #892 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/892/])

HDFS-2540. Webhdfs: change Expect: 100-continue to two-step write; change HdfsFileStatus and localName respectively to FileStatus and pathSuffix in JSON response.

szetszwo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1199396
Files:
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/web/resources/NamenodeWebHdfsMethods.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/JsonUtil.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/resources/HttpOpParam.java

Change WebHdfsFileSystem to two-step create/append
--------------------------------------------------

Key: HDFS-2540
URL: https://issues.apache.org/jira/browse/HDFS-2540
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
Fix For: 0.20.205.1, 0.20.206.0, 0.23.0, 0.24.0, 0.23.1
Attachments: h2540_2007.patch, h2540_2007_0.20s.patch, h2540_2008.patch, h2540_2008_0.20s.patch
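The two-step write referenced in the commit message has the following shape: the client first sends a content-less PUT to the namenode, receives an HTTP 307 redirect naming a datanode, and only then sends the file data to that datanode. The sketch below simulates that handshake with in-process stand-ins; the function names, the block-placement table, and the return values are illustrative assumptions, not the actual WebHdfsFileSystem code.

```python
# Sketch of the webhdfs two-step create (hypothetical in-process stand-ins
# for the namenode and datanode HTTP handlers). Sending the body only in
# step 2 avoids streaming file data to a server that will just redirect it.

DATANODES = {"/foo": "dn1:50075"}   # assumed block placement
STORED = {}                          # assumed datanode-side storage

def namenode_create(path):
    """Step 1: PUT ...?op=CREATE, no body. The namenode only picks a
    datanode and answers 307 with its Location."""
    dn = DATANODES[path]
    return 307, "http://%s/webhdfs/v1%s?op=CREATE" % (dn, path)

def datanode_create(location, data):
    """Step 2: the client re-issues the PUT, with the body, to the
    datanode named in the redirect."""
    path = location.split("/webhdfs/v1", 1)[1].split("?", 1)[0]
    STORED[path] = data
    return 201  # Created

def client_create(path, data):
    status, location = namenode_create(path)
    assert status == 307, "expected a redirect from the namenode"
    return datanode_create(location, data)
```

The design point is that only redirect metadata, never file content, crosses the namenode connection.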
[jira] [Commented] (HDFS-2542) Transparent compression storage in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147141#comment-13147141 ]

jinglong.liujl commented on HDFS-2542:
--------------------------------------

To Tim: Absolutely, compression efficiency depends on the codec and on the data being compressed. As a first step, we can use one specified codec in the prototype. Later we could pick the right codec for different data adaptively, but I don't yet have an idea for implementing that efficiently.

In our prototype, we decide when to compress in two ways:
1. By the data xceiver count and the number of running compression tasks. A high xceiver count means the datanode is serving many client requests (including balancing and block replication). Compression is not urgent at such times, so it can be slowed down to release resources for computing tasks.
2. We run compression as a separate process at idle CPU priority. When a CPU-intensive job arrives, the compression task yields its CPU slices to the job; when the cluster is idle, compression runs at full speed.

In any event, I don't think it is a given that compression of hot data will always be inefficient for all codecs, all hardware, and all users at all times. It's true that compressing before upload saves bandwidth and reduces transmission cost, but it also slows down running jobs; it's a trade-off. In our cluster, CPU utilization is not even over time, so using idle time for compression is valuable. To reduce transmission cost, we'll support compression on the client as well.

To Robert: Absolutely, detecting hot/cold data is important. To distinguish them, we add an atime at the block level. The atime is updated only when a DFSClient reads the block; block replication, the block scanner, and re-balancing do not modify it. The value is persisted to disk so that it is not lost when the datanode restarts.

Back to the cold/hot data topic: we can make many improvements for different applications. For example, if we use HDFS as an image store, where a hot image can be accessed thousands of times per second, we can use SSDs to reduce latency and SATA disks for cold data to save cost. Currently, low latency is not very important in our Hadoop cluster, so for cost-effectiveness we have not made any improvements for hot data. But for cold data, I think compression + RaidNode + cheaper disks is a feasible way to limit storage cost.

Transparent compression storage in HDFS
---------------------------------------

Key: HDFS-2542
URL: https://issues.apache.org/jira/browse/HDFS-2542
Project: Hadoop HDFS
Issue Type: Bug
Reporter: jinglong.liujl

As in HDFS-2115, we want to provide a mechanism to improve storage usage in HDFS by compression. Unlike HDFS-2115, this issue focuses on compressed storage. The idea is as follows.

To do:
1. Compress cold data. Cold data: data that no one has touched for a long time after it was written (or last read). Hot data: data that many clients read soon after it is written, and that may be deleted soon. Because compressing hot data is not cost-effective, we only compress cold data. In some cases some data in a file is accessed at high frequency while other data in the same file is cold; to distinguish them, we compress at the block level.
2. Compress only data with a high compression ratio. To determine the ratio, we try to compress the data; if the ratio is too low, we never compress it.
3. Forward compatibility. After compression, the data format on the datanode has changed, and old clients cannot access it. To solve this, we provide a mechanism that decompresses on the datanode.
4. Support random access and append. As in HDFS-2115, random access can be supported by an index. We split the data into fixed-length pieces before compressing (we call these fixed-length pieces chunks), and every chunk has an index entry. On random access, we seek to the nearest index entry and read that chunk to reach the precise position.
5. Asynchronous compression, so that compression does not slow down running jobs. In practice we found that cluster CPU usage is not uniform: some clusters are idle at night, others in the afternoon. Compression tasks should run at full speed when the cluster is idle and at low speed when it is busy.

Will do:
1. Client-specified codec, and support for compressed transmission.
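The chunk-index scheme for random access described in the issue (fixed-length chunks, each with an index entry, seek to the nearest entry and read within the chunk) can be sketched as follows. The chunk size and index layout here are illustrative assumptions, not an actual HDFS on-disk format.

```python
import zlib

# Sketch of chunked compression with a per-chunk index (assumed layout):
# the raw block is split into fixed-length chunks, each chunk is
# compressed independently, and the index records where each compressed
# chunk starts. Random access decompresses exactly one chunk instead of
# the whole block.

CHUNK = 4  # tiny fixed chunk length, for illustration only

def compress_block(data):
    index, out, offset = [], b"", 0
    for i in range(0, len(data), CHUNK):
        c = zlib.compress(data[i:i + CHUNK])
        index.append(offset)   # start offset of compressed chunk i
        out += c
        offset += len(c)
    return index, out

def read_at(index, blob, pos):
    """Read one byte at uncompressed offset pos via the chunk index."""
    chunk_no = pos // CHUNK
    start = index[chunk_no]
    end = index[chunk_no + 1] if chunk_no + 1 < len(index) else len(blob)
    chunk = zlib.decompress(blob[start:end])
    return chunk[pos % CHUNK]
```

Note the trade-off: smaller chunks give finer random access but a worse compression ratio, since each chunk is compressed independently.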
[jira] [Commented] (HDFS-2542) Transparent compression storage in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147152#comment-13147152 ]

Robert Joseph Evans commented on HDFS-2542:
-------------------------------------------

To Jinglong: I agree completely with you; I just wanted to be sure that any final solution provides a generic mechanism: something that can cleanly separate the classification of hot vs. cold vs. really cold data from any extra processing that might happen when data moves from one classification to another. Access time is a great start, but I can imagine a lot of potential innovation and experimentation in this area. I can also see lots of different groups wanting to do something when the classification changes. Like you said: compress the data, possibly move it to a different disk, possibly apply RAID to it. Whatever we do, it should also be pluggable.

Transparent compression storage in HDFS
---------------------------------------

Key: HDFS-2542
URL: https://issues.apache.org/jira/browse/HDFS-2542
Project: Hadoop HDFS
Issue Type: Bug
Reporter: jinglong.liujl

As in HDFS-2115, we want to provide a mechanism to improve storage usage in HDFS by compression. Unlike HDFS-2115, this issue focuses on compressed storage. The idea is as follows.

To do:
1. Compress cold data. Cold data: data that no one has touched for a long time after it was written (or last read). Hot data: data that many clients read soon after it is written, and that may be deleted soon. Because compressing hot data is not cost-effective, we only compress cold data. In some cases some data in a file is accessed at high frequency while other data in the same file is cold; to distinguish them, we compress at the block level.
2. Compress only data with a high compression ratio. To determine the ratio, we try to compress the data; if the ratio is too low, we never compress it.
3. Forward compatibility. After compression, the data format on the datanode has changed, and old clients cannot access it. To solve this, we provide a mechanism that decompresses on the datanode.
4. Support random access and append. As in HDFS-2115, random access can be supported by an index. We split the data into fixed-length pieces before compressing (we call these fixed-length pieces chunks), and every chunk has an index entry. On random access, we seek to the nearest index entry and read that chunk to reach the precise position.
5. Asynchronous compression, so that compression does not slow down running jobs. In practice we found that cluster CPU usage is not uniform: some clusters are idle at night, others in the afternoon. Compression tasks should run at full speed when the cluster is idle and at low speed when it is busy.

Will do:
1. Client-specified codec, and support for compressed transmission.
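The pluggable split suggested in the discussion (an atime-driven classifier on one side, per-transition actions such as compress/move/raid on the other) could look roughly like this. Every name and threshold below is invented for illustration; this is not a proposed HDFS API.

```python
# Hypothetical sketch of a pluggable hot/cold scheme: classification is
# driven purely by block access time, and the actions taken when a block
# changes class (compress, move to SSD/SATA, apply RAID, ...) are plugins
# registered per target class, so policy and mechanism stay separate.

class AtimeClassifier:
    """Classifies a block as hot or cold from its last-read time."""
    def __init__(self, cold_after):
        self.cold_after = cold_after  # seconds since last read

    def classify(self, atime, now):
        return "cold" if now - atime >= self.cold_after else "hot"

class TransitionRegistry:
    """Runs registered handlers when a block enters a new class."""
    def __init__(self):
        self.handlers = {}  # new_class -> [callables]

    def on(self, new_class, handler):
        self.handlers.setdefault(new_class, []).append(handler)

    def fire(self, block, new_class):
        return [h(block) for h in self.handlers.get(new_class, [])]
```

With this shape, swapping in a different classifier (e.g. access frequency instead of raw atime) requires no change to the transition handlers, which is the separation Robert asks for.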
[jira] [Commented] (HDFS-2495) Increase granularity of write operations in ReplicationMonitor thus reducing contention for write lock
[ https://issues.apache.org/jira/browse/HDFS-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147231#comment-13147231 ]

Tsz Wo (Nicholas), SZE commented on HDFS-2495:
----------------------------------------------

Tom and Hairong, thanks for the great work!

Increase granularity of write operations in ReplicationMonitor thus reducing contention for write lock
------------------------------------------------------------------------------------------------------

Key: HDFS-2495
URL: https://issues.apache.org/jira/browse/HDFS-2495
Project: Hadoop HDFS
Issue Type: Sub-task
Components: name-node
Reporter: Tomasz Nykiel
Assignee: Tomasz Nykiel
Fix For: 0.24.0
Attachments: replicationMon.patch, replicationMon.patch-1

For processing blocks in the ReplicationMonitor (BlockManager.computeReplicationWork), we first obtain a list of blocks to be replicated by calling chooseUnderReplicatedBlocks, and then for each block found we call computeReplicationWorkForBlock. The latter processes a block in three stages, acquiring the write lock twice per call:
1. obtaining block-related info (live nodes, source node, etc.) under the lock
2. choosing targets for replication
3. scheduling replication (under the lock)

We would like to change this behaviour and decrease contention for the write lock by batching blocks and executing stages 1, 2, and 3 for sets of blocks rather than for each block separately. This decreases the number of write-lock acquisitions from 2 * numberOfBlocks to 2. Also, the info-level logging can be pushed outside the write lock.
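The lock-acquisition arithmetic in the description can be made concrete with a counting sketch. The lock wrapper and stage bodies below are stand-ins, not the BlockManager code: only stages 1 and 3 take the write lock, and batching reduces acquisitions from 2 per block to 2 per batch.

```python
import threading

# Counting sketch of the batching described above (stand-in code, not the
# real BlockManager): stage 1 and stage 3 need the write lock, stage 2
# (target choice) does not.

class CountingLock:
    """A lock that counts how many times it was acquired."""
    def __init__(self):
        self.lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self.lock.acquire()
        self.acquisitions += 1

    def __exit__(self, *exc):
        self.lock.release()

def replicate_per_block(blocks, lock):
    """Original scheme: 2 lock acquisitions per block."""
    for b in blocks:
        with lock:
            info = ("info", b)        # stage 1: under the lock
        target = ("target", b)        # stage 2: no lock needed
        with lock:
            scheduled = (b, target)   # stage 3: under the lock

def replicate_batched(blocks, lock):
    """Batched scheme: 2 lock acquisitions per batch."""
    with lock:
        infos = [("info", b) for b in blocks]       # stage 1, one lock
    targets = [("target", b) for b in blocks]       # stage 2, lock-free
    with lock:
        schedule = list(zip(blocks, targets))       # stage 3, one lock
```

For a batch of 500 blocks this is 1000 acquisitions versus 2, which is the contention reduction the JIRA targets.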
[jira] [Updated] (HDFS-2539) Support doAs and GETHOMEDIRECTORY in webhdfs
[ https://issues.apache.org/jira/browse/HDFS-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HDFS-2539:
-----------------------------------------

Attachment: h2539_2009.patch
            h2539_2009_0.20s.patch

h2539_2009_0.20s.patch and h2539_2009.patch:
- changed redirect to support doAs;
- added delegation parameter check in GETDELEGATIONTOKEN;
- added more tests.

Support doAs and GETHOMEDIRECTORY in webhdfs
--------------------------------------------

Key: HDFS-2539
URL: https://issues.apache.org/jira/browse/HDFS-2539
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
Attachments: h2539_2008.patch, h2539_2008_0.20s.patch, h2539_2008_0.20s.patch, h2539_2009.patch, h2539_2009_0.20s.patch
[jira] [Updated] (HDFS-611) Heartbeats times from Datanodes increase when there are plenty of blocks to delete
[ https://issues.apache.org/jira/browse/HDFS-611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HDFS-611:
--------------------------------------

Target Version/s: 0.20.205.1
Fix Version/s: 0.20.205.1

Heartbeats times from Datanodes increase when there are plenty of blocks to delete
----------------------------------------------------------------------------------

Key: HDFS-611
URL: https://issues.apache.org/jira/browse/HDFS-611
Project: Hadoop HDFS
Issue Type: Bug
Components: data-node
Affects Versions: 0.20.1, 0.21.0, 0.22.0
Reporter: dhruba borthakur
Assignee: Zheng Shao
Fix For: 0.20.205.1, 0.21.0
Attachments: HDFS-611-branch-0.20-security.patch, HDFS-611.branch-19.patch, HDFS-611.branch-19.v2.patch, HDFS-611.branch-20.patch, HDFS-611.branch-20.v2.patch, HDFS-611.branch-20.v6.patch, HDFS-611.trunk.patch, HDFS-611.trunk.v2.patch, HDFS-611.trunk.v3.patch, HDFS-611.trunk.v4.patch, HDFS-611.trunk.v5.patch, HDFS-611.trunk.v6.patch

I am seeing that when we delete a large directory that has plenty of blocks, the heartbeat times from datanodes increase significantly from the normal value of 3 seconds to as large as 50 seconds or so. The heartbeat thread in the Datanode deletes a bunch of blocks sequentially, which causes the heartbeat times to increase.
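The usual remedy for this kind of problem, moving the deletions off the heartbeat thread onto a background worker, can be sketched as below. The class and method names are invented for illustration; this is not the actual HDFS-611 patch.

```python
import queue
import threading

# Sketch of taking block deletion off the heartbeat path (invented names,
# not the actual HDFS-611 patch). Instead of deleting blocks inline, which
# stretches a 3 s heartbeat toward 50 s when a big directory is removed,
# the heartbeat thread only enqueues the work; a background worker deletes.

class AsyncBlockDeleter:
    def __init__(self):
        self.q = queue.Queue()
        self.deleted = []
        self.worker = threading.Thread(target=self._drain, daemon=True)
        self.worker.start()

    def _drain(self):
        while True:
            block = self.q.get()
            if block is None:          # shutdown sentinel
                return
            self.deleted.append(block)  # stand-in for removing the block file
            self.q.task_done()

    def schedule(self, blocks):
        """Called from the heartbeat thread: O(n) enqueue, no disk I/O."""
        for b in blocks:
            self.q.put(b)

    def shutdown(self):
        self.q.join()       # wait for all scheduled deletions to finish
        self.q.put(None)
        self.worker.join()
```

The heartbeat call now returns as soon as the blocks are queued, so heartbeat latency is decoupled from the size of the directory being deleted.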
[jira] [Updated] (HDFS-2539) Support doAs and GETHOMEDIRECTORY in webhdfs
[ https://issues.apache.org/jira/browse/HDFS-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HDFS-2539:
-----------------------------------------

Attachment: h2539_2009b.patch
            h2539_2009b_0.20s.patch

h2539_2009b_0.20s.patch and h2539_2009b.patch: synced the code in trunk and 0.20s.

Support doAs and GETHOMEDIRECTORY in webhdfs
--------------------------------------------

Key: HDFS-2539
URL: https://issues.apache.org/jira/browse/HDFS-2539
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
Attachments: h2539_2008.patch, h2539_2008_0.20s.patch, h2539_2008_0.20s.patch, h2539_2009.patch, h2539_2009_0.20s.patch, h2539_2009b.patch, h2539_2009b_0.20s.patch
[jira] [Commented] (HDFS-611) Heartbeats times from Datanodes increase when there are plenty of blocks to delete
[ https://issues.apache.org/jira/browse/HDFS-611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147305#comment-13147305 ]

Tsz Wo (Nicholas), SZE commented on HDFS-611:
---------------------------------------------

+1 the 0.20s patch looks good.

Heartbeats times from Datanodes increase when there are plenty of blocks to delete
----------------------------------------------------------------------------------

Key: HDFS-611
URL: https://issues.apache.org/jira/browse/HDFS-611
Project: Hadoop HDFS
Issue Type: Bug
Components: data-node
Affects Versions: 0.20.1, 0.21.0, 0.22.0
Reporter: dhruba borthakur
Assignee: Zheng Shao
Fix For: 0.20.205.1, 0.21.0
Attachments: HDFS-611-branch-0.20-security.patch, HDFS-611.branch-19.patch, HDFS-611.branch-19.v2.patch, HDFS-611.branch-20.patch, HDFS-611.branch-20.v2.patch, HDFS-611.branch-20.v6.patch, HDFS-611.trunk.patch, HDFS-611.trunk.v2.patch, HDFS-611.trunk.v3.patch, HDFS-611.trunk.v4.patch, HDFS-611.trunk.v5.patch, HDFS-611.trunk.v6.patch

I am seeing that when we delete a large directory that has plenty of blocks, the heartbeat times from datanodes increase significantly from the normal value of 3 seconds to as large as 50 seconds or so. The heartbeat thread in the Datanode deletes a bunch of blocks sequentially, which causes the heartbeat times to increase.
[jira] [Commented] (HDFS-2539) Support doAs and GETHOMEDIRECTORY in webhdfs
[ https://issues.apache.org/jira/browse/HDFS-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147329#comment-13147329 ]

Hadoop QA commented on HDFS-2539:
---------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12503119/h2539_2009.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 16 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
  org.apache.hadoop.hdfs.TestFileAppend2
  org.apache.hadoop.hdfs.TestBalancerBandwidth
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1547//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1547//console

This message is automatically generated.

Support doAs and GETHOMEDIRECTORY in webhdfs
--------------------------------------------

Key: HDFS-2539
URL: https://issues.apache.org/jira/browse/HDFS-2539
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
Attachments: h2539_2008.patch, h2539_2008_0.20s.patch, h2539_2008_0.20s.patch, h2539_2009.patch, h2539_2009_0.20s.patch, h2539_2009b.patch, h2539_2009b_0.20s.patch
[jira] [Commented] (HDFS-2539) Support doAs and GETHOMEDIRECTORY in webhdfs
[ https://issues.apache.org/jira/browse/HDFS-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147355#comment-13147355 ]

Hadoop QA commented on HDFS-2539:
---------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12503127/h2539_2009b.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 16 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
  org.apache.hadoop.hdfs.TestFileCreationClient
  org.apache.hadoop.hdfs.TestSetrepIncreasing
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1548//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1548//console

This message is automatically generated.

Support doAs and GETHOMEDIRECTORY in webhdfs
--------------------------------------------

Key: HDFS-2539
URL: https://issues.apache.org/jira/browse/HDFS-2539
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
Attachments: h2539_2008.patch, h2539_2008_0.20s.patch, h2539_2008_0.20s.patch, h2539_2009.patch, h2539_2009_0.20s.patch, h2539_2009b.patch, h2539_2009b_0.20s.patch
[jira] [Updated] (HDFS-2539) Support doAs and GETHOMEDIRECTORY in webhdfs
[ https://issues.apache.org/jira/browse/HDFS-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HDFS-2539:
-----------------------------------------

Attachment: h2539_2009c.patch
            h2539_2009c_0.20s.patch

h2539_2009c_0.20s.patch and h2539_2009c.patch: support case-insensitive parameter names in AuthFilter.

Support doAs and GETHOMEDIRECTORY in webhdfs
--------------------------------------------

Key: HDFS-2539
URL: https://issues.apache.org/jira/browse/HDFS-2539
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
Attachments: h2539_2008.patch, h2539_2008_0.20s.patch, h2539_2008_0.20s.patch, h2539_2009.patch, h2539_2009_0.20s.patch, h2539_2009b.patch, h2539_2009b_0.20s.patch, h2539_2009c.patch, h2539_2009c_0.20s.patch
[jira] [Resolved] (HDFS-1943) fail to start datanode while start-dfs.sh is executed by root user
[ https://issues.apache.org/jira/browse/HDFS-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Foley resolved HDFS-1943.
------------------------------

Resolution: Fixed
Fix Version/s: 0.20.205.1
Target Version/s: 0.20.205.1, 0.22.0, 0.23.0 (was: 0.23.0, 0.22.0, 0.20.205.1)

Committed to branch-0.20-security and branch-0.20-security-205.

fail to start datanode while start-dfs.sh is executed by root user
------------------------------------------------------------------

Key: HDFS-1943
URL: https://issues.apache.org/jira/browse/HDFS-1943
Project: Hadoop HDFS
Issue Type: Bug
Components: scripts
Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
Reporter: Wei Yongjun
Assignee: Matt Foley
Priority: Blocker
Fix For: 0.20.205.1, 0.22.0, 0.23.0
Attachments: HDFS-1943-branch-0.20-security.patch, HDFS-1943.patch

When start-dfs.sh is run by the root user, we get the following error message:

# start-dfs.sh
Starting namenodes on [localhost ]
localhost: namenode running as process 2556. Stop it first.
localhost: starting datanode, logging to /usr/hadoop/hadoop-common-0.23.0-SNAPSHOT/bin/../logs/hadoop-root-datanode-cspf01.out
localhost: Unrecognized option: -jvm
localhost: Could not create the Java virtual machine.

The -jvm options should be passed to jsvc when we start a secure datanode, but they are still passed to java when start-dfs.sh is run by root while the secure datanode is disabled. This is a bug in bin/hdfs.
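The fix has to make the -jvm flag conditional on actually launching jsvc, not merely on running as root. A sketch of that guard as a small decision function (hypothetical names; the real logic lives in the bin/hdfs shell script, not in Python):

```python
# Sketch of the guard the bin/hdfs launcher needs (hypothetical function,
# not the actual shell script): "-jvm server" is a jsvc-only option, so it
# must be added only when we are root AND a secure datanode user is
# configured, i.e. only when jsvc will actually be exec'd. The reported
# bug added it for any root user, so plain java was started with an
# option it cannot parse ("Unrecognized option: -jvm").

def datanode_jvm_args(euid, secure_dn_user):
    """euid: effective uid; secure_dn_user: HADOOP_SECURE_DN_USER or ''."""
    launching_jsvc = (euid == 0 and bool(secure_dn_user))
    return ["-jvm", "server"] if launching_jsvc else []
```

The key case is euid == 0 with no secure datanode user configured: before the fix that produced the -jvm flag, after it the argument list stays empty.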
[jira] [Commented] (HDFS-2539) Support doAs and GETHOMEDIRECTORY in webhdfs
[ https://issues.apache.org/jira/browse/HDFS-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147421#comment-13147421 ]

Hadoop QA commented on HDFS-2539:
---------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12503146/h2539_2009c.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 17 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
  org.apache.hadoop.hdfs.TestFileAppend2
  org.apache.hadoop.hdfs.TestBalancerBandwidth
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1549//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1549//console

This message is automatically generated.

Support doAs and GETHOMEDIRECTORY in webhdfs
--------------------------------------------

Key: HDFS-2539
URL: https://issues.apache.org/jira/browse/HDFS-2539
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
Attachments: h2539_2008.patch, h2539_2008_0.20s.patch, h2539_2008_0.20s.patch, h2539_2009.patch, h2539_2009_0.20s.patch, h2539_2009b.patch, h2539_2009b_0.20s.patch, h2539_2009c.patch, h2539_2009c_0.20s.patch
[jira] [Updated] (HDFS-2246) Shortcut a local client reads to a Datanodes files directly
[ https://issues.apache.org/jira/browse/HDFS-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Foley updated HDFS-2246:
-----------------------------

Fix Version/s: (was: 0.20.205.1)

Intent to patch is expressed in Target Version; please mark Fix Version only after commit.

Shortcut a local client reads to a Datanodes files directly
-----------------------------------------------------------

Key: HDFS-2246
URL: https://issues.apache.org/jira/browse/HDFS-2246
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Sanjay Radia
Attachments: 0001-HDFS-347.-Local-reads.patch, HDFS-2246-branch-0.20-security-205.patch, HDFS-2246-branch-0.20-security-205.patch, HDFS-2246-branch-0.20-security-205.patch, HDFS-2246-branch-0.20-security.3.patch, HDFS-2246-branch-0.20-security.no-softref.patch, HDFS-2246-branch-0.20-security.patch, HDFS-2246.20s.1.patch, HDFS-2246.20s.2.txt, HDFS-2246.20s.3.txt, HDFS-2246.20s.4.txt, HDFS-2246.20s.patch, localReadShortcut20-security.2patch
[jira] [Updated] (HDFS-2450) Only complete hostname is supported to access data via hdfs://
[ https://issues.apache.org/jira/browse/HDFS-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Foley updated HDFS-2450: - Fix Version/s: 0.20.205.1

Only complete hostname is supported to access data via hdfs:// -- Key: HDFS-2450 URL: https://issues.apache.org/jira/browse/HDFS-2450 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20.205.0 Reporter: Rajit Saha Assignee: Daryn Sharp Fix For: 0.20.205.1 Attachments: HDFS-2450-1.patch, HDFS-2450-2.patch, HDFS-2450-3.patch, HDFS-2450-4.patch, HDFS-2450-5.patch, HDFS-2450.patch, IP vs. Hostname.pdf

If my complete hostname is host1.abc.xyz.com, only the complete hostname can be used to access data via hdfs://. I am running the following on a .20.205 client to get data from a .20.205 NN (host1):

$hadoop dfs -copyFromLocal /etc/passwd hdfs://host1/tmp
copyFromLocal: Wrong FS: hdfs://host1/tmp, expected: hdfs://host1.abc.xyz.com
Usage: java FsShell [-copyFromLocal localsrc ... dst]

$hadoop dfs -copyFromLocal /etc/passwd hdfs://host1.abc/tmp/
copyFromLocal: Wrong FS: hdfs://host1.blue/tmp/1, expected: hdfs://host1.abc.xyz.com
Usage: java FsShell [-copyFromLocal localsrc ... dst]

$hadoop dfs -copyFromLocal /etc/passwd hftp://host1.abc.xyz/tmp/
copyFromLocal: Wrong FS: hdfs://host1.blue/tmp/1, expected: hdfs://host1.abc.xyz.com
Usage: java FsShell [-copyFromLocal localsrc ... dst]

Only the following is supported:
$hadoop dfs -copyFromLocal /etc/passwd hdfs://host1.abc.xyz.com/tmp/

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
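The "Wrong FS" failures above come down to how the client matches a path's URI against the filesystem's own URI. A minimal illustration of that comparison follows; this is not the actual FileSystem.checkPath code, just a sketch of why a plain authority comparison rejects the short hostname (the attached patches presumably fix this by normalizing or canonicalizing the hostname before comparing).

```java
import java.net.URI;

// Naive scheme/authority check, roughly the comparison that produces
// "Wrong FS": the path's URI must match the filesystem's URI exactly.
public class WrongFsSketch {
    static boolean sameFileSystem(String fsUri, String pathUri) {
        URI fs = URI.create(fsUri);
        URI path = URI.create(pathUri);
        return fs.getScheme().equalsIgnoreCase(path.getScheme())
            && fs.getAuthority().equalsIgnoreCase(path.getAuthority());
    }

    public static void main(String[] args) {
        String fs = "hdfs://host1.abc.xyz.com";
        // Fully qualified hostname matches...
        System.out.println(sameFileSystem(fs, "hdfs://host1.abc.xyz.com/tmp"));
        // ...but the short name does not, so the shell reports "Wrong FS".
        System.out.println(sameFileSystem(fs, "hdfs://host1/tmp"));
    }
}
```

Under a string-equality check, "host1" and "host1.abc.xyz.com" can only match if one side is resolved to its canonical form first.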
[jira] [Commented] (HDFS-2542) Transparent compression storage in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13147448#comment-13147448 ] jinglong.liujl commented on HDFS-2542:

I agree with you. Classifying cold/hot data and storing it at lower cost is a generic issue, and it is always tied to application characteristics, so I think we should make the strategy pluggable and provide a default implementation.

Transparent compression storage in HDFS --- Key: HDFS-2542 URL: https://issues.apache.org/jira/browse/HDFS-2542 Project: Hadoop HDFS Issue Type: Bug Reporter: jinglong.liujl

As in HDFS-2115, we want to provide a mechanism to improve storage usage in HDFS through compression. Different from HDFS-2115, this issue focuses on compressed storage. The idea is as follows.

To do:
1. Compress cold data. Cold data: after being written (or last read), the data has not been touched by anyone for a long time. Hot data: after being written, many clients will read it, and it may be deleted soon. Because compressing hot data is not cost-effective, we only compress cold data. In some cases part of a file is accessed at high frequency while other parts of the same file are cold; to distinguish them, we compress at the block level.
2. Compress only data with a high compression ratio. To tell high from low ratios, we try compressing the data; if the compression ratio is too low, we never compress it.
3. Forward compatibility. After compression, the data format on the datanode changes, and old clients cannot read it. To solve this, we provide a mechanism that decompresses on the datanode.
4. Support random access and append. As in HDFS-2115, random access can be supported with an index. We split the data into fixed-length pieces before compression (we call these fixed-length pieces chunks), and every chunk has its own index entry. On random access, we can seek to the nearest index entry and read within that chunk to the precise position.
5. Compress asynchronously, so that compression does not slow down running jobs. In practice we found that cluster CPU usage is not uniform: some clusters are idle at night, others in the afternoon. Compression tasks should run at full speed when the cluster is idle and at low speed when it is busy.

Will do:
1. Client-specific codec, and support compressed transmission.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
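The chunk-index scheme for random access described in the issue can be sketched as follows. This is a minimal illustration with hypothetical names, not code from any attached work: with a fixed uncompressed chunk size, the target chunk and the skip within it follow from simple arithmetic, and the index maps that chunk to its offset in the compressed block.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of chunked compression with an index for random access:
// data is split into fixed-length chunks before compression, and an
// index records where each compressed chunk starts.
public class ChunkIndexSketch {
    static final int CHUNK_SIZE = 64 * 1024; // fixed uncompressed chunk length

    // Index entry: where chunk i begins in the compressed block file.
    static class IndexEntry {
        final long compressedOffset;
        IndexEntry(long off) { this.compressedOffset = off; }
    }

    // Which chunk contains the requested uncompressed position.
    static int chunkFor(long uncompressedPos) {
        return (int) (uncompressedPos / CHUNK_SIZE);
    }

    // How far to skip inside the decompressed chunk to reach the position.
    static long offsetInChunk(long uncompressedPos) {
        return uncompressedPos % CHUNK_SIZE;
    }

    public static void main(String[] args) {
        List<IndexEntry> index = new ArrayList<>();
        index.add(new IndexEntry(0));      // chunk 0
        index.add(new IndexEntry(21000));  // chunk 1 compressed to ~21 KB
        long pos = 70_000;                 // random-access target
        int chunk = chunkFor(pos);
        System.out.println("decompress chunk " + chunk
                + " at compressed offset " + index.get(chunk).compressedOffset
                + ", then skip " + offsetInChunk(pos) + " bytes");
    }
}
```

Appends fit the same layout: a new write only ever adds chunks (and index entries) after the last complete chunk.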
[jira] [Created] (HDFS-2547) Design doc is wrong about default block placement policy.
Design doc is wrong about default block placement policy. - Key: HDFS-2547 URL: https://issues.apache.org/jira/browse/HDFS-2547 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.20.1 Reporter: Harsh J Assignee: Harsh J Priority: Trivial Fix For: 0.24.0 Attachments: HDFS-2547.patch bq. For the common case, when the replication factor is three, HDFS's placement policy is to put one replica on one node in the local rack, another on a node in a different (remote) rack, and the last on a different node in the same *remote* rack. Should actually be: and the last on a different node in the same *local* rack. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2246) Shortcut a local client reads to a Datanodes files directly
[ https://issues.apache.org/jira/browse/HDFS-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13147459#comment-13147459 ] Tsz Wo (Nicholas), SZE commented on HDFS-2246:

- In BlockReaderLocal,
  * change newBlockReader(..) to package private
  * remove checkShortCircuitRead(..) since DFSClient has already checked it
  * remove initLocalDatanodeInfo(..); check for null and initialize in getLocalDatanodeInfo(..)
  * add a local variable for LocalDatanodeInfo in getBlockPathInfo(..) and newBlockReader(..) so that we don't have to look up the map multiple times
  * Both pairs createDatanodeProxy/resetDatanodeProxy and getProxy/setProxy are synchronized, so one of the synchronizations is redundant. I suggest moving createDatanodeProxy/resetDatanodeProxy to LocalDatanodeInfo and synchronizing inside LocalDatanodeInfo.
- change DFSClient.getLocalBlockReader(..) to private
- In DataNode,
  * check DFS_CLIENT_READ_SHORTCIRCUIT when initializing userWithLocalPathAccess. What should happen if DFS_CLIENT_READ_SHORTCIRCUIT is false but DFS_BLOCK_LOCAL_PATH_ACCESS_USER_KEY is set to some user?
- use org.junit.Assert in TestShortCircuitLocalRead

Shortcut a local client reads to a Datanodes files directly --- Key: HDFS-2246 URL: https://issues.apache.org/jira/browse/HDFS-2246 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sanjay Radia Attachments: 0001-HDFS-347.-Local-reads.patch, HDFS-2246-branch-0.20-security-205.patch, HDFS-2246-branch-0.20-security-205.patch, HDFS-2246-branch-0.20-security-205.patch, HDFS-2246-branch-0.20-security.3.patch, HDFS-2246-branch-0.20-security.no-softref.patch, HDFS-2246-branch-0.20-security.patch, HDFS-2246.20s.1.patch, HDFS-2246.20s.2.txt, HDFS-2246.20s.3.txt, HDFS-2246.20s.4.txt, HDFS-2246.20s.patch, localReadShortcut20-security.2patch -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
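The synchronization point raised in the review above can be illustrated generically. The class below is hypothetical (it is not the actual LocalDatanodeInfo code, whose fields and proxy type are not shown here); it only demonstrates the design choice suggested: keep a single synchronized layer inside the object that owns the proxy, rather than synchronizing both in the caller and in the getter/setter.

```java
// Hypothetical sketch: one synchronization layer, owned by the object
// that holds the lazily created proxy, replaces two redundant layers.
public class LocalInfoSketch {
    private Object proxy; // stands in for the datanode RPC proxy

    // Create-on-demand and the field it guards live behind the same lock.
    synchronized Object getOrCreateProxy() {
        if (proxy == null) {
            proxy = new Object(); // assumed creation logic goes here
        }
        return proxy;
    }

    synchronized void resetProxy() {
        proxy = null; // next getOrCreateProxy() call recreates it
    }
}
```

Callers then never need their own locking around create/reset, which removes the redundancy the review points out.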
[jira] [Updated] (HDFS-2547) Design doc is wrong about default block placement policy.
[ https://issues.apache.org/jira/browse/HDFS-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-2547: -- Status: Patch Available (was: Open) Design doc is wrong about default block placement policy. - Key: HDFS-2547 URL: https://issues.apache.org/jira/browse/HDFS-2547 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.20.1 Reporter: Harsh J Assignee: Harsh J Priority: Trivial Fix For: 0.24.0 Attachments: HDFS-2547.patch bq. For the common case, when the replication factor is three, HDFS's placement policy is to put one replica on one node in the local rack, another on a node in a different (remote) rack, and the last on a different node in the same *remote* rack. Should actually be: and the last on a different node in the same *local* rack. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2346) TestHost2NodesMap TestReplicasMap will fail depending upon execution order of test methods
[ https://issues.apache.org/jira/browse/HDFS-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Foley updated HDFS-2346: - Affects Version/s: (was: 0.24.0) Fix Version/s: 0.23.0 0.22.0 0.20.205.1 TestHost2NodesMap TestReplicasMap will fail depending upon execution order of test methods Key: HDFS-2346 URL: https://issues.apache.org/jira/browse/HDFS-2346 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 0.20.205.0, 0.22.0, 0.23.0 Reporter: Uma Maheswara Rao G Assignee: Laxman Priority: Blocker Fix For: 0.20.205.1, 0.22.0, 0.23.0 Attachments: HDFS-2346.20-security.205.Patch, HDFS-2346.22Branch.patch, HDFS-2346.Trunk.patch, HDFS-2346.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2346) TestHost2NodesMap TestReplicasMap will fail depending upon execution order of test methods
[ https://issues.apache.org/jira/browse/HDFS-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Foley updated HDFS-2346: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed to branch-0.20-security and branch-0.20-security-205. Observed fixed in all affected versions, so marking Resolved. TestHost2NodesMap TestReplicasMap will fail depending upon execution order of test methods Key: HDFS-2346 URL: https://issues.apache.org/jira/browse/HDFS-2346 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 0.20.205.0, 0.22.0, 0.23.0 Reporter: Uma Maheswara Rao G Assignee: Laxman Priority: Blocker Fix For: 0.20.205.1, 0.22.0, 0.23.0 Attachments: HDFS-2346.20-security.205.Patch, HDFS-2346.22Branch.patch, HDFS-2346.Trunk.patch, HDFS-2346.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira