[jira] [Commented] (HDFS-4340) Update addBlock() to include inode id as additional argument
[ https://issues.apache.org/jira/browse/HDFS-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558885#comment-13558885 ] Tsz Wo (Nicholas), SZE commented on HDFS-4340: -- Some more comments below: - Since startFileInternal(..) is not changed, appendFileInt(..) does not need to return a file status. - The old ClientProtocol.addBlock(..) should be removed. - checkLease(String src, String holder, INode file) is not needed: only getAdditionalBlock(..) calls it, and fileId is in the parameter list. Update addBlock() to include inode id as additional argument Key: HDFS-4340 URL: https://issues.apache.org/jira/browse/HDFS-4340 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode Affects Versions: 3.0.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
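To make the review comments concrete, here is a minimal sketch of what the revised RPC might look like once the inode id travels with addBlock(); the parameter order, interface name, and javadoc are assumptions based on this thread, not the committed patch.

{code}
import java.io.IOException;

import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.ExtendedBlock;
import org.apache.hadoop.hdfs.protocol.LocatedBlock;

public interface ClientProtocolSketch {
  /**
   * Ask the NameNode to allocate a new block for the file identified by
   * both path and inode id. Because fileId is already in the parameter
   * list, getAdditionalBlock(..) can validate the lease directly, and a
   * separate checkLease(String, String, INode) overload becomes
   * redundant, as noted in the comment above.
   */
  LocatedBlock addBlock(String src, String clientName,
      ExtendedBlock previous, DatanodeInfo[] excludedNodes,
      long fileId) throws IOException;
}
{code}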
[jira] [Commented] (HDFS-4350) Make enabling of stale marking on read and write paths independent
[ https://issues.apache.org/jira/browse/HDFS-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559010#comment-13559010 ] Andrew Wang commented on HDFS-4350: --- Todd's patch looks good to me. I ran the failed tests a couple of times locally and they passed, and earlier runs on this jira were fine. Make enabling of stale marking on read and write paths independent -- Key: HDFS-4350 URL: https://issues.apache.org/jira/browse/HDFS-4350 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-4350-1.patch, hdfs-4350-2.patch, hdfs-4350-3.patch, hdfs-4350-4.patch, hdfs-4350.txt Marking of datanodes as stale for the read and write paths was introduced in HDFS-3703 and HDFS-3912 respectively. This is enabled using two new keys, {{DFS_NAMENODE_CHECK_STALE_DATANODE_KEY}} and {{DFS_NAMENODE_AVOID_STALE_DATANODE_FOR_WRITE_KEY}}. However, there currently exists a dependency: you cannot enable write marking without also enabling read marking, since the first key enables both the staleness check and read marking. I propose renaming the first key to {{DFS_NAMENODE_AVOID_STALE_DATANODE_FOR_READ_KEY}} and making the staleness check enabled if either of the keys is set. This will allow read and write marking to be enabled independently. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
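A minimal sketch of the proposed decoupling follows; the key strings and the surrounding class are illustrative assumptions based on the jira description, not the committed patch.

{code}
import org.apache.hadoop.conf.Configuration;

public class StaleMarkingSketch {
  // Key names follow the jira proposal; the final committed names may differ.
  static final String AVOID_STALE_FOR_READ_KEY =
      "dfs.namenode.avoid.read.stale.datanode";
  static final String AVOID_STALE_FOR_WRITE_KEY =
      "dfs.namenode.avoid.write.stale.datanode";

  public static boolean isStalenessCheckEnabled(Configuration conf) {
    // Staleness must be tracked if either the read path or the write path
    // wants to act on it; neither key implies the other.
    return conf.getBoolean(AVOID_STALE_FOR_READ_KEY, false)
        || conf.getBoolean(AVOID_STALE_FOR_WRITE_KEY, false);
  }
}
{code}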
[jira] [Updated] (HDFS-4131) Add a tool to print the diff between two snapshots and diff of a snapshot from the current tree
[ https://issues.apache.org/jira/browse/HDFS-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4131: Attachment: HDFS-4131.002.patch Update the patch based on HDFS-4414+4131.002.patch in HDFS-4414: fix the code for checking if the metadata of a directory has been changed between snapshots. Add a tool to print the diff between two snapshots and diff of a snapshot from the current tree --- Key: HDFS-4131 URL: https://issues.apache.org/jira/browse/HDFS-4131 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: Snapshot (HDFS-2802) Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: HDFS-4131.001.patch, HDFS-4131.002.patch This jira tracks a tool to print the diff between two snapshots at a given path. The tool will also print the difference between the current directory and a given snapshot. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HDFS-4131) Add a tool to print the diff between two snapshots and diff of a snapshot from the current tree
[ https://issues.apache.org/jira/browse/HDFS-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao reassigned HDFS-4131: --- Assignee: Jing Zhao (was: Suresh Srinivas) Add a tool to print the diff between two snapshots and diff of a snapshot from the current tree --- Key: HDFS-4131 URL: https://issues.apache.org/jira/browse/HDFS-4131 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: Snapshot (HDFS-2802) Reporter: Suresh Srinivas Assignee: Jing Zhao Attachments: HDFS-4131.001.patch, HDFS-4131.002.patch This jira tracks a tool to print the diff between two snapshots at a given path. The tool will also print the difference between the current directory and a given snapshot. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4416) change dfs.datanode.domain.socket.path to dfs.domain.socket.path
[ https://issues.apache.org/jira/browse/HDFS-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559013#comment-13559013 ] Todd Lipcon commented on HDFS-4416: --- +1, committing momentarily change dfs.datanode.domain.socket.path to dfs.domain.socket.path Key: HDFS-4416 URL: https://issues.apache.org/jira/browse/HDFS-4416 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client, performance Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-4416.001.patch, HDFS-4416.002.patch, HDFS-4416.003.patch, HDFS-4416.004.patch {{dfs.datanode.domain.socket.path}} is used by both clients and the DataNode, so it might be best to avoid putting 'datanode' in the name. Most of the configuration keys that have 'datanode' in the name apply only to the DN. Also, should change __PORT__ to _PORT to be consistent with _HOST, etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
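For illustration, a hedged sketch of how client code might read the renamed key while tolerating configs that still use the old name; the fallback behavior is an assumption for this sketch, not necessarily what the patch does.

{code}
import org.apache.hadoop.conf.Configuration;

public class DomainSocketPathSketch {
  public static String getDomainSocketPath(Configuration conf) {
    // Prefer the new, component-neutral key; fall back to the old
    // datanode-scoped name so existing configs keep working.
    return conf.get("dfs.domain.socket.path",
        conf.get("dfs.datanode.domain.socket.path", ""));
  }
}
{code}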
[jira] [Resolved] (HDFS-4416) change dfs.datanode.domain.socket.path to dfs.domain.socket.path
[ https://issues.apache.org/jira/browse/HDFS-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-4416. --- Resolution: Fixed Hadoop Flags: Reviewed Committed to branch. Thanks, Colin. change dfs.datanode.domain.socket.path to dfs.domain.socket.path Key: HDFS-4416 URL: https://issues.apache.org/jira/browse/HDFS-4416 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client, performance Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-4416.001.patch, HDFS-4416.002.patch, HDFS-4416.003.patch, HDFS-4416.004.patch {{dfs.datanode.domain.socket.path}} is used by both clients and the DataNode, so it might be best to avoid putting 'datanode' in the name. Most of the configuration keys that have 'datanode' in the name apply only to the DN. Also, should change __PORT__ to _PORT to be consistent with _HOST, etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2554) Add separate metrics for missing blocks with desired replication level 1
[ https://issues.apache.org/jira/browse/HDFS-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Isaacson updated HDFS-2554: Target Version/s: (was: ) Status: Open (was: Patch Available) Add separate metrics for missing blocks with desired replication level 1 Key: HDFS-2554 URL: https://issues.apache.org/jira/browse/HDFS-2554 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha Reporter: Todd Lipcon Assignee: Andy Isaacson Priority: Minor Attachments: hdfs-2554-1.txt, hdfs-2554.txt Some users use replication level set to 1 for datasets which are unimportant and can be lost with no worry (eg the output of terasort tests). But other data on the cluster is important and should not be lost. It would be useful to separate the metric for missing blocks by the desired replication level of those blocks, so that one could ignore missing blocks at repl 1 while still alerting on missing blocks with higher desired replication. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4414) Create a DiffReport class to represent the diff between snapshots to end users
[ https://issues.apache.org/jira/browse/HDFS-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559027#comment-13559027 ] Aaron T. Myers commented on HDFS-4414: -- This seems like a great feature for which to add a public-facing (unstable or evolving) programmatic API. Given that, would you consider moving this API to the HdfsAdmin class instead of DistributedFileSystem, which is marked LimitedPrivate for only MR and HBase? Create a DiffReport class to represent the diff between snapshots to end users -- Key: HDFS-4414 URL: https://issues.apache.org/jira/browse/HDFS-4414 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-4414.001.patch, HDFS-4414+4131.002.patch HDFS-4131 computes the difference between two snapshots (or between a snapshot and the current tree). In this jira we create a DiffReport class to represent the diff to end users. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4417) HDFS-347: fix case where local reads get disabled incorrectly
[ https://issues.apache.org/jira/browse/HDFS-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559116#comment-13559116 ] Todd Lipcon commented on HDFS-4417: --- {code} - private Peer newPeer(InetSocketAddress addr) throws IOException { + private Peer newRemotePeer(InetSocketAddress addr) throws IOException { {code} How about {{newTcpPeer}}? Remote is kind of vague. {code} + public static DomainSocket getClosedSocket() { + return new DomainSocket("", -1); + } {code} This doesn't seem like a reasonable thing to expose. Instead, since it's just used from tests, could you just create a mock DomainSocket object which throws ClosedChannelException on write? I think the changes to PeerCache are a little over-complicated... why not just have two separate PeerCaches, one for each type of peer? HDFS-347: fix case where local reads get disabled incorrectly - Key: HDFS-4417 URL: https://issues.apache.org/jira/browse/HDFS-4417 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client, performance Reporter: Todd Lipcon Assignee: Colin Patrick McCabe Attachments: HDFS-4417.002.patch, hdfs-4417.txt In testing HDFS-347 against HBase (thanks [~jdcryans]) we ran into the following case: - a workload is running which puts a bunch of local sockets in the PeerCache - the workload abates for a while, causing the sockets to go stale (ie the DN side disconnects after the keepalive timeout) - the workload starts again In this case, the local socket retrieved from the cache failed the newBlockReader call, and it incorrectly disabled local sockets on that host. This is similar to an earlier bug HDFS-3376, but not quite the same. The next issue we ran into is that, once this happened, it never tried local sockets again, because the cache held lots of TCP sockets. Since we always managed to get a cached socket to the local node, it didn't bother trying local read again. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
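A hedged sketch of the mock being suggested, using Mockito's default-answer form of mock() so that every call on the "closed" socket fails (this mirrors the pattern quoted from the later patch in this thread); the exception type, message, and helper name are illustrative.

{code}
import org.apache.hadoop.net.unix.DomainSocket;
import org.mockito.Mockito;
import org.mockito.invocation.InvocationOnMock;
import org.mockito.stubbing.Answer;

public class ClosedDomainSocketSketch {
  public static DomainSocket newFailingDomainSocket() {
    // Any method call on the mock, including writes, throws as if the
    // DataNode side had already torn the socket down.
    return Mockito.mock(DomainSocket.class, new Answer<Object>() {
      @Override
      public Object answer(InvocationOnMock invocation) throws Throwable {
        throw new RuntimeException("injected fault: socket is closed");
      }
    });
  }
}
{code}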
[jira] [Updated] (HDFS-4237) Add unit tests for HTTP-based filesystems against secure MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Chu updated HDFS-4237: -- Attachment: HDFS-4237.patch.007 Thank you for the review, Andy. I've uploaded a new patch. In it: I've removed the tab characters. I used 200 * 1024 * 1024 instead of the bitshift. I converted FileSystemContractBaseTest (and the classes that extend it) to JUnit4. Previously, it was written in JUnit3 style (extends TestCase), but JUnit3 TestCase and JUnit4 Assume are incompatible, e.g. HDFS-3966. How does my adding a section on running/developing secure unit tests to the Developer Documentation at http://wiki.apache.org/hadoop/ sound? Is there a better place for documentation? Add unit tests for HTTP-based filesystems against secure MiniDFSCluster --- Key: HDFS-4237 URL: https://issues.apache.org/jira/browse/HDFS-4237 Project: Hadoop HDFS Issue Type: Test Components: security, test, webhdfs Affects Versions: 2.0.0-alpha Reporter: Stephen Chu Assignee: Stephen Chu Attachments: HDFS-4237.patch.001, HDFS-4237.patch.007 Now that we can start a secure MiniDFSCluster (HADOOP-9004), we need more security unit tests. A good area to add secure tests is the HTTP-based filesystems (WebHDFS, HttpFs). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
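A minimal sketch of why the JUnit4 conversion matters: Assume only skips a test under a JUnit4 runner, while under JUnit3's TestCase it surfaces as a failure. The class name and guard condition below are illustrative assumptions, not the patch itself.

{code}
import static org.junit.Assume.assumeTrue;

import org.junit.Before;
import org.junit.Test;

public class SecureContractTestSketch {
  @Before
  public void assumeSecureClusterConfigured() {
    // Under a JUnit4 runner this skips (rather than fails) every test in
    // the class when the secure test environment is not set up.
    assumeTrue(System.getProperty("secure.test.enabled") != null);
  }

  @Test
  public void testAgainstSecureCluster() {
    // ... test body runs only when the assumption above holds ...
  }
}
{code}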
[jira] [Commented] (HDFS-4237) Add unit tests for HTTP-based filesystems against secure MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559219#comment-13559219 ] Hadoop QA commented on HDFS-4237: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12565859/HDFS-4237.patch.007 against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.web.TestSecureWebHdfsFileSystemContract {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3861//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3861//console This message is automatically generated. Add unit tests for HTTP-based filesystems against secure MiniDFSCluster --- Key: HDFS-4237 URL: https://issues.apache.org/jira/browse/HDFS-4237 Project: Hadoop HDFS Issue Type: Test Components: security, test, webhdfs Affects Versions: 2.0.0-alpha Reporter: Stephen Chu Assignee: Stephen Chu Attachments: HDFS-4237.patch.001, HDFS-4237.patch.007 Now that we can start a secure MiniDFSCluster (HADOOP-9004), we need more security unit tests. A good area to add secure tests is the HTTP-based filesystems (WebHDFS, HttpFs). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4237) Add unit tests for HTTP-based filesystems against secure MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559226#comment-13559226 ] Stephen Chu commented on HDFS-4237: --- Woops, I forgot the Assume check in TestSecureWebHdfsFileSystemContract. Add unit tests for HTTP-based filesystems against secure MiniDFSCluster --- Key: HDFS-4237 URL: https://issues.apache.org/jira/browse/HDFS-4237 Project: Hadoop HDFS Issue Type: Test Components: security, test, webhdfs Affects Versions: 2.0.0-alpha Reporter: Stephen Chu Assignee: Stephen Chu Attachments: HDFS-4237.patch.001, HDFS-4237.patch.007 Now that we can start a secure MiniDFSCluster (HADOOP-9004), we need more security unit tests. A good area to add secure tests is the HTTP-based filesystems (WebHDFS, HttpFs). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4237) Add unit tests for HTTP-based filesystems against secure MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559231#comment-13559231 ] Andy Isaacson commented on HDFS-4237: - {noformat} + String address = "127.0.0.1:" + port; {noformat} this line grew some trailing whitespace. {{SecureHdfsTestUtil.java}} license comment has trailing whitespace. {noformat} + * Our unit tests use 127.0.0.1/localhost to address the host running + * the tests. However, WebHDFS secure authentication using localhost is + * not allowed (kerberos authentication will complain it can't find + * the server). The actual hostname must be used. Therefore, to run + * the secure WebHDFS tests in your test environment, make 127.0.0.1 + * resolve to the actual hostname. {noformat} I'm not sure this is an acceptable requirement, but let's go ahead and get it checked in as is. Worst case we just back out this code. (It would be better to teach the tests how to run in a reasonable environment where the hostname resolves to the actual eth0 address or similar. This may mean that it's impossible to do jUnit style tests of Kerberized security.) bq. How does my adding a section on running/developing secure unit tests to the Developer Documentation at http://wiki.apache.org/hadoop/ sound? Is there a better place for documentation? A wiki page sounds like an excellent start. I think it belongs on a new page but you can use your judgment if you find a page where it fits in. Add unit tests for HTTP-based filesystems against secure MiniDFSCluster --- Key: HDFS-4237 URL: https://issues.apache.org/jira/browse/HDFS-4237 Project: Hadoop HDFS Issue Type: Test Components: security, test, webhdfs Affects Versions: 2.0.0-alpha Reporter: Stephen Chu Assignee: Stephen Chu Attachments: HDFS-4237.patch.001, HDFS-4237.patch.007 Now that we can start a secure MiniDFSCluster (HADOOP-9004), we need more security unit tests. A good area to add secure tests is the HTTP-based filesystems (WebHDFS, HttpFs). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4340) Update addBlock() to include inode id as additional argument
[ https://issues.apache.org/jira/browse/HDFS-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-4340: - Attachment: HDFS-4340.patch Update addBlock() to include inode id as additional argument Key: HDFS-4340 URL: https://issues.apache.org/jira/browse/HDFS-4340 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode Affects Versions: 3.0.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4417) HDFS-347: fix case where local reads get disabled incorrectly
[ https://issues.apache.org/jira/browse/HDFS-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-4417: --- Attachment: HDFS-4417.003.patch HDFS-347: fix case where local reads get disabled incorrectly - Key: HDFS-4417 URL: https://issues.apache.org/jira/browse/HDFS-4417 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client, performance Reporter: Todd Lipcon Assignee: Colin Patrick McCabe Attachments: HDFS-4417.002.patch, HDFS-4417.003.patch, hdfs-4417.txt In testing HDFS-347 against HBase (thanks [~jdcryans]) we ran into the following case: - a workload is running which puts a bunch of local sockets in the PeerCache - the workload abates for a while, causing the sockets to go stale (ie the DN side disconnects after the keepalive timeout) - the workload starts again In this case, the local socket retrieved from the cache failed the newBlockReader call, and it incorrectly disabled local sockets on that host. This is similar to an earlier bug HDFS-3376, but not quite the same. The next issue we ran into is that, once this happened, it never tried local sockets again, because the cache held lots of TCP sockets. Since we always managed to get a cached socket to the local node, it didn't bother trying local read again. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4417) HDFS-347: fix case where local reads get disabled incorrectly
[ https://issues.apache.org/jira/browse/HDFS-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559277#comment-13559277 ] Colin Patrick McCabe commented on HDFS-4417: bq. How about newTcpPeer? Remote is kind of vague. Agree. Using a mock for DomainSocket also worked out well. For PeerCache, I tried out the two-cache solution, but it started getting pretty complicated, since we refer to the cache in many places. Instead, I just added a boolean to the cache key. In {{TestParallelShortCircuitReadUnCached}}, since this *is* a regression test for HDFS-4417, I figured I needed some way to make sure that we were not falling back on TCP sockets to read. So I added {{DFSInputStream#tcpReadsDisabledForTesting}}. I considered several other solutions. Any solution that makes TCP sockets unusable, like setting a bad {{SocketFactory}}, runs into trouble because the first part of the test needs to create the files that we're reading. Killing the {{DataNode#dataXceiverServer}} thread after doing the writes seemed like a promising approach, but it caused exceptions in the {{DFSOutputStream}} worker threads, which led to the (only) {{DataNode}} getting kicked out of the cluster. Another approach is to create a subclass for {{DFSInputStream}} that overrides {{DFSInputStream#newTcpPeer}} to throw an exception. However, getting a {{DFSClient}} to return this subclass is difficult. Possibly Mockito's partial mocks could help here. HDFS-347: fix case where local reads get disabled incorrectly - Key: HDFS-4417 URL: https://issues.apache.org/jira/browse/HDFS-4417 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client, performance Reporter: Todd Lipcon Assignee: Colin Patrick McCabe Attachments: HDFS-4417.002.patch, HDFS-4417.003.patch, hdfs-4417.txt In testing HDFS-347 against HBase (thanks [~jdcryans]) we ran into the following case: - a workload is running which puts a bunch of local sockets in the PeerCache - the workload abates for a while, causing the sockets to go stale (ie the DN side disconnects after the keepalive timeout) - the workload starts again In this case, the local socket retrieved from the cache failed the newBlockReader call, and it incorrectly disabled local sockets on that host. This is similar to an earlier bug HDFS-3376, but not quite the same. The next issue we ran into is that, once this happened, it never tried local sockets again, because the cache held lots of TCP sockets. Since we always managed to get a cached socket to the local node, it didn't bother trying local read again. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
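A hedged sketch of the single-cache approach described here: the cache key pairs the datanode identity with a domain-socket flag so TCP and domain-socket peers never satisfy each other's lookups. Class and field names are illustrative, not the committed PeerCache code.

{code}
import org.apache.hadoop.hdfs.protocol.DatanodeID;

public class PeerCacheKeySketch {
  private final DatanodeID dnId;
  private final boolean isDomainSocket;

  public PeerCacheKeySketch(DatanodeID dnId, boolean isDomainSocket) {
    this.dnId = dnId;
    this.isDomainSocket = isDomainSocket;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof PeerCacheKeySketch)) {
      return false;
    }
    PeerCacheKeySketch other = (PeerCacheKeySketch) o;
    // A cached TCP peer must never be returned to a caller asking for a
    // domain-socket peer, and vice versa.
    return dnId.equals(other.dnId)
        && isDomainSocket == other.isDomainSocket;
  }

  @Override
  public int hashCode() {
    return dnId.hashCode() * 31 + (isDomainSocket ? 1 : 0);
  }
}
{code}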
[jira] [Commented] (HDFS-4340) Update addBlock() to include inode id as additional argument
[ https://issues.apache.org/jira/browse/HDFS-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559300#comment-13559300 ] Hadoop QA commented on HDFS-4340: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12565880/HDFS-4340.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3862//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3862//console This message is automatically generated. Update addBlock() to include inode id as additional argument Key: HDFS-4340 URL: https://issues.apache.org/jira/browse/HDFS-4340 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode Affects Versions: 3.0.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte
[ https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-4403: -- Resolution: Fixed Fix Version/s: 2.0.3-alpha 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed. Thanks for reviewing, Aaron. DFSClient can infer checksum type when not provided by reading first byte - Key: HDFS-4403 URL: https://issues.apache.org/jira/browse/HDFS-4403 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.0.2-alpha Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Fix For: 3.0.0, 2.0.3-alpha Attachments: hdfs-4403.txt, hdfs-4403.txt HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the new protobuf field is optional, with a default of CRC32. This means that this API, when used against an older cluster (like earlier 0.23 releases) will falsely return CRC32 even if that cluster has written files with CRC32C. This can cause issues for distcp, for example. Instead of defaulting the protobuf field to CRC32, we can leave it with no default, and if the OpBlockChecksumResponseProto has no checksum type set, the client can send OP_READ_BLOCK to read the first byte of the block, then grab the checksum type out of that response (which has always been present) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
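A hedged sketch of the inference described in this jira: when the checksum RPC response omits the checksum type, fall back to reading the first byte of the block and take the type from the read response. The {{hasCrcType}}/{{getCrcType}} accessors follow the datatransfer.proto field, while the helper that issues the OP_READ_BLOCK is hypothetical; this is not the committed DFSClient code.

{code}
import java.io.IOException;

import org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.OpBlockChecksumResponseProto;
import org.apache.hadoop.hdfs.protocolPB.PBHelper;
import org.apache.hadoop.util.DataChecksum;

class ChecksumTypeInferenceSketch {
  DataChecksum.Type resolveChecksumType(OpBlockChecksumResponseProto resp)
      throws IOException {
    if (resp.hasCrcType()) {
      // Newer datanodes report which checksum type they actually used.
      return PBHelper.convert(resp.getCrcType());
    }
    // Older datanodes never set the field. Rather than trusting the old
    // CRC32 default, read the first byte of the block: the OP_READ_BLOCK
    // response has always carried the checksum type.
    return inferChecksumTypeByReading();  // hypothetical helper
  }

  private DataChecksum.Type inferChecksumTypeByReading() throws IOException {
    // ... issue OP_READ_BLOCK for byte 0 and return the type from the
    // read response's checksum header (details elided in this sketch) ...
    throw new UnsupportedOperationException("sketch only");
  }
}
{code}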
[jira] [Commented] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte
[ https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559329#comment-13559329 ] Hudson commented on HDFS-4403: -- Integrated in Hadoop-trunk-Commit #3265 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3265/]) HDFS-4403. DFSClient can infer checksum type when not provided by reading first byte. Contributed by Todd Lipcon. (Revision 1436730) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1436730 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileChecksumServlets.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto DFSClient can infer checksum type when not provided by reading first byte - Key: HDFS-4403 URL: https://issues.apache.org/jira/browse/HDFS-4403 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.0.2-alpha Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Fix For: 3.0.0, 2.0.3-alpha Attachments: hdfs-4403.txt, hdfs-4403.txt HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the new protobuf field is optional, with a default of CRC32. This means that this API, when used against an older cluster (like earlier 0.23 releases) will falsely return CRC32 even if that cluster has written files with CRC32C. This can cause issues for distcp, for example. Instead of defaulting the protobuf field to CRC32, we can leave it with no default, and if the OpBlockChecksumResponseProto has no checksum type set, the client can send OP_READ_BLOCK to read the first byte of the block, then grab the checksum type out of that response (which has always been present) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4417) HDFS-347: fix case where local reads get disabled incorrectly
[ https://issues.apache.org/jira/browse/HDFS-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559333#comment-13559333 ] Todd Lipcon commented on HDFS-4417: --- {code} + @VisibleForTesting + public void killDataXceiverServer() { + if (dataXceiverServer != null) { + ((DataXceiverServer) this.dataXceiverServer.getRunnable()).kill(); + this.dataXceiverServer.interrupt(); + dataXceiverServer = null; + } + } {code} Think you forgot to delete this attempt that you didn't end up using. Also the removal of the assert in {{kill}} shouldn't be in the patch anymore. {code} + return Mockito.mock(DomainSocket.class, + new Answer<Object>() { + @Override + public Object answer(InvocationOnMock invocation) throws Throwable { + throw new RuntimeException(...); + } }); {code} Can you add a one-line comment explaining this, like 'Return a mock which always throws exceptions on any of its function calls'? Also, fill in the exception text with something like Injected fault instead of ... Looks like your patch might be missing the new test case? I don't see anyone set the {{tcpReadsDisabledForTesting}} flag, nor the {{TestParallelShortCircuitReadUnCached}} class you mentioned. HDFS-347: fix case where local reads get disabled incorrectly - Key: HDFS-4417 URL: https://issues.apache.org/jira/browse/HDFS-4417 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client, performance Reporter: Todd Lipcon Assignee: Colin Patrick McCabe Attachments: HDFS-4417.002.patch, HDFS-4417.003.patch, hdfs-4417.txt In testing HDFS-347 against HBase (thanks [~jdcryans]) we ran into the following case: - a workload is running which puts a bunch of local sockets in the PeerCache - the workload abates for a while, causing the sockets to go stale (ie the DN side disconnects after the keepalive timeout) - the workload starts again In this case, the local socket retrieved from the cache failed the newBlockReader call, and it incorrectly disabled local sockets on that host. This is similar to an earlier bug HDFS-3376, but not quite the same. The next issue we ran into is that, once this happened, it never tried local sockets again, because the cache held lots of TCP sockets. Since we always managed to get a cached socket to the local node, it didn't bother trying local read again. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Comment Edited] (HDFS-4126) Add reading/writing snapshot information to FSImage
[ https://issues.apache.org/jira/browse/HDFS-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559393#comment-13559393 ] Suresh Srinivas edited comment on HDFS-4126 at 1/22/13 4:42 AM: # DFSUtil#byte2String - add javadoc # FSImageFormat.java #* In the javadoc, SnapshotID under FSImage should be snapshotCounter or nextSnapshotID. Should we change the SnapshotManager#snapshotID to SnapshotManager#snapshotCounter? #* As per our conversation, the INodeFile FSImage ContainsBlock will change when we do the file-level diff and simplify the FSImage. Hence I am okay with the current code. #* ComputedFileSize in javadoc could be called snapshotFileSize. The corresponding variable name could also be updated accordingly. #* Snapshot in javadoc is missing the snapshot name? #* javadoc could consolidate snapshot-supported fields together #* loadRoot should return void and numFiles-- should be used. Returning 1 always, just for decrement purposes, does not seem intuitive. #* Snapshot-related methods should be moved to an inner class or separate class. This can be done in a separate jira. # FileWithSnapshot implementation: #insertBefore and #removeSelf code seems to be repeated in the implementation? # Add a summary of test information to the javadoc of test methods # For commented-out tests, can you please add a TODO and a brief description was (Author: sureshms): # DFSUtil#byte2String - add javadoc # FSImageFormat.java #* In the javadoc, SnapshotID under FSImage should be snapshotCounter or nextSnapshotID. Should we change the SnapshotManager#snapshotID to SnapshotManager#snapshotCounter? #* As per our conversation, the INodeFile FSImage ContainsBlock will change when we do the file level diff and simplify the FSImage. Hence I am okay with the current code. #* ComputedFileSize in javadoc could be called snapshotFileSize. The corresponding variable name could also be updated accordingly. #* Snapshot in javadoc is missing snapshot name? #* javadoc could consoldiate snapshot supported fields together n #* loadRoot should return void and numFiles-- should be used. Returning 1 always just for decrement purpose does not seem intutive. #* Snapshot related methods should be moved to an inner class or separate class. This can be done in a separate jira. # FileWithSnapshot impelementation #insertBefore and #removeSelf code seems to be repeated in implementation? # Add a summary of test information to the javadoc of test methods # For commented tests can please add TODO and a brief description Add reading/writing snapshot information to FSImage --- Key: HDFS-4126 URL: https://issues.apache.org/jira/browse/HDFS-4126 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: Snapshot (HDFS-2802) Reporter: Suresh Srinivas Assignee: Jing Zhao Attachments: HDFS-4126.001.patch, HDFS-4126.002.patch, HDFS-4126.002.patch After the changes proposed in HDFS-4125 is completed, reading and writing snapshot related information from FSImage can be implemented. This jira tracks changes required for: # Loading snapshot information from FSImage # Loading snapshot related operations from editlog # Writing snapshot information in FSImage # Unit tests related to this functionality -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4126) Add reading/writing snapshot information to FSImage
[ https://issues.apache.org/jira/browse/HDFS-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559393#comment-13559393 ] Suresh Srinivas commented on HDFS-4126: --- # DFSUtil#byte2String - add javadoc # FSImageFormat.java #* In the javadoc, SnapshotID under FSImage should be snapshotCounter or nextSnapshotID. Should we change the SnapshotManager#snapshotID to SnapshotManager#snapshotCounter? #* As per our conversation, the INodeFile FSImage ContainsBlock will change when we do the file-level diff and simplify the FSImage. Hence I am okay with the current code. #* ComputedFileSize in javadoc could be called snapshotFileSize. The corresponding variable name could also be updated accordingly. #* Snapshot in javadoc is missing the snapshot name? #* javadoc could consolidate snapshot-supported fields together #* loadRoot should return void and numFiles-- should be used. Returning 1 always, just for decrement purposes, does not seem intuitive. #* Snapshot-related methods should be moved to an inner class or separate class. This can be done in a separate jira. # FileWithSnapshot implementation: #insertBefore and #removeSelf code seems to be repeated in the implementation? # Add a summary of test information to the javadoc of test methods # For commented-out tests, can you please add a TODO and a brief description Add reading/writing snapshot information to FSImage --- Key: HDFS-4126 URL: https://issues.apache.org/jira/browse/HDFS-4126 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: Snapshot (HDFS-2802) Reporter: Suresh Srinivas Assignee: Jing Zhao Attachments: HDFS-4126.001.patch, HDFS-4126.002.patch, HDFS-4126.002.patch After the changes proposed in HDFS-4125 is completed, reading and writing snapshot related information from FSImage can be implemented. This jira tracks changes required for: # Loading snapshot information from FSImage # Loading snapshot related operations from editlog # Writing snapshot information in FSImage # Unit tests related to this functionality -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
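To make the loadRoot review point concrete, a minimal before/after sketch; loadRoot and numFiles are names from the comment, while the surrounding class and method are assumptions.

{code}
import java.io.DataInput;
import java.io.IOException;

class FSImageLoaderSketch {
  private long numFiles;

  // Before (as reviewed): the caller wrote "numFiles -= loadRoot(in);"
  // because loadRoot always returned 1 purely so it could be subtracted.
  // After: loadRoot returns void and the caller decrements explicitly.
  void load(DataInput in) throws IOException {
    loadRoot(in);
    numFiles--;
  }

  private void loadRoot(DataInput in) throws IOException {
    // ... load the root inode from the image stream (details elided) ...
  }
}
{code}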
[jira] [Commented] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte
[ https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559397#comment-13559397 ] Suresh Srinivas commented on HDFS-4403: --- Todd, sorry, got busy with other things. +1 for the change as well. Consider adding a brief release note describing the issue with prior branches, to help users understand it. DFSClient can infer checksum type when not provided by reading first byte - Key: HDFS-4403 URL: https://issues.apache.org/jira/browse/HDFS-4403 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.0.2-alpha Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Fix For: 3.0.0, 2.0.3-alpha Attachments: hdfs-4403.txt, hdfs-4403.txt HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the new protobuf field is optional, with a default of CRC32. This means that this API, when used against an older cluster (like earlier 0.23 releases) will falsely return CRC32 even if that cluster has written files with CRC32C. This can cause issues for distcp, for example. Instead of defaulting the protobuf field to CRC32, we can leave it with no default, and if the OpBlockChecksumResponseProto has no checksum type set, the client can send OP_READ_BLOCK to read the first byte of the block, then grab the checksum type out of that response (which has always been present) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte
[ https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-4403: -- Release Note: The HDFS implementation of getFileChecksum() can now operate correctly against earlier-version datanodes which do not include the checksum type information in their checksum response. The checksum type is automatically inferred by issuing a read of the first byte of each block. DFSClient can infer checksum type when not provided by reading first byte - Key: HDFS-4403 URL: https://issues.apache.org/jira/browse/HDFS-4403 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.0.2-alpha Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Fix For: 3.0.0, 2.0.3-alpha Attachments: hdfs-4403.txt, hdfs-4403.txt HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the new protobuf field is optional, with a default of CRC32. This means that this API, when used against an older cluster (like earlier 0.23 releases) will falsely return CRC32 even if that cluster has written files with CRC32C. This can cause issues for distcp, for example. Instead of defaulting the protobuf field to CRC32, we can leave it with no default, and if the OpBlockChecksumResponseProto has no checksum type set, the client can send OP_READ_BLOCK to read the first byte of the block, then grab the checksum type out of that response (which has always been present) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4340) Update addBlock() to include inode id as additional argument
[ https://issues.apache.org/jira/browse/HDFS-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559413#comment-13559413 ] Brandon Li commented on HDFS-4340: -- @Nicholas, the new patch addresses your comments. I synchronized streamer.start() to avoid the findbugs warnings. Please let me know if you think it's sort of overkill to do so. Thanks! Update addBlock() to include inode id as additional argument Key: HDFS-4340 URL: https://issues.apache.org/jira/browse/HDFS-4340 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode Affects Versions: 3.0.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
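For illustration, the kind of change "synchronized streamer.start()" likely refers to: starting the streamer thread under the stream's own lock so all accesses to the streamer field are consistently synchronized, which is what quiets findbugs. This is a hedged sketch under those assumptions, not the patch itself.

{code}
class OutputStreamStartSketch {
  // Stands in for the DataStreamer worker thread of an output stream.
  private final Thread streamer = new Thread();

  /**
   * Starting the streamer under the object lock matches the other
   * synchronized accesses to the streamer field, avoiding an
   * inconsistent-synchronization warning.
   */
  synchronized void start() {
    streamer.start();
  }
}
{code}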
[jira] [Commented] (HDFS-4366) Block Replication Policy Implementation May Skip Higher-Priority Blocks for Lower-Priority Blocks
[ https://issues.apache.org/jira/browse/HDFS-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559443#comment-13559443 ] Todd Lipcon commented on HDFS-4366: --- This looks good to me. +1. Nice patch, Derek. I'll wait til tomorrow to commit in case anyone else wants to take a look - this is pretty important code so having a few eyes on it would be nice. Block Replication Policy Implementation May Skip Higher-Priority Blocks for Lower-Priority Blocks - Key: HDFS-4366 URL: https://issues.apache.org/jira/browse/HDFS-4366 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 0.23.5 Reporter: Derek Dagit Assignee: Derek Dagit Attachments: HDFS-4366.patch, HDFS-4366.patch, HDFS-4366.patch, hdfs-4366-unittest.patch In certain cases, higher-priority under-replicated blocks can be skipped by the replication policy implementation. The current implementation maintains, for each priority level, an index into a list of blocks that are under-replicated. Together, the lists compose a priority queue (see note later about branch-0.23). In some cases when blocks are removed from a list, the caller (BlockManager) properly handles the index into the list from which it removed a block. In some other cases, the index remains stationary while the list changes. Whenever this happens, and the removed block happened to be at or before the index, the implementation will skip over a block when selecting blocks for replication work. In situations when entire racks are decommissioned, leading to many under-replicated blocks, loss of blocks can occur. Background: HDFS-1765 This patch to trunk greatly improved the state of the replication policy implementation. Prior to the patch, the following details were true: * The block priority queue was no such thing: It was really a set of trees that held blocks in natural ordering, that being by the block's ID, which resulted in iterator walks over the blocks in pseudo-random order. * There was only a single index into an iteration over all of the blocks... * ... meaning the implementation was only successful in respecting priority levels on the first pass. Overall, the behavior was a round-robin-type scheduling of blocks. After the patch: * A proper priority queue is implemented, preserving log n operations while iterating over blocks in the order added. * A separate index for each priority level is kept... * ... allowing for processing of the highest priority blocks first, regardless of which priority had last been processed. The change was suggested for branch-0.23 as well as trunk, but it does not appear to have been pulled in. The problem: Although the indices are now tracked in a better way, there is a synchronization issue since the indices are managed outside of methods to modify the contents of the queue. Removal of a block from a priority level without adjusting the index can mean that the index then points to the block after the block it originally pointed to. In the next round of scheduling for that priority level, the block originally pointed to by the index is skipped. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
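A minimal sketch of the skip described above, using a plain list in place of the under-replicated block queue; block names and the bookmark value are illustrative.

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StaleIndexSketch {
  public static void main(String[] args) {
    List<String> priorityLevel = new ArrayList<String>(
        Arrays.asList("blk_0", "blk_1", "blk_2", "blk_3"));
    int bookmark = 2; // the next scan should resume at "blk_2"

    // A block before the bookmark is removed, but the bookmark is managed
    // outside the queue and is not adjusted.
    priorityLevel.remove("blk_0");

    // The next scan resumes one slot too far: it sees "blk_3" and
    // silently skips "blk_2".
    System.out.println(priorityLevel.get(bookmark)); // prints blk_3
  }
}
{code}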