[jira] Commented: (HDFS-985) HDFS should issue multiple RPCs for listing a large directory
[ https://issues.apache.org/jira/browse/HDFS-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846973#action_12846973 ] Hairong Kuang commented on HDFS-985:

Failed contrib tests seem unrelated to my patch:

/grid/0/hudson/hudson-slave/workspace/Hdfs-Patch-h5.grid.sp2.yahoo.net/trunk/build.xml:569: The following error occurred while executing this line:
[exec] /grid/0/hudson/hudson-slave/workspace/Hdfs-Patch-h5.grid.sp2.yahoo.net/trunk/src/contrib/build.xml:48: The following error occurred while executing this line:
[exec] /grid/0/hudson/hudson-slave/workspace/Hdfs-Patch-h5.grid.sp2.yahoo.net/trunk/src/contrib/hdfsproxy/build.xml:292: org.codehaus.cargo.container.ContainerException: Failed to download [http://apache.osuosl.org/tomcat/tomcat-6/v6.0.18/bin/apache-tomcat-6.0.18.zip]

HDFS should issue multiple RPCs for listing a large directory
Key: HDFS-985
URL: https://issues.apache.org/jira/browse/HDFS-985
Project: Hadoop HDFS
Issue Type: New Feature
Reporter: Hairong Kuang
Assignee: Hairong Kuang
Fix For: 0.22.0
Attachments: directoryBrowse_0.20yahoo.patch, directoryBrowse_0.20yahoo_1.patch, directoryBrowse_0.20yahoo_2.patch, iterativeLS_trunk.patch, iterativeLS_trunk1.patch, iterativeLS_trunk2.patch, iterativeLS_trunk3.patch, iterativeLS_trunk4.patch, iterativeLS_yahoo.patch, iterativeLS_yahoo1.patch, testFileStatus.patch

Currently HDFS issues one RPC from the client to the NameNode to list a directory. However, some directories are large, containing thousands or millions of items. Listing such a directory in a single RPC has a few shortcomings:

1. The list operation holds the global fsnamesystem lock for a long time, blocking other requests. If a large number (thousands) of such list requests hit the NameNode in a short period of time, the NameNode is significantly slowed down, and users notice longer response times or lost connections to the NameNode.
2. The response message is uncontrollably big. We observed a response as big as 50 MB when listing a directory of 300 thousand items. Even with the optimization introduced in HDFS-946, which may cut the response by 20-50%, the response size will still be on the order of 10 megabytes.

I propose to implement directory listing using multiple RPCs. Here is the plan:

1. Each getListing RPC has an upper limit on the number of items returned. This limit could be configurable, but I am thinking of setting it to a fixed number like 500.
2. Each RPC additionally specifies a start position for the listing request. I am thinking of using the last item of the previous listing RPC as the indicator. Since the NameNode stores all items in a directory as a sorted array, it can use that last item to locate the start item of the next listing even if the last item has been deleted between the two consecutive calls. This has the advantage of avoiding duplicate entries at the client side.
3. The return value additionally indicates whether the whole directory has been listed. If the client sees a false flag, it will continue to issue another RPC.

This proposal changes the semantics of large directory listing in the sense that listing is no longer an atomic operation if the directory's contents change while the listing is in progress.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
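The cursor-based loop in the proposal can be sketched as follows. All names here are hypothetical (the real getListing signature is whatever the patch defines), the directory is modeled as a sorted set the way the NameNode keeps a directory's children sorted, and the "done listing" flag is approximated by checking whether a batch came back full:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

/** Sketch of the proposed cursor-based listing (all names hypothetical). */
public class IterativeListing {
    static final int LIMIT = 3; // the proposal suggests a fixed cap like 500

    /** One "getListing RPC": up to LIMIT entries strictly after startAfter.
     *  Because the lookup is "first entry greater than the cursor", it still
     *  works if the cursor entry itself was deleted between calls. */
    static List<String> getListing(NavigableSet<String> dir, String startAfter) {
        List<String> batch = new ArrayList<>();
        for (String name : dir.tailSet(startAfter, false)) {
            if (batch.size() == LIMIT) break;
            batch.add(name);
        }
        return batch;
    }

    /** Client side: keep issuing RPCs, using the last returned name as the
     *  next start position, until a batch comes back short. */
    static List<String> listAll(NavigableSet<String> dir) {
        List<String> all = new ArrayList<>();
        String cursor = "";              // empty string sorts before any name
        List<String> batch;
        do {
            batch = getListing(dir, cursor);
            all.addAll(batch);
            if (!batch.isEmpty()) cursor = batch.get(batch.size() - 1);
        } while (batch.size() == LIMIT); // short batch: directory exhausted
        return all;
    }
}
```

Using the last returned name as the cursor, rather than a numeric offset, is what gives the no-duplicates property: even if the cursor entry is deleted between two "RPCs", the next batch starts at the first surviving entry after it.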
[jira] Created: (HDFS-1046) Build fails trying to download an old version of tomcat
Build fails trying to download an old version of tomcat
Key: HDFS-1046
URL: https://issues.apache.org/jira/browse/HDFS-1046
Project: Hadoop HDFS
Issue Type: Bug
Reporter: gary murry

It looks like HDFSProxy is trying to get an old version of tomcat (6.0.18):

/grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk/trunk/src/contrib/hdfsproxy/build.xml:292: org.codehaus.cargo.container.ContainerException: Failed to download [http://apache.osuosl.org/tomcat/tomcat-6/v6.0.18/bin/apache-tomcat-6.0.18.zip]

Looking at http://apache.osuosl.org/tomcat/tomcat-6/ , it looks like the only two versions available are 6.0.24 and 6.0.26.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1044) Cannot submit mapreduce job from secure client to insecure server
[ https://issues.apache.org/jira/browse/HDFS-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boris Shkolnik updated HDFS-1044:

Attachment: HDFS-1044-BP20-2.patch

For the previous release, not for commit.

Cannot submit mapreduce job from secure client to insecure server
Key: HDFS-1044
URL: https://issues.apache.org/jira/browse/HDFS-1044
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Boris Shkolnik
Assignee: Boris Shkolnik
Attachments: HDFS-1044-BP20-2.patch, HDFS-1044-BP20.patch

It looks like the client tries to get a DelegationToken and fails because the SecretManager on the server doesn't start in a non-secure environment.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1001) DataXceiver and BlockReader disagree on when to send/recv CHECKSUM_OK
[ https://issues.apache.org/jira/browse/HDFS-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847052#action_12847052 ] Todd Lipcon commented on HDFS-1001:

The body of the patch looks good to me. But could we merge TestClientBlockVerification and the new TestDataXceiver? I recall you made the new test so you could be in the server.datanode package, but could the cases of TestClientBlockVerification move in here too? If not, maybe we can at least share a bit of the test code (most of the code except for the test cases themselves is duplicated).

DataXceiver and BlockReader disagree on when to send/recv CHECKSUM_OK
Key: HDFS-1001
URL: https://issues.apache.org/jira/browse/HDFS-1001
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.22.0
Reporter: bc Wong
Assignee: bc Wong
Attachments: HDFS-1001-rebased.patch, HDFS-1001.patch, HDFS-1001.patch.1

Running TestPread with additional debug statements reveals that the BlockReader sends CHECKSUM_OK when the DataXceiver doesn't expect it. Currently it doesn't matter, since the DataXceiver closes the connection after each op and CHECKSUM_OK is the last thing on the wire. But if we want to cache connections, the two sides need to agree on the exchange of CHECKSUM_OK.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
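Since the trailing status only matters once a connection outlives a single op, one way for the two sides to agree is for the server to treat the client's CHECKSUM_OK as optional. A toy sketch of that idea follows; the constant's value and the method name are illustrative, not the actual HDFS wire protocol:

```java
import java.io.IOException;
import java.io.InputStream;

/** Toy model of the trailing-status exchange; names and the status
 *  value are illustrative, not the real DataTransferProtocol. */
public class ChecksumOkReader {
    static final int CHECKSUM_OK = 5;

    /** Server side, after sending a block: read the client's optional
     *  CHECKSUM_OK. A clean EOF (client simply closed) is not an error. */
    static boolean readOptionalChecksumOk(InputStream in) throws IOException {
        int b = in.read();   // returns -1 on EOF
        return b == CHECKSUM_OK;
    }
}
```

With connection caching, a `true` result means the server may keep the connection open for the next op; `false` (EOF) means the client is done and the socket can be closed without logging an error.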
[jira] Commented: (HDFS-1044) Cannot submit mapreduce job from secure client to insecure server
[ https://issues.apache.org/jira/browse/HDFS-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847070#action_12847070 ] Jitendra Nath Pandey commented on HDFS-1044:

+1

Cannot submit mapreduce job from secure client to insecure server
Key: HDFS-1044
URL: https://issues.apache.org/jira/browse/HDFS-1044
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Boris Shkolnik
Assignee: Boris Shkolnik
Attachments: HDFS-1044-BP20-2.patch, HDFS-1044-BP20-3.patch, HDFS-1044-BP20.patch

It looks like the client tries to get a DelegationToken and fails because the SecretManager on the server doesn't start in a non-secure environment.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1015) Intermittent failure in TestSecurityTokenEditLog
[ https://issues.apache.org/jira/browse/HDFS-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847093#action_12847093 ] Konstantin Shvachko commented on HDFS-1015:

+1 for the patch. Could you please update the components, affected versions, and fix versions for this jira?

Intermittent failure in TestSecurityTokenEditLog
Key: HDFS-1015
URL: https://issues.apache.org/jira/browse/HDFS-1015
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
Attachments: HDFS-1015-y20.1.patch, HDFS-1015.1.patch, HDFS-1015.2.patch

This test fails sometimes in the hadoop-0.20.100-secondary build. It doesn't fail in the trunk or hadoop-0.20.100 builds.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1043) Benchmark overhead of server-side group resolution of users
[ https://issues.apache.org/jira/browse/HDFS-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-1043:

Component/s: benchmarks

Benchmark overhead of server-side group resolution of users
Key: HDFS-1043
URL: https://issues.apache.org/jira/browse/HDFS-1043
Project: Hadoop HDFS
Issue Type: Test
Components: benchmarks
Affects Versions: 0.22.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
Fix For: 0.22.0
Attachments: UGCRefresh.patch

Server-side user group resolution was introduced in HADOOP-4656. The benchmark should repeatedly ask the name-node for user group resolution, and reset the NN's user group cache periodically.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
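The loop the description asks for might look like the following sketch; the resolver interface and both method names are made up for illustration (the real NN-side calls are whatever the patch uses), and the point is that periodic cache resets force real resolutions into the measurement rather than cache hits:

```java
import java.util.List;

/** Hypothetical shape of the benchmark loop; the resolver interface and
 *  its method names are illustrative stand-ins for the NN-side calls. */
public class GroupResolutionBench {
    interface GroupResolver {
        List<String> getGroups(String user); // server-side group resolution
        void refreshCache();                 // drop the NN's user-group cache
    }

    /** Run numOps resolutions, forcing a cache reset every refreshEvery ops
     *  so the measurement includes real resolutions, not just cache hits.
     *  Returns elapsed nanoseconds. */
    static long run(GroupResolver nn, String user, int numOps, int refreshEvery) {
        long start = System.nanoTime();
        for (int i = 0; i < numOps; i++) {
            if (i % refreshEvery == 0) {
                nn.refreshCache();
            }
            nn.getGroups(user);
        }
        return System.nanoTime() - start;
    }
}
```

Comparing the elapsed time at different refresh intervals (every op, every Nth op, never) would isolate the overhead of resolution itself from the cost of a cache hit.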
[jira] Commented: (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847117#action_12847117 ] Todd Lipcon commented on HDFS-941:

Style notes:

- In BlockReader:
{code}
+          LOG.warn("Could not write to datanode " + sock.getInetAddress() +
+                   ": " + e.getMessage());
{code}
The message should be more specific - something like "Could not write read result status code" - and should also indicate in the warning somehow that this is not a critical problem. Perhaps info level is better? (In my experience, if people see WARN they think something is seriously wrong.)
- Please move the inner SocketCacheEntry class down lower in DFSInputStream.
- In SocketCacheEntry.setOwner, can you use IOUtils.closeStream to close reader? Similarly in SocketCacheEntry.close.
- We expect the following may happen reasonably often, right?
{code}
+        // Our socket is no good.
+        DFSClient.LOG.warn("Error making BlockReader. Closing stale " + entry.sock.toString());
{code}
I think this should probably be debug level.
- The edits to the docs in DataNode.java are good - if possible they should probably move into HDFS-1001 though, no?
- The do { ... } while () loop is a bit hard to follow in DataXceiver. Would it be possible to rearrange the code a bit to be more linear? (E.g. setting DN_KEEPALIVE_TIMEOUT right before the read at the beginning of the loop if workDone > 0 would be easier to follow, in my opinion.)
- In DataXceiver:
{code}
+      } catch (IOException ioe) {
+        LOG.error("Error reading client status response. Will close connection. Err: " + ioe);
{code}
Doesn't this yield error messages on every incomplete client read? Since the response is optional, this seems more like a DEBUG.

Bigger stuff:

- I think there is a concurrency issue here. Namely, the positional read API calls through into fetchBlockByteRange, which will use the existing cached socket regardless of other concurrent operations. So we may end up with multiple block readers on the same socket, and everything will fall apart. Can you add a test case which tests concurrent use of a DFSInputStream? Maybe a few threads doing random positional reads while another thread does seeks and sequential reads?
- Regarding the cache size of one - I don't think this is quite true. For a use case like HBase, the region server is continually slamming the local datanode with random read requests from several client threads. Is the idea that such an application should be using multiple DFSInputStreams to read the same file and handle the multithreading itself?
- In DataXceiver, SocketException is caught and ignored while sending a block ("// Its ok for remote side to close the connection anytime"). I think there are other SocketException types (e.g. a timeout) that could be thrown here aside from a connection close, and in that case we need to IOUtils.closeStream(out), I believe. A test case for this could be to open a BlockReader, read some bytes, then stop reading so that the other side's BlockSender generates a timeout.
- Not sure about this removal in the finally clause of opWriteBlock:
{code}
-      IOUtils.closeStream(replyOut);
{code}
(a) We still need to close in the case of a downstream-generated exception. Otherwise we'll read the next data bytes from the writer as an operation and have undefined results. (b) To keep this patch less dangerous, maybe we should not add the reuse feature for operations other than read? Read is the only operation where we expect a lot of very short requests coming in - not much benefit for writes, etc., plus they're more complicated.
Datanode xceiver protocol should allow reuse of a connection
Key: HDFS-941
URL: https://issues.apache.org/jira/browse/HDFS-941
Project: Hadoop HDFS
Issue Type: Improvement
Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
Attachments: HDFS-941-1.patch

Right now each connection into the datanode xceiver only processes one operation. In the case that an operation leaves the stream in a well-defined state (e.g. a client reads to the end of a block successfully), the same connection could be reused for a second operation. This should improve random read performance significantly.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
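One way to address the pread race described in the review above (a positional read grabbing the cached socket out from under a sequential reader) is a checkout-style cache: taking the socket removes it from the cache, so two readers can never hold it at once. A minimal sketch with hypothetical names, not the patch's actual API:

```java
import java.io.IOException;
import java.net.Socket;

/** Hypothetical one-entry socket cache for DFSInputStream. take() removes
 *  the entry, so a concurrent positional read either gets exclusive
 *  ownership of the cached socket or null (and must open a fresh
 *  connection) -- two block readers can never share one socket. */
class SingleSocketCache {
    private Socket cached;

    synchronized Socket take() {
        Socket s = cached;
        cached = null;       // caller now owns the socket exclusively
        return s;
    }

    synchronized void put(Socket s) {
        if (cached == null) {
            cached = s;      // keep the idle socket around for reuse
        } else {
            // Cache already full: drop the extra socket instead of caching it.
            try { s.close(); } catch (IOException e) { /* best effort */ }
        }
    }
}
```

This keeps the cache size of one while making concurrent preads safe, at the cost of extra connections under contention - which is exactly the trade-off the HBase-style workload question above is probing.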
[jira] Updated: (HDFS-1015) Intermittent failure in TestSecurityTokenEditLog
[ https://issues.apache.org/jira/browse/HDFS-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-1015:

Component/s: test, name-node
Affects Version/s: 0.22.0
Fix Version/s: 0.22.0
Hadoop Flags: [Reviewed]

The test failures reported by hudson are not related to this patch.

Intermittent failure in TestSecurityTokenEditLog
Key: HDFS-1015
URL: https://issues.apache.org/jira/browse/HDFS-1015
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node, test
Affects Versions: 0.22.0
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
Fix For: 0.22.0
Attachments: HDFS-1015-y20.1.patch, HDFS-1015.1.patch, HDFS-1015.2.patch

This test fails sometimes in the hadoop-0.20.100-secondary build. It doesn't fail in the trunk or hadoop-0.20.100 builds.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1015) Intermittent failure in TestSecurityTokenEditLog
[ https://issues.apache.org/jira/browse/HDFS-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-1015:

Resolution: Fixed
Status: Resolved (was: Patch Available)

Intermittent failure in TestSecurityTokenEditLog
Key: HDFS-1015
URL: https://issues.apache.org/jira/browse/HDFS-1015
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node, test
Affects Versions: 0.22.0
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
Fix For: 0.22.0
Attachments: HDFS-1015-y20.1.patch, HDFS-1015.1.patch, HDFS-1015.2.patch

This test fails sometimes in the hadoop-0.20.100-secondary build. It doesn't fail in the trunk or hadoop-0.20.100 builds.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1015) Intermittent failure in TestSecurityTokenEditLog
[ https://issues.apache.org/jira/browse/HDFS-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847133#action_12847133 ] Hudson commented on HDFS-1015:

Integrated in Hadoop-Hdfs-trunk-Commit #216 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/216/]). Fix intermittent failure in TestSecurityTokenEditLog. Contributed by Jitendra Nath Pandey.

Intermittent failure in TestSecurityTokenEditLog
Key: HDFS-1015
URL: https://issues.apache.org/jira/browse/HDFS-1015
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node, test
Affects Versions: 0.22.0
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
Fix For: 0.22.0
Attachments: HDFS-1015-y20.1.patch, HDFS-1015.1.patch, HDFS-1015.2.patch

This test fails sometimes in the hadoop-0.20.100-secondary build. It doesn't fail in the trunk or hadoop-0.20.100 builds.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1044) Cannot submit mapreduce job from secure client to insecure server
[ https://issues.apache.org/jira/browse/HDFS-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boris Shkolnik updated HDFS-1044:

Attachment: HDFS-1044-BP20-5.patch

Cannot submit mapreduce job from secure client to insecure server
Key: HDFS-1044
URL: https://issues.apache.org/jira/browse/HDFS-1044
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Boris Shkolnik
Assignee: Boris Shkolnik
Attachments: HDFS-1044-BP20-2.patch, HDFS-1044-BP20-3.patch, HDFS-1044-BP20-5.patch, HDFS-1044-BP20.patch

It looks like the client tries to get a DelegationToken and fails because the SecretManager on the server doesn't start in a non-secure environment.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1044) Cannot submit mapreduce job from secure client to insecure server
[ https://issues.apache.org/jira/browse/HDFS-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boris Shkolnik updated HDFS-1044:

Attachment: HDFS-1044-BP20-6.patch

For the previous version, not for commit.

Cannot submit mapreduce job from secure client to insecure server
Key: HDFS-1044
URL: https://issues.apache.org/jira/browse/HDFS-1044
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Boris Shkolnik
Assignee: Boris Shkolnik
Attachments: HDFS-1044-BP20-2.patch, HDFS-1044-BP20-3.patch, HDFS-1044-BP20-5.patch, HDFS-1044-BP20-6.patch, HDFS-1044-BP20.patch

It looks like the client tries to get a DelegationToken and fails because the SecretManager on the server doesn't start in a non-secure environment.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.