[jira] Commented: (HDFS-985) HDFS should issue multiple RPCs for listing a large directory

2010-03-18 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846973#action_12846973
 ] 

Hairong Kuang commented on HDFS-985:


Failed contrib tests seem unrelated to my patch:
/grid/0/hudson/hudson-slave/workspace/Hdfs-Patch-h5.grid.sp2.yahoo.net/trunk/build.xml:569:
 The following error occurred while executing this line:
 [exec] 
/grid/0/hudson/hudson-slave/workspace/Hdfs-Patch-h5.grid.sp2.yahoo.net/trunk/src/contrib/build.xml:48:
 The following error occurred while executing this line:
 [exec] 
/grid/0/hudson/hudson-slave/workspace/Hdfs-Patch-h5.grid.sp2.yahoo.net/trunk/src/contrib/hdfsproxy/build.xml:292:
 org.codehaus.cargo.container.ContainerException: Failed to download 
[http://apache.osuosl.org/tomcat/tomcat-6/v6.0.18/bin/apache-tomcat-6.0.18.zip]

 HDFS should issue multiple RPCs for listing a large directory
 -

 Key: HDFS-985
 URL: https://issues.apache.org/jira/browse/HDFS-985
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.22.0

 Attachments: directoryBrowse_0.20yahoo.patch, 
 directoryBrowse_0.20yahoo_1.patch, directoryBrowse_0.20yahoo_2.patch, 
 iterativeLS_trunk.patch, iterativeLS_trunk1.patch, iterativeLS_trunk2.patch, 
 iterativeLS_trunk3.patch, iterativeLS_trunk3.patch, iterativeLS_trunk4.patch, 
 iterativeLS_yahoo.patch, iterativeLS_yahoo1.patch, testFileStatus.patch


 Currently HDFS issues a single RPC from the client to the NameNode to list a 
 directory. However, some directories are so large that they contain thousands 
 or even millions of items. Listing such a large directory in one RPC has a few 
 shortcomings:
 1. The list operation holds the global fsnamesystem lock for a long time, thus 
 blocking other requests. If a large number (thousands) of such list requests 
 hit the NameNode in a short period of time, the NameNode will be significantly 
 slowed down. Users end up seeing longer response times or lost connections to 
 the NameNode.
 2. The response message is uncontrollably big. We observed a response as big 
 as 50 MB when listing a directory of 300 thousand items. Even with the 
 optimization introduced in HDFS-946, which may cut the response by 20-50%, the 
 response size will still be on the order of tens of megabytes.
 I propose to implement a directory listing using multiple RPCs. Here is the 
 plan:
 1. Each getListing RPC has an upper limit on the number of items returned. 
 This limit could be configurable, but I am thinking of setting it to a fixed 
 number like 500.
 2. Each RPC additionally specifies a start position for the listing request. 
 I am thinking of using the last item of the previous listing RPC as the 
 indicator. Since the NameNode stores all items in a directory as a sorted 
 array, it can use the last item to locate the start of the next listing even 
 if that item has been deleted between the two consecutive calls. This has the 
 advantage of avoiding duplicate entries at the client side.
 3. The return value additionally indicates whether the whole directory has 
 been listed. If the client sees a false flag, it continues to issue another 
 RPC.
 This proposal changes the semantics of listing a large directory, in the sense 
 that listing is no longer an atomic operation if the directory's content 
 changes while the listing is in progress.
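The scheme proposed above (a per-RPC cap, a resume-after-last-item cursor over the NameNode's sorted array, and a has-more flag) can be sketched in plain Java. This is an illustrative model only, not NameNode code; all names here (IterativeListing, listPage, Page, MAX_ENTRIES) are invented for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the proposed iterative getListing protocol.
public class IterativeListing {
    static final int MAX_ENTRIES = 500;  // per-RPC cap, as proposed

    // One "RPC" response: a page of names plus a has-more flag.
    static class Page {
        final List<String> entries;
        final boolean hasMore;
        Page(List<String> entries, boolean hasMore) {
            this.entries = entries;
            this.hasMore = hasMore;
        }
    }

    // Server side: entries are sorted, so we resume strictly AFTER startAfter.
    // This works even if startAfter itself was deleted between two calls, and
    // it never re-sends an entry the client already has.
    static Page listPage(String[] sortedDir, String startAfter) {
        int idx = Arrays.binarySearch(sortedDir, startAfter);
        int start = idx >= 0 ? idx + 1 : -idx - 1;  // first entry > startAfter
        int end = Math.min(start + MAX_ENTRIES, sortedDir.length);
        return new Page(Arrays.asList(sortedDir).subList(start, end),
                        end < sortedDir.length);
    }

    // Client side: keep issuing RPCs until the has-more flag is false.
    static List<String> listAll(String[] sortedDir) {
        List<String> all = new ArrayList<>();
        String cursor = "";  // sorts before any real name
        Page page;
        do {
            page = listPage(sortedDir, cursor);
            all.addAll(page.entries);
            if (!page.entries.isEmpty()) {
                cursor = page.entries.get(page.entries.size() - 1);
            }
        } while (page.hasMore);
        return all;
    }

    public static void main(String[] args) {
        String[] dir = new String[1200];
        for (int i = 0; i < dir.length; i++) {
            dir[i] = String.format("file-%04d", i);
        }
        List<String> got = listAll(dir);   // 3 pages: 500 + 500 + 200
        System.out.println(got.size());    // prints 1200
        System.out.println(got.get(0));    // prints file-0000
    }
}
```

Note how the non-atomicity mentioned above falls out of the model: entries created or deleted between two listPage calls simply appear or disappear in later pages.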

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1046) Build fails trying to download an old version of tomcat

2010-03-18 Thread gary murry (JIRA)
Build fails trying to download an old version of tomcat
---

 Key: HDFS-1046
 URL: https://issues.apache.org/jira/browse/HDFS-1046
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: gary murry


It looks like HDFSProxy is trying to get an old version of tomcat (6.0.18).  
/grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk/trunk/src/contrib/hdfsproxy/build.xml:292:
 org.codehaus.cargo.container.ContainerException: Failed to download 
[http://apache.osuosl.org/tomcat/tomcat-6/v6.0.18/bin/apache-tomcat-6.0.18.zip]

Looking at http://apache.osuosl.org/tomcat/tomcat-6/ , it looks like the only 
two versions available are 6.0.24 and 6.0.26.




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1044) Cannot submit mapreduce job from secure client to unsecure sever

2010-03-18 Thread Boris Shkolnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boris Shkolnik updated HDFS-1044:
-

Attachment: HDFS-1044-BP20-2.patch

for previous release, not for commit.

 Cannot submit mapreduce job from secure client to unsecure sever
 

 Key: HDFS-1044
 URL: https://issues.apache.org/jira/browse/HDFS-1044
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Boris Shkolnik
Assignee: Boris Shkolnik
 Attachments: HDFS-1044-BP20-2.patch, HDFS-1044-BP20.patch


 Looks like it tries to get a DelegationToken and fails because the 
 SecureManger on the server doesn't start in a non-secure environment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1001) DataXceiver and BlockReader disagree on when to send/recv CHECKSUM_OK

2010-03-18 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847052#action_12847052
 ] 

Todd Lipcon commented on HDFS-1001:
---

The body of the patch looks good to me. But could we merge 
TestClientBlockVerification and the new TestDataXceiver? I recall you made the 
new test so you could be in the server.datanode package, but could the test 
cases of TestClientBlockVerification move in here too? If not, maybe we can at 
least share a bit of the test code (most of the code except for the test cases 
themselves is duplicated).

 DataXceiver and BlockReader disagree on when to send/recv CHECKSUM_OK
 -

 Key: HDFS-1001
 URL: https://issues.apache.org/jira/browse/HDFS-1001
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: bc Wong
Assignee: bc Wong
 Attachments: HDFS-1001-rebased.patch, HDFS-1001.patch, 
 HDFS-1001.patch.1


 Running the TestPread with additional debug statements reveals that the 
 BlockReader sends CHECKSUM_OK when the DataXceiver doesn't expect it. 
 Currently it doesn't matter since DataXceiver closes the connection after 
 each op, and CHECKSUM_OK is the last thing on the wire. But if we want to 
 cache connections, they need to agree on the exchange of CHECKSUM_OK.
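To make the hazard concrete, here is a toy model of why the two sides must agree once connections are cached. It is not the real DataTransferProtocol; the op codes and constants are invented. A one-byte disagreement about CHECKSUM_OK corrupts the framing of the next operation on a reused connection:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Toy model: if client and server disagree on whether a trailing CHECKSUM_OK
// is sent, a reused connection sees the stray status byte as the next op code.
public class ChecksumOkModel {
    static final byte CHECKSUM_OK = 5;    // invented value
    static final byte OP_READ_BLOCK = 81; // invented value

    // Bytes the client leaves on the wire: optional trailing status, next op.
    static byte[] clientBytes(boolean sendsChecksumOk) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            if (sendsChecksumOk) out.writeByte(CHECKSUM_OK);
            out.writeByte(OP_READ_BLOCK);
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // A server that expects (and consumes) the status sees the real next op;
    // one that doesn't misreads CHECKSUM_OK as an op code.
    static byte nextOpSeenByServer(byte[] wire, boolean expectsChecksumOk) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
            if (expectsChecksumOk) in.readByte();  // consume trailing status
            return in.readByte();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = clientBytes(true);
        System.out.println(nextOpSeenByServer(wire, true));   // prints 81: agreement
        System.out.println(nextOpSeenByServer(wire, false));  // prints 5: bogus op
    }
}
```

With one op per connection the stray byte is harmlessly discarded at close; with cached connections it shifts every subsequent read, which is exactly why the handshake must be pinned down before HDFS-941 lands.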

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1044) Cannot submit mapreduce job from secure client to unsecure sever

2010-03-18 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847070#action_12847070
 ] 

Jitendra Nath Pandey commented on HDFS-1044:


+1

 Cannot submit mapreduce job from secure client to unsecure sever
 

 Key: HDFS-1044
 URL: https://issues.apache.org/jira/browse/HDFS-1044
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Boris Shkolnik
Assignee: Boris Shkolnik
 Attachments: HDFS-1044-BP20-2.patch, HDFS-1044-BP20-3.patch, 
 HDFS-1044-BP20.patch


 Looks like it tries to get a DelegationToken and fails because the 
 SecureManger on the server doesn't start in a non-secure environment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1015) Intermittent failure in TestSecurityTokenEditLog

2010-03-18 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847093#action_12847093
 ] 

Konstantin Shvachko commented on HDFS-1015:
---

+1 for the patch.
Could you please update the components, affected, and fix versions for this 
jira.

 Intermittent failure in TestSecurityTokenEditLog
 

 Key: HDFS-1015
 URL: https://issues.apache.org/jira/browse/HDFS-1015
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: HDFS-1015-y20.1.patch, HDFS-1015.1.patch, 
 HDFS-1015.2.patch


 This test sometimes fails in the hadoop-0.20.100-secondary build. It doesn't 
 fail in trunk or in the hadoop-0.20.100 build.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1043) Benchmark overhead of server-side group resolution of users

2010-03-18 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-1043:
--

Component/s: benchmarks

 Benchmark overhead of server-side group resolution of users
 ---

 Key: HDFS-1043
 URL: https://issues.apache.org/jira/browse/HDFS-1043
 Project: Hadoop HDFS
  Issue Type: Test
  Components: benchmarks
Affects Versions: 0.22.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Fix For: 0.22.0

 Attachments: UGCRefresh.patch


 Server-side user group resolution was introduced in HADOOP-4656. 
 The benchmark should repeatedly request user group resolution from the 
 name-node and periodically reset the NN's user group cache.
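The benchmark described above might take roughly this shape: a loop of lookups against a cached resolver, with the cache cleared on a fixed period so the expensive server-side resolution path is exercised repeatedly rather than once. Everything here (class and method names, the toy resolver) is hypothetical, not the attached UGCRefresh.patch:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical shape of a group-resolution benchmark: count how often the
// expensive resolve path runs when the cache is periodically reset.
public class GroupResolutionBench {
    static long run(Function<String, String> resolve, int requests, int resetEvery) {
        Map<String, String> cache = new HashMap<>();
        long misses = 0;
        for (int i = 0; i < requests; i++) {
            if (i % resetEvery == 0) cache.clear();  // simulate periodic refresh
            String user = "user" + (i % 10);         // 10 distinct users
            if (!cache.containsKey(user)) {
                cache.put(user, resolve.apply(user)); // the expensive path
                misses++;
            }
        }
        return misses;
    }

    public static void main(String[] args) {
        // 100 requests, cache cleared every 20: each window re-misses
        // the 10 distinct users once, so 5 windows * 10 = 50 misses.
        System.out.println(run(u -> u + "-group", 100, 20));  // prints 50
    }
}
```

In a real run, the resolve function would be replaced by the actual name-node RPC and the miss count by wall-clock timings.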

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2010-03-18 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847117#action_12847117
 ] 

Todd Lipcon commented on HDFS-941:
--

Style notes:

- in BlockReader:
{code}
+  LOG.warn("Could not write to datanode " + sock.getInetAddress() +
+   ": " + e.getMessage());
{code}
should be more specific - like "Could not write read result status code" - and 
also indicate in the warning somehow that this is not a critical problem. 
Perhaps INFO level is better? (In my experience, if people see WARN they think 
something is seriously wrong.)

- please move the inner SocketCacheEntry class down lower in DFSInputStream
- in SocketCacheEntry.setOwner, can you use IOUtils.closeStream to close 
reader? Similarly in SocketCacheEntry.close
- We expect the following may happen reasonably often, right?
{code}
+// Our socket is no good.
+DFSClient.LOG.warn("Error making BlockReader. Closing stale " + 
entry.sock.toString());
{code}
I think this should probably be debug level.

- The edits to the docs in DataNode.java are good - if possible they should 
probably move into HDFS-1001 though, no?

- The do { ... } while () loop in DataXceiver is a bit hard to follow. Would it 
be possible to rearrange the code a bit to be more linear? (E.g. setting 
DN_KEEPALIVE_TIMEOUT right before the read at the beginning of the loop if 
workDone > 0 would be easier to follow, in my opinion.)

- In DataXceiver:
{code}
+  } catch (IOException ioe) {
+    LOG.error("Error reading client status response. Will close connection. " +
+        "Err: " + ioe);
{code}
Doesn't this yield error messages on every incomplete client read? Since the 
response is optional, this seems more like a DEBUG.

Bigger stuff:

- I think there is a concurrency issue here. Namely, the positional read API 
calls through into fetchBlockByteRange, which will use the existing cached 
socket, regardless of other concurrent operations. So we may end up with 
multiple block readers on the same socket and everything will fall apart.

Can you add a test case which tests concurrent use of a DFSInputStream? Maybe a 
few threads doing random positional reads while another thread does seeks and 
sequential reads?
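One way to avoid ending up with two block readers on one cached socket is to make claiming the cache entry atomic, so a concurrent positional read that finds the cache empty just opens a fresh connection. A minimal sketch of that idea, with hypothetical names rather than the patch's actual classes:

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of the race fix: a one-entry socket cache where claiming and
// returning the entry are atomic, so at most one reader owns a socket.
public class SocketCacheSketch {
    static class CachedSocket {
        final String id;  // stand-in for a real java.net.Socket
        CachedSocket(String id) { this.id = id; }
    }

    private final AtomicReference<CachedSocket> cache = new AtomicReference<>();

    // Atomically claim the cached socket; null means the caller must open a
    // new connection (e.g. a concurrent pread while a seek+read is in flight).
    CachedSocket take() {
        return cache.getAndSet(null);
    }

    // Offer a socket back for reuse; if another was cached meanwhile, the
    // caller should close this one instead (cache size stays at one).
    boolean offer(CachedSocket s) {
        return cache.compareAndSet(null, s);
    }

    public static void main(String[] args) {
        SocketCacheSketch c = new SocketCacheSketch();
        c.offer(new CachedSocket("s1"));
        CachedSocket a = c.take();  // first reader claims s1
        CachedSocket b = c.take();  // concurrent reader gets null, opens fresh
        System.out.println((a != null) + " " + (b == null));  // prints true true
    }
}
```

This doesn't solve the HBase-style multi-threaded workload (each extra thread still pays for a new connection), but it does prevent the "multiple block readers on the same socket" failure mode.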

- Regarding the cache size of one - I don't think this is quite true. For a use 
case like HBase, the region server is continually slamming the local datanode 
with random read requests from several client threads. Is the idea that such an 
application should be using multiple DFSInputStreams to read the same file and 
handle the multithreading itself?

- In DataXceiver, SocketException is caught and ignored while sending a block 
("// Its ok for remote side to close the connection anytime."). I think there 
are other SocketException types (e.g. a timeout) that could be thrown here 
aside from a connection close, so in that case we need to 
IOUtils.closeStream(out), I believe. A test case for this could be to open a 
BlockReader, read some bytes, then stop reading so that the other side's 
BlockSender generates a timeout.


- Not sure about this removal in the finally clause of opWriteBlock:
{code}
-  IOUtils.closeStream(replyOut);
{code}
(a) We still need to close in the case of a downstream-generated exception. 
Otherwise we'll read the next data bytes from the writer as an operation and 
have undefined results.
(b) To keep this patch less dangerous, maybe we should not add the reuse 
feature for operations other than read? Read is the only operation where we 
expect a lot of very short requests coming in - there's not much benefit for 
writes, etc., plus they're more complicated.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (e.g. 
 a client reads to the end of a block successfully), the same connection could 
 be reused for a second operation. This should improve random read performance 
 significantly.
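A reusable-connection xceiver can be modeled as a loop that keeps reading op codes from one connection until the peer closes or an idle timeout fires. The following toy loop uses invented names and lets EOF/InterruptedIOException stand in for both close and the keep-alive timeout; it is a sketch of the shape being discussed, not the patch:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InterruptedIOException;
import java.io.UncheckedIOException;

// Toy model of a keep-alive xceiver loop: serve ops from one connection until
// the peer closes or an idle timeout fires, instead of one op per connection.
public class XceiverLoop {
    static int opsServed(byte[] wire) {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
        int served = 0;
        while (true) {
            byte op;
            try {
                // In a real server this read would run under a keep-alive
                // timeout (SO_TIMEOUT) once served > 0, so idle clients
                // don't pin xceiver threads forever.
                op = in.readByte();
            } catch (EOFException | InterruptedIOException e) {
                break;  // peer closed, or idle timeout: connection is done
            } catch (IOException e) {
                throw new UncheckedIOException(e);  // a real I/O error
            }
            // ... dispatch op here; the handler must leave the stream
            // positioned exactly at the next op boundary for reuse to work ...
            served++;
        }
        return served;
    }

    public static void main(String[] args) {
        // Three op codes back to back on one "connection".
        System.out.println(opsServed(new byte[]{81, 81, 80}));  // prints 3
    }
}
```

The comment in the dispatch slot is the crux: reuse is only safe for ops that leave the stream in a well-defined state, which is why the review above suggests limiting the feature to reads at first.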

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1015) Intermittent failure in TestSecurityTokenEditLog

2010-03-18 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-1015:
--

  Component/s: test
   name-node
Affects Version/s: 0.22.0
Fix Version/s: 0.22.0
 Hadoop Flags: [Reviewed]

The test failures reported by Hudson are not related to this patch.

 Intermittent failure in TestSecurityTokenEditLog
 

 Key: HDFS-1015
 URL: https://issues.apache.org/jira/browse/HDFS-1015
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node, test
Affects Versions: 0.22.0
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Fix For: 0.22.0

 Attachments: HDFS-1015-y20.1.patch, HDFS-1015.1.patch, 
 HDFS-1015.2.patch


 This test sometimes fails in the hadoop-0.20.100-secondary build. It doesn't 
 fail in trunk or in the hadoop-0.20.100 build.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1015) Intermittent failure in TestSecurityTokenEditLog

2010-03-18 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-1015:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 Intermittent failure in TestSecurityTokenEditLog
 

 Key: HDFS-1015
 URL: https://issues.apache.org/jira/browse/HDFS-1015
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node, test
Affects Versions: 0.22.0
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Fix For: 0.22.0

 Attachments: HDFS-1015-y20.1.patch, HDFS-1015.1.patch, 
 HDFS-1015.2.patch


 This test sometimes fails in the hadoop-0.20.100-secondary build. It doesn't 
 fail in trunk or in the hadoop-0.20.100 build.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1015) Intermittent failure in TestSecurityTokenEditLog

2010-03-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847133#action_12847133
 ] 

Hudson commented on HDFS-1015:
--

Integrated in Hadoop-Hdfs-trunk-Commit #216 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/216/])
. Fix intermittent failure in TestSecurityTokenEditLog. Contributed by 
Jitendra Nath Pandey.


 Intermittent failure in TestSecurityTokenEditLog
 

 Key: HDFS-1015
 URL: https://issues.apache.org/jira/browse/HDFS-1015
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node, test
Affects Versions: 0.22.0
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Fix For: 0.22.0

 Attachments: HDFS-1015-y20.1.patch, HDFS-1015.1.patch, 
 HDFS-1015.2.patch


 This test sometimes fails in the hadoop-0.20.100-secondary build. It doesn't 
 fail in trunk or in the hadoop-0.20.100 build.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1044) Cannot submit mapreduce job from secure client to unsecure sever

2010-03-18 Thread Boris Shkolnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boris Shkolnik updated HDFS-1044:
-

Attachment: HDFS-1044-BP20-5.patch

 Cannot submit mapreduce job from secure client to unsecure sever
 

 Key: HDFS-1044
 URL: https://issues.apache.org/jira/browse/HDFS-1044
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Boris Shkolnik
Assignee: Boris Shkolnik
 Attachments: HDFS-1044-BP20-2.patch, HDFS-1044-BP20-3.patch, 
 HDFS-1044-BP20-5.patch, HDFS-1044-BP20.patch


 Looks like it tries to get a DelegationToken and fails because the 
 SecureManger on the server doesn't start in a non-secure environment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1044) Cannot submit mapreduce job from secure client to unsecure sever

2010-03-18 Thread Boris Shkolnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boris Shkolnik updated HDFS-1044:
-

Attachment: HDFS-1044-BP20-6.patch

for previous version, not for commit

 Cannot submit mapreduce job from secure client to unsecure sever
 

 Key: HDFS-1044
 URL: https://issues.apache.org/jira/browse/HDFS-1044
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Boris Shkolnik
Assignee: Boris Shkolnik
 Attachments: HDFS-1044-BP20-2.patch, HDFS-1044-BP20-3.patch, 
 HDFS-1044-BP20-5.patch, HDFS-1044-BP20-6.patch, HDFS-1044-BP20.patch


 Looks like it tries to get a DelegationToken and fails because the 
 SecureManger on the server doesn't start in a non-secure environment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.