[jira] [Commented] (HDFS-107) Data-nodes should be formatted when the name-node is formatted.

2011-06-13 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049002#comment-13049002
 ] 

Konstantin Shvachko commented on HDFS-107:
--

I agree with Todd and others. Option (1) seems to be the way to go.
If you add the config parameter, you will need to distribute a new 
hdfs-site.xml to all data-nodes before formatting. Instead you could have just 
removed the storage directories.

> Data-nodes should be formatted when the name-node is formatted.
> ---
>
> Key: HDFS-107
> URL: https://issues.apache.org/jira/browse/HDFS-107
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Konstantin Shvachko
> Attachments: HDFS-107-1.patch
>
>
> The upgrade feature HADOOP-702 requires data-nodes to store persistently the 
> namespaceID 
> in their version files and verify during startup that it matches the one 
> stored on the name-node.
> When the name-node reformats it generates a new namespaceID.
> Now if the cluster starts with the reformatted name-node and non-reformatted 
> data-nodes,
> the data-nodes will fail with
> java.io.IOException: Incompatible namespaceIDs ...
> Data-nodes should be reformatted whenever the name-node is. I see 2 
> approaches here:
> 1) In order to reformat the cluster we call "start-dfs -format" or make a 
> special script "format-dfs".
> This would format the cluster components all together. The open question is 
> whether it should also start
> the cluster after formatting.
> 2) Format the name-node only. When data-nodes connect to the name-node it 
> will tell them to
> format their storage directories if it sees that the namespace is empty and 
> its cTime=0.
> The drawback of this approach is that we can lose blocks of a data-node from 
> another cluster
> if it connects by mistake to the empty name-node.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048979#comment-13048979
 ] 

stack commented on HDFS-941:


I put back TestDataXceiver.  It does this:

{code}
-List<LocatedBlock> blkList = util.writeFile(TEST_FILE, FILE_SIZE_K);
+// Create file.
+util.writeFile(TEST_FILE, FILE_SIZE_K);
+// Now get its blocks.
+List<LocatedBlock> blkList = util.getFileBlocks(TEST_FILE, FILE_SIZE_K);
{code}


rather than change the writeFile signature (writeFile is used in a few other 
places so the change would ripple).


I also added back BlockSender.isBlockReadFully so the checks we make before 
calling verifiedByClient are as they were before this patch:

{code}
-if (DataTransferProtocol.Status.read(in) == CHECKSUM_OK) {
-  if (blockSender.isBlockReadFully() && datanode.blockScanner != null) {
-    datanode.blockScanner.verifiedByClient(block);
+if (blockSender.didSendEntireByteRange()) {
+  // If we sent the entire range, then we should expect the client
+  // to respond with a Status enum.
+  try {
+    DataTransferProtocol.Status stat = DataTransferProtocol.Status.read(in);
+    if (stat == null) {
+      LOG.warn("Client " + s.getInetAddress() + " did not send a valid status " +
+          "code after reading. Will close connection.");
+      IOUtils.closeStream(out);
+    } else if (stat == CHECKSUM_OK) {
+      if (blockSender.isBlockReadFully() && datanode.blockScanner != null) {
+        datanode.blockScanner.verifiedByClient(block);
+      }
+    }
{code}

I ran the bundled tests and they pass.  Am currently running all.

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Attachments: 941.22.txt, 941.22.txt, 941.22.v2.txt, 941.22.v3.txt, 
> HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, 
> HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, 
> HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-13 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HDFS-941:
---

Attachment: 941.22.v3.txt

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Attachments: 941.22.txt, 941.22.txt, 941.22.v2.txt, 941.22.v3.txt, 
> HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, 
> HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, 
> HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-13 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048875#comment-13048875
 ] 

Todd Lipcon commented on HDFS-941:
--

Yea, I think we should add back the blockReadFully variable (in addition to 
keeping the new sentEntireByteRange variable and its getter).

Looks like there's a new getFileBlocks() method which can be used after 
writeFile() to get the block locations, so we can keep that test around?

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Attachments: 941.22.txt, 941.22.txt, 941.22.v2.txt, HDFS-941-1.patch, 
> HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, HDFS-941-4.patch, 
> HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, HDFS-941-6.patch, 
> HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, hdfs-941.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048863#comment-13048863
 ] 

stack commented on HDFS-941:


Thanks for the review, Todd.  There is no isBlockReadFully method anymore; this 
patch removes it.  Do you think I should add it back?  TestDataXceiver was 
removed because the BlockReaderTestUtil#writeFile signature changed to return a 
List of blocks instead of a byte[].  I can hack it around to work.

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Attachments: 941.22.txt, 941.22.txt, 941.22.v2.txt, HDFS-941-1.patch, 
> HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, HDFS-941-4.patch, 
> HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, HDFS-941-6.patch, 
> HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, hdfs-941.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2034) length in getBlockRange becomes -ve when reading only from currently being written blk

2011-06-13 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048856#comment-13048856
 ] 

Todd Lipcon commented on HDFS-2034:
---

Couple of small nits:
- rather than reassigning the {{length}} parameter, I think it would be clearer 
to just do the following -- see the sketch after this list:
{code}
blocks = getFinalizedBlockRange(offset, Math.min(length, lengthOfCompleteBlk - offset));
{code}
don't you think?

- when instantiating the ArrayList, pass a capacity of 1 since you know that it 
will have at most 1 element (the default capacity for ArrayList is 10)
- our style guide says to use curly braces even for one-line if statements. 
There are a couple missing in the patch
- another style thing: for short inline comments it's usually preferred to use 
{{//}} style comments instead of {{/* ... */}}, for whatever reason.
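
Putting the first two nits together, a minimal sketch of the suggested shape 
(names such as {{lengthOfCompleteBlk}}, {{getFinalizedBlockRange}}, and 
{{locatedBlocks}} follow this thread and are assumptions, not quotes from the 
committed patch):

{code}
// Sketch only -- context and names are assumed from the discussion above.
List<LocatedBlock> blocks;
if (offset < lengthOfCompleteBlk) {
  // Clamp the requested length inline instead of reassigning the parameter.
  blocks = getFinalizedBlockRange(offset,
      Math.min(length, lengthOfCompleteBlk - offset));
} else {
  // Only the under-construction block can match here, so the list holds at
  // most one element; size it accordingly (ArrayList defaults to 10).
  blocks = new ArrayList<LocatedBlock>(1);
  blocks.add(locatedBlocks.getLastLocatedBlock());
}
{code}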

> length in getBlockRange becomes -ve when reading only from currently being 
> written blk
> --
>
> Key: HDFS-2034
> URL: https://issues.apache.org/jira/browse/HDFS-2034
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: John George
>Assignee: John George
>Priority: Minor
> Attachments: HDFS-2034-1.patch, HDFS-2034-1.patch, HDFS-2034-2.patch, 
> HDFS-2034-3.patch, HDFS-2034.patch
>
>
> This came up during HDFS-1907. Posting an example that Todd posted in 
> HDFS-1907 that brought out this issue.
> {quote}
> Here's an example sequence to describe what I mean:
> 1. open file, write one and a half blocks
> 2. call hflush
> 3. another reader asks for the first byte of the second block
> {quote}
> In this case, since the offset is greater than the completed block length, 
> the math in getBlockRange() of DFSInputStream.java will set "length" to a 
> negative value.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-107) Data-nodes should be formatted when the name-node is formatted.

2011-06-13 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048852#comment-13048852
 ] 

Todd Lipcon commented on HDFS-107:
--

I think adding another config here is unnecessary. What's the downside of 
adding a "-format" flag to the datanode, and having "start-dfs -format" pass it 
along?
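
For concreteness, a hypothetical sketch of the idea (the {{-format}} flag and 
the {{formatStorageDirs()}} helper are illustrative only; neither exists in the 
code today):

{code}
// Hypothetical sketch only -- illustrates the proposal, not existing code.
static void parseArguments(String[] args, Configuration conf) {
  for (String arg : args) {
    if ("-format".equalsIgnoreCase(arg)) {
      // start-dfs -format would pass this flag through to each datanode,
      // which then wipes its storage directories before registering with
      // the freshly formatted name-node.
      formatStorageDirs(conf);
    }
  }
}
{code}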

> Data-nodes should be formatted when the name-node is formatted.
> ---
>
> Key: HDFS-107
> URL: https://issues.apache.org/jira/browse/HDFS-107
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Konstantin Shvachko
> Attachments: HDFS-107-1.patch
>
>
> The upgrade feature HADOOP-702 requires data-nodes to store persistently the 
> namespaceID 
> in their version files and verify during startup that it matches the one 
> stored on the name-node.
> When the name-node reformats it generates a new namespaceID.
> Now if the cluster starts with the reformatted name-node and non-reformatted 
> data-nodes,
> the data-nodes will fail with
> java.io.IOException: Incompatible namespaceIDs ...
> Data-nodes should be reformatted whenever the name-node is. I see 2 
> approaches here:
> 1) In order to reformat the cluster we call "start-dfs -format" or make a 
> special script "format-dfs".
> This would format the cluster components all together. The open question is 
> whether it should also start
> the cluster after formatting.
> 2) Format the name-node only. When data-nodes connect to the name-node it 
> will tell them to
> format their storage directories if it sees that the namespace is empty and 
> its cTime=0.
> The drawback of this approach is that we can lose blocks of a data-node from 
> another cluster
> if it connects by mistake to the empty name-node.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-13 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048837#comment-13048837
 ] 

Todd Lipcon commented on HDFS-941:
--

Hey Stack. I still don't think this is quite right -- it will now call 
verifiedByClient() if the client read the entire byterange, even if the 
byterange didn't cover the whole block. I think we need {{if 
(datanode.blockScanner != null && blockSender.isBlockReadFully())}}. Also, can 
you add back TestDataXceiver? I think that test case would catch this bug.
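
In context, against the code quoted elsewhere in this digest, the suggested 
guard would read:

{code}
} else if (stat == CHECKSUM_OK) {
  // Only mark the block verified when the client read the *whole* block,
  // not merely the whole requested byterange.
  if (datanode.blockScanner != null && blockSender.isBlockReadFully()) {
    datanode.blockScanner.verifiedByClient(block);
  }
}
{code}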

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Attachments: 941.22.txt, 941.22.txt, 941.22.v2.txt, HDFS-941-1.patch, 
> HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, HDFS-941-4.patch, 
> HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, HDFS-941-6.patch, 
> HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, hdfs-941.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2046) Force entropy to come from non-true random for tests

2011-06-13 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2046:
--

   Resolution: Fixed
Fix Version/s: (was: 0.22.0)
   0.23.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks for the review, atm.

> Force entropy to come from non-true random for tests
> 
>
> Key: HDFS-2046
> URL: https://issues.apache.org/jira/browse/HDFS-2046
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: build, test
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.23.0
>
> Attachments: hdfs-2046.txt
>
>
> Same as HADOOP-7335 but for HDFS

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2002) Incorrect computation of needed blocks in getTurnOffTip()

2011-06-13 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048811#comment-13048811
 ] 

Konstantin Shvachko commented on HDFS-2002:
---

Looks like the TestSafeMode failure is related to the patch.

> Incorrect computation of needed blocks in getTurnOffTip()
> -
>
> Key: HDFS-2002
> URL: https://issues.apache.org/jira/browse/HDFS-2002
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.22.0
>Reporter: Konstantin Shvachko
>Assignee: Matthias Eckert
>  Labels: newbie
> Fix For: 0.22.0
>
> Attachments: hdfs-2002.patch
>
>
> {{SafeModeInfo.getTurnOffTip()}} under-reports the number of blocks needed to 
> reach the safemode threshold.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2066) Create a package and individual class files for DataTransferProtocol

2011-06-13 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048806#comment-13048806
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-2066:
--

Hi Xie, thanks for pointing it out.  It was fixed by MAPREDUCE-2588 earlier 
today.

> Create a package and individual class files for DataTransferProtocol
> 
>
> Key: HDFS-2066
> URL: https://issues.apache.org/jira/browse/HDFS-2066
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: data-node, hdfs client, name-node
>Affects Versions: 0.23.0
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 0.23.0
>
> Attachments: h2066_20110610.patch
>
>
> {{DataTransferProtocol}} contains quite a few classes.  It is better to 
> create a package and put the classes into individual files.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-13 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HDFS-941:
---

Attachment: 941.22.v2.txt

Here's a new version that will call verifiedByClient IFF the return from the 
client is CHECKSUM_OK (comparing old and new, isBlockReadFully would look to 
have a lesser semantic than didSendEntireByteRange).  I added the else-if in 
the below:

{code}
-if (DataTransferProtocol.Status.read(in) == CHECKSUM_OK) {
-  if (blockSender.isBlockReadFully() && datanode.blockScanner != null) {
-    datanode.blockScanner.verifiedByClient(block);
+if (blockSender.didSendEntireByteRange()) {
+  // If we sent the entire range, then we should expect the client
+  // to respond with a Status enum.
+  try {
+    DataTransferProtocol.Status stat = DataTransferProtocol.Status.read(in);
+    if (stat == null) {
+      LOG.warn("Client " + s.getInetAddress() + " did not send a valid status " +
+          "code after reading. Will close connection.");
+      IOUtils.closeStream(out);
+    } else if (stat == CHECKSUM_OK) {
+      if (datanode.blockScanner != null) {
+        datanode.blockScanner.verifiedByClient(block);
+      }
+    }
{code}

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Attachments: 941.22.txt, 941.22.txt, 941.22.v2.txt, HDFS-941-1.patch, 
> HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, HDFS-941-4.patch, 
> HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, HDFS-941-6.patch, 
> HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, hdfs-941.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1942) If all Block Pool service threads exit then datanode should exit.

2011-06-13 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1942:
-

Attachment: HDFS-1942-1.patch

Attaching the patch.

> If all Block Pool service threads exit then datanode should exit.
> -
>
> Key: HDFS-1942
> URL: https://issues.apache.org/jira/browse/HDFS-1942
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.0
>Reporter: Bharath Mundlapudi
>Assignee: Bharath Mundlapudi
> Attachments: HDFS-1942-1.patch
>
>
> Currently, if all block pool service threads exit, the Datanode continues to 
> run.  This should be fixed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048721#comment-13048721
 ] 

stack commented on HDFS-941:


I should put it back.  Give me a sec...



> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Attachments: 941.22.txt, 941.22.txt, HDFS-941-1.patch, 
> HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, HDFS-941-4.patch, 
> HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, HDFS-941-6.patch, 
> HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, hdfs-941.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-13 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048688#comment-13048688
 ] 

Todd Lipcon commented on HDFS-941:
--

Hey Stack. I just looked over your patch for 0.22. The only thing I noticed is 
that it no longer calls "verifiedByClient()" -- this is a change that happened 
in trunk with HDFS-1655. Are we OK with removing this from 0.22?

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Attachments: 941.22.txt, 941.22.txt, HDFS-941-1.patch, 
> HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, HDFS-941-4.patch, 
> HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, HDFS-941-6.patch, 
> HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, hdfs-941.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1475) Want a -d flag in hadoop dfs -ls : Do not expand directories

2011-06-13 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048683#comment-13048683
 ] 

Daryn Sharp commented on HDFS-1475:
---

Ok, great!  Then this patch isn't incompatible.  I also completely agree 
with you about the behavior of ls.  For months I've wanted to change ls to 
behave just like unix ls, but held back for fear of an incompatibility 
backlash.

> Want a -d flag in hadoop dfs -ls : Do not expand directories
> 
>
> Key: HDFS-1475
> URL: https://issues.apache.org/jira/browse/HDFS-1475
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Affects Versions: 0.23.0
> Environment: any
>Reporter: Greg Connor
>Assignee: Daryn Sharp
>Priority: Minor
> Attachments: HDFS-1475.patch
>
>
> I would really love it if dfs -ls had a -d flag, like unix ls -d, which would 
> list the directories matching the name or pattern but *not* their contents.
> Current behavior is to expand every matching dir and list its contents, which 
> is awkward if I just want to see the matching dirs themselves (and their 
> permissions).  Worse, if a directory exists but is empty, -ls simply returns 
> no output at all, which is unhelpful.  
> So far we have used some ugly workarounds to this in various scripts, such as
>   -ls /path/to |grep dir   # wasteful, and problematic if "dir" is a 
> substring of the path
>   -stat /path/to/dir "Exists"  # stat has no way to get back the full path, 
> sadly
>   -count /path/to/dir  # works but is probably overkill.
> Really there is no reliable replacement for ls -d -- the above hacks will 
> work but only for certain isolated contexts.  (I'm not a java programmer, or 
> else I would probably submit a patch for this, or make my own jar file to do 
> this since I need it a lot.)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2072) Remove StringUtils.stringifyException(ie) in logger functions

2011-06-13 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048670#comment-13048670
 ] 

Harsh J commented on HDFS-2072:
---

This issue duplicates HDFS-1977.

(Which one ought to be closed in such cases?)

> Remove StringUtils.stringifyException(ie) in logger functions
> -
>
> Key: HDFS-2072
> URL: https://issues.apache.org/jira/browse/HDFS-2072
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Bharath Mundlapudi
>Assignee: Bharath Mundlapudi
> Fix For: 0.23.0
>
>
> The Apache logger API has an overloaded method which can take both the 
> message and the exception. I am proposing to clean up the logging code with 
> this API.
> I.e.:
> Change the code from LOG.warn(msg, 
> StringUtils.stringifyException(exception)); to LOG.warn(msg, exception);

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2072) Remove StringUtils.stringifyException(ie) in logger functions

2011-06-13 Thread Bharath Mundlapudi (JIRA)
Remove StringUtils.stringifyException(ie) in logger functions
-

 Key: HDFS-2072
 URL: https://issues.apache.org/jira/browse/HDFS-2072
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0


The Apache logger API has an overloaded method which can take both the message 
and the exception. I am proposing to clean up the logging code with this API.

I.e.:
Change the code from LOG.warn(msg, StringUtils.stringifyException(exception)); 
to LOG.warn(msg, exception);
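
For illustration, the before/after pattern being proposed (the message text 
here is made up):

{code}
// Before: the stack trace is flattened into the log message by hand.
LOG.warn("Block report failed: " + StringUtils.stringifyException(ie));

// After: hand the exception to the logger's overloaded method directly.
LOG.warn("Block report failed", ie);
{code}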

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1475) Want a -d flag in hadoop dfs -ls : Do not expand directories

2011-06-13 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048660#comment-13048660
 ] 

Allen Wittenauer commented on HDFS-1475:


My read of this JIRA in combination with HADOOP-7378 seemed to hint at a change 
in -ls's default behavior.  That's why I said "suspected" above.  If that isn't 
the case, then we're good to go.  If it does change it, we're still good to 
go*... we just need to flag one of these as an incompatible change and throw a 
release note in there so folks aren't surprised.

* - despite what some might think, I've actually wanted more things to break 
between releases than most of the other devs have.  I'd rather 
break things now, even in really horrible ways, before we hit 1.0 than after.  
This includes -ls's default behavior, which I think is... less than useful.

> Want a -d flag in hadoop dfs -ls : Do not expand directories
> 
>
> Key: HDFS-1475
> URL: https://issues.apache.org/jira/browse/HDFS-1475
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Affects Versions: 0.23.0
> Environment: any
>Reporter: Greg Connor
>Assignee: Daryn Sharp
>Priority: Minor
> Attachments: HDFS-1475.patch
>
>
> I would really love it if dfs -ls had a -d flag, like unix ls -d, which would 
> list the directories matching the name or pattern but *not* their contents.
> Current behavior is to expand every matching dir and list its contents, which 
> is awkward if I just want to see the matching dirs themselves (and their 
> permissions).  Worse, if a directory exists but is empty, -ls simply returns 
> no output at all, which is unhelpful.  
> So far we have used some ugly workarounds to this in various scripts, such as
>   -ls /path/to |grep dir   # wasteful, and problematic if "dir" is a 
> substring of the path
>   -stat /path/to/dir "Exists"  # stat has no way to get back the full path, 
> sadly
>   -count /path/to/dir  # works but is probably overkill.
> Really there is no reliable replacement for ls -d -- the above hacks will 
> work but only for certain isolated contexts.  (I'm not a java programmer, or 
> else I would probably submit a patch for this, or make my own jar file to do 
> this since I need it a lot.)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1475) Want a -d flag in hadoop dfs -ls : Do not expand directories

2011-06-13 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048645#comment-13048645
 ] 

Daryn Sharp commented on HDFS-1475:
---

I understand! I posted broken English on a JIRA before I had my coffee. :)

As for the patch, I'm puzzled by the seemingly contradictory statement that 
adding -d isn't incompatible, but changing the behavior of ls is incompatible.  
Perhaps I'm misunderstanding?  The current output and behavior haven't been 
changed at all.  I think the only dubious scenario where breakage would occur 
is if a script currently invokes "-ls -d ..." and expects it to fail.  Am I 
missing something?

> Want a -d flag in hadoop dfs -ls : Do not expand directories
> 
>
> Key: HDFS-1475
> URL: https://issues.apache.org/jira/browse/HDFS-1475
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Affects Versions: 0.23.0
> Environment: any
>Reporter: Greg Connor
>Assignee: Daryn Sharp
>Priority: Minor
> Attachments: HDFS-1475.patch
>
>
> I would really love it if dfs -ls had a -d flag, like unix ls -d, which would 
> list the directories matching the name or pattern but *not* their contents.
> Current behavior is to expand every matching dir and list its contents, which 
> is awkward if I just want to see the matching dirs themselves (and their 
> permissions).  Worse, if a directory exists but is empty, -ls simply returns 
> no output at all, which is unhelpful.  
> So far we have used some ugly workarounds to this in various scripts, such as
>   -ls /path/to |grep dir   # wasteful, and problematic if "dir" is a 
> substring of the path
>   -stat /path/to/dir "Exists"  # stat has no way to get back the full path, 
> sadly
>   -count /path/to/dir  # works but is probably overkill.
> Really there is no reliable replacement for ls -d -- the above hacks will 
> work but only for certain isolated contexts.  (I'm not a java programmer, or 
> else I would probably submit a patch for this, or make my own jar file to do 
> this since I need it a lot.)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1739) When DataNode throws DiskOutOfSpaceException, it will be helpful to the user if we log the available volume size and configured block size.

2011-06-13 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048635#comment-13048635
 ] 

Uma Maheswara Rao G commented on HDFS-1739:
---

Patch updated.
Sorry for the late response.

> When DataNode throws DiskOutOfSpaceException, it will be helpful to the user 
> if we log the available volume size and configured block size.
> 
>
> Key: HDFS-1739
> URL: https://issues.apache.org/jira/browse/HDFS-1739
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Minor
> Attachments: HDFS-1739.1.patch, HDFS-1739.2.patch, HDFS-1739.3.patch, 
> HDFS-1739.4.patch, HDFS-1739.5.patch, HDFS-1739.patch
>
>
> DataNode will throw DiskOutOfSpaceException for a new block write if the 
> available volume size is less than the configured block size.
>  So, it will be helpful to the user if we log these details.
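
A minimal sketch of what such a message could look like (field names and 
wording here are assumptions, not taken from the attached patches):

{code}
// Illustrative only -- names and message wording are assumptions.
long available = volume.getAvailable();
if (available < blockSize) {
  throw new DiskOutOfSpaceException("Insufficient space for a new block:"
      + " available volume size " + available
      + " bytes < configured block size " + blockSize + " bytes.");
}
{code}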

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1475) Want a -d flag in hadoop dfs -ls : Do not expand directories

2011-06-13 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048628#comment-13048628
 ] 

Allen Wittenauer commented on HDFS-1475:


Changing the output of an existing command will definitely break people.  So -d 
support wouldn't be, but changing the behavior of -l definitely is.

> Want a -d flag in hadoop dfs -ls : Do not expand directories
> 
>
> Key: HDFS-1475
> URL: https://issues.apache.org/jira/browse/HDFS-1475
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Affects Versions: 0.23.0
> Environment: any
>Reporter: Greg Connor
>Assignee: Daryn Sharp
>Priority: Minor
> Attachments: HDFS-1475.patch
>
>
> I would really love it if dfs -ls had a -d flag, like unix ls -d, which would 
> list the directories matching the name or pattern but *not* their contents.
> Current behavior is to expand every matching dir and list its contents, which 
> is awkward if I just want to see the matching dirs themselves (and their 
> permissions).  Worse, if a directory exists but is empty, -ls simply returns 
> no output at all, which is unhelpful.  
> So far we have used some ugly workarounds to this in various scripts, such as
>   -ls /path/to |grep dir   # wasteful, and problematic if "dir" is a 
> substring of the path
>   -stat /path/to/dir "Exists"  # stat has no way to get back the full path, 
> sadly
>   -count /path/to/dir  # works but is probably overkill.
> Really there is no reliable replacement for ls -d -- the above hacks will 
> work but only for certain isolated contexts.  (I'm not a java programmer, or 
> else I would probably submit a patch for this, or make my own jar file to do 
> this since I need it a lot.)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1475) Want a -d flag in hadoop dfs -ls : Do not expand directories

2011-06-13 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048630#comment-13048630
 ] 

Allen Wittenauer commented on HDFS-1475:


grr. -ls.  *needs sleep and edit mode*

> Want a -d flag in hadoop dfs -ls : Do not expand directories
> 
>
> Key: HDFS-1475
> URL: https://issues.apache.org/jira/browse/HDFS-1475
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Affects Versions: 0.23.0
> Environment: any
>Reporter: Greg Connor
>Assignee: Daryn Sharp
>Priority: Minor
> Attachments: HDFS-1475.patch
>
>
> I would really love it if dfs -ls had a -d flag, like unix ls -d, which would 
> list the directories matching the name or pattern but *not* their contents.
> Current behavior is to expand every matching dir and list its contents, which 
> is awkward if I just want to see the matching dirs themselves (and their 
> permissions).  Worse, if a directory exists but is empty, -ls simply returns 
> no output at all, which is unhelpful.  
> So far we have used some ugly workarounds to this in various scripts, such as
>   -ls /path/to |grep dir   # wasteful, and problematic if "dir" is a 
> substring of the path
>   -stat /path/to/dir "Exists"  # stat has no way to get back the full path, 
> sadly
>   -count /path/to/dir  # works but is probably overkill.
> Really there is no reliable replacement for ls -d -- the above hacks will 
> work but only for certain isolated contexts.  (I'm not a java programmer, or 
> else I would probably submit a patch for this, or make my own jar file to do 
> this since I need it a lot.)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1475) Want a -d flag in hadoop dfs -ls : Do not expand directories

2011-06-13 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048629#comment-13048629
 ] 

Allen Wittenauer commented on HDFS-1475:


err, changing -l.

> Want a -d flag in hadoop dfs -ls : Do not expand directories
> 
>
> Key: HDFS-1475
> URL: https://issues.apache.org/jira/browse/HDFS-1475
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Affects Versions: 0.23.0
> Environment: any
>Reporter: Greg Connor
>Assignee: Daryn Sharp
>Priority: Minor
> Attachments: HDFS-1475.patch
>
>
> I would really love it if dfs -ls had a -d flag, like unix ls -d, which would 
> list the directories matching the name or pattern but *not* their contents.
> Current behavior is to expand every matching dir and list its contents, which 
> is awkward if I just want to see the matching dirs themselves (and their 
> permissions).  Worse, if a directory exists but is empty, -ls simply returns 
> no output at all, which is unhelpful.  
> So far we have used some ugly workarounds to this in various scripts, such as
>   -ls /path/to |grep dir   # wasteful, and problematic if "dir" is a 
> substring of the path
>   -stat /path/to/dir "Exists"  # stat has no way to get back the full path, 
> sadly
>   -count /path/to/dir  # works but is probably overkill.
> Really there is no reliable replacement for ls -d -- the above hacks will 
> work but only for certain isolated contexts.  (I'm not a java programmer, or 
> else I would probably submit a patch for this, or make my own jar file to do 
> this since I need it a lot.)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1475) Want a -d flag in hadoop dfs -ls : Do not expand directories

2011-06-13 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048619#comment-13048619
 ] 

Daryn Sharp commented on HDFS-1475:
---

I'm still a bit confused.  Are enhancements deemed incompatible?

> Want a -d flag in hadoop dfs -ls : Do not expand directories
> 
>
> Key: HDFS-1475
> URL: https://issues.apache.org/jira/browse/HDFS-1475
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Affects Versions: 0.23.0
> Environment: any
>Reporter: Greg Connor
>Assignee: Daryn Sharp
>Priority: Minor
> Attachments: HDFS-1475.patch
>
>
> I would really love it if dfs -ls had a -d flag, like unix ls -d, which would 
> list the directories matching the name or pattern but *not* their contents.
> Current behavior is to expand every matching dir and list its contents, which 
> is awkward if I just want to see the matching dirs themselves (and their 
> permissions).  Worse, if a directory exists but is empty, -ls simply returns 
> no output at all, which is unhelpful.  
> So far we have used some ugly workarounds to this in various scripts, such as
>   -ls /path/to |grep dir   # wasteful, and problematic if "dir" is a 
> substring of the path
>   -stat /path/to/dir "Exists"  # stat has no way to get back the full path, 
> sadly
>   -count /path/to/dir  # works but is probably overkill.
> Really there is no reliable replacement for ls -d -- the above hacks will 
> work but only for certain isolated contexts.  (I'm not a java programmer, or 
> else I would probably submit a patch for this, or make my own jar file to do 
> this since I need it a lot.)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1475) Want a -d flag in hadoop dfs -ls : Do not expand directories

2011-06-13 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048607#comment-13048607
 ] 

Allen Wittenauer commented on HDFS-1475:


Woops, I actually meant that comment for HADOOP-7378.  

> Want a -d flag in hadoop dfs -ls : Do not expand directories
> 
>
> Key: HDFS-1475
> URL: https://issues.apache.org/jira/browse/HDFS-1475
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Affects Versions: 0.23.0
> Environment: any
>Reporter: Greg Connor
>Assignee: Daryn Sharp
>Priority: Minor
> Attachments: HDFS-1475.patch
>
>
> I would really love it if dfs -ls had a -d flag, like unix ls -d, which would 
> list the directories matching the name or pattern but *not* their contents.
> Current behavior is to expand every matching dir and list its contents, which 
> is awkward if I just want to see the matching dirs themselves (and their 
> permissions).  Worse, if a directory exists but is empty, -ls simply returns 
> no output at all, which is unhelpful.  
> So far we have used some ugly workarounds to this in various scripts, such as
>   -ls /path/to |grep dir   # wasteful, and problematic if "dir" is a 
> substring of the path
>   -stat /path/to/dir "Exists"  # stat has no way to get back the full path, 
> sadly
>   -count /path/to/dir  # works but is probably overkill.
> Really there is no reliable replacement for ls -d -- the above hacks will 
> work but only for certain isolated contexts.  (I'm not a java programmer, or 
> else I would probably submit a patch for this, or make my own jar file to do 
> this since I need it a lot.)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-13 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048579#comment-13048579
 ] 

Kihwal Lee commented on HDFS-941:
-

HDFS-2071 was filed. 

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Attachments: 941.22.txt, 941.22.txt, HDFS-941-1.patch, 
> HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, HDFS-941-4.patch, 
> HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, HDFS-941-6.patch, 
> HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, hdfs-941.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2071) Use of isConnected() in DataXceiver is invalid

2011-06-13 Thread Kihwal Lee (JIRA)
Use of isConnected() in DataXceiver is invalid
--

 Key: HDFS-2071
 URL: https://issues.apache.org/jira/browse/HDFS-2071
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.23.0
Reporter: Kihwal Lee
Priority: Minor


The use of Socket.isConnected() in DataXceiver.run() is not valid. It returns 
false until the connection is made and then always returns true after that; it 
will never return false again once the initial connection has been made. 
Socket.isClosed() or SocketChannel.isOpen() should be used instead, assuming 
someone handles SocketException and calls Socket.close() or 
SocketChannel.close(). It seems the op handlers in DataXceiver are diligently 
using IOUtils.closeStream(), which will invoke SocketChannel.close().

{code}
- } while (s.isConnected() && socketKeepaliveTimeout > 0);
+ } while (!s.isClosed() && socketKeepaliveTimeout > 0);
{code}

The effect of this bug is very minor, as the socket is read again right after. 
If the connection was closed, readOp() will throw an EOFException, which is 
caught and dealt with properly.  The system still functions normally, with 
probably only a few microseconds of extra overhead in the premature connection 
closure cases.
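
A self-contained demonstration of the Socket behavior described above 
(standalone snippet, not DataXceiver code):

{code}
import java.net.ServerSocket;
import java.net.Socket;

public class IsConnectedDemo {
  public static void main(String[] args) throws Exception {
    ServerSocket server = new ServerSocket(0);           // any free port
    Socket s = new Socket("localhost", server.getLocalPort());
    System.out.println(s.isConnected());                 // true
    s.close();
    System.out.println(s.isConnected());                 // still true after close()
    System.out.println(s.isClosed());                    // true -- the right check
    server.close();
  }
}
{code}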

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1739) When DataNode throws DiskOutOfSpaceException, it will be helpful to the user if we log the available volume size and configured block size.

2011-06-13 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-1739:
--

Attachment: HDFS-1739.5.patch

> When DataNode throws DiskOutOfSpaceException, it will be helpful to the user 
> if we log the available volume size and configured block size.
> 
>
> Key: HDFS-1739
> URL: https://issues.apache.org/jira/browse/HDFS-1739
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Minor
> Attachments: HDFS-1739.1.patch, HDFS-1739.2.patch, HDFS-1739.3.patch, 
> HDFS-1739.4.patch, HDFS-1739.5.patch, HDFS-1739.patch
>
>
> DataNode will throw DiskOutOfSpaceException for a new block write if the 
> available volume size is less than the configured block size.
>  So, it will be helpful to the user if we log these details.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1734) 'Chunk size to view' option is not working in Name Node UI.

2011-06-13 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048569#comment-13048569
 ] 

Uma Maheswara Rao G commented on HDFS-1734:
---

Thanks, Jitendra, for the review.
Updated the patch against the latest trunk.

> 'Chunk size to view' option is not working in Name Node UI.
> ---
>
> Key: HDFS-1734
> URL: https://issues.apache.org/jira/browse/HDFS-1734
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: ChunkSizeToView.jpg, HDFS-1734.1.patch, 
> HDFS-1734.2.patch, HDFS-1734.patch
>
>
>   1. Write a file to DFS
>   2. Browse the file using Name Node UI.
>   3. Give the chunk size to view as 100 and click refresh.
>   It will say: Invalid input (genstamp absent)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1734) 'Chunk size to view' option is not working in Name Node UI.

2011-06-13 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-1734:
--

Attachment: HDFS-1734.2.patch

> 'Chunk size to view' option is not working in Name Node UI.
> ---
>
> Key: HDFS-1734
> URL: https://issues.apache.org/jira/browse/HDFS-1734
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: ChunkSizeToView.jpg, HDFS-1734.1.patch, 
> HDFS-1734.2.patch, HDFS-1734.patch
>
>
>   1. Write a file to DFS
>   2. Browse the file using Name Node UI.
>   3. Give the chunk size to view as 100 and click refresh.
>   It will say: Invalid input (genstamp absent)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1475) Want a -d flag in hadoop dfs -ls : Do not expand directories

2011-06-13 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048563#comment-13048563
 ] 

Daryn Sharp commented on HDFS-1475:
---

I have no problem marking it incompatible, but I guess I'm unclear on what 
constitutes an incompatible change.  The patch doesn't alter any of the 
pre-existing behavior -- it just adds another option.  Are enhancements 
considered incompatible?

BTW, the test failed because the corresponding Hadoop JIRA is not integrated.

> Want a -d flag in hadoop dfs -ls : Do not expand directories
> 
>
> Key: HDFS-1475
> URL: https://issues.apache.org/jira/browse/HDFS-1475
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Affects Versions: 0.23.0
> Environment: any
>Reporter: Greg Connor
>Assignee: Daryn Sharp
>Priority: Minor
> Attachments: HDFS-1475.patch
>
>
> I would really love it if dfs -ls had a -d flag, like unix ls -d, which would 
> list the directories matching the name or pattern but *not* their contents.
> Current behavior is to expand every matching dir and list its contents, which 
> is awkward if I just want to see the matching dirs themselves (and their 
> permissions).  Worse, if a directory exists but is empty, -ls simply returns 
> no output at all, which is unhelpful.  
> So far we have used some ugly workarounds to this in various scripts, such as
>   -ls /path/to |grep dir   # wasteful, and problematic if "dir" is a 
> substring of the path
>   -stat /path/to/dir "Exists"  # stat has no way to get back the full path, 
> sadly
>   -count /path/to/dir  # works but is probably overkill.
> Really there is no reliable replacement for ls -d -- the above hacks will 
> work but only for certain isolated contexts.  (I'm not a java programmer, or 
> else I would probably submit a patch for this, or make my own jar file to do 
> this since I need it a lot.)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2066) Create a package and individual class files for DataTransferProtocol

2011-06-13 Thread XieXianshan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048523#comment-13048523
 ] 

XieXianshan commented on HDFS-2066:
---

A little question:
why does this patch not cover the following Java files?
 - mapreduce/src/contrib/raid/src/java/org/apache/hadoop/raid/BlockFixer.java
 - mapreduce/src/contrib/raid/src/java/org/apache/hadoop/hdfs/server/datanode/RaidBlockSender.java

The DataTransferProtocol class is also imported in these files.

> Create a package and individual class files for DataTransferProtocol
> 
>
> Key: HDFS-2066
> URL: https://issues.apache.org/jira/browse/HDFS-2066
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: data-node, hdfs client, name-node
>Affects Versions: 0.23.0
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 0.23.0
>
> Attachments: h2066_20110610.patch
>
>
> {{DataTransferProtocol}} contains quite a few classes.  It is better to 
> create a package and put the classes into individual files.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2028) There are lot of ESTABLISHED socket connetions at 50010 port.

2011-06-13 Thread Zhou Sheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048500#comment-13048500
 ] 

Zhou Sheng commented on HDFS-2028:
--

I checked out version 0.20.205 from svn 
(http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-205/)
 and built the new core jar.
After replacing the core jar, the problem still remains: when I load data into 
Hive tables via the Hive CLI or Hive server, there are always some socket 
connections in ESTABLISHED status.


> There are lot of ESTABLISHED socket connetions at 50010 port.
> -
>
> Key: HDFS-2028
> URL: https://issues.apache.org/jira/browse/HDFS-2028
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.20.1, 0.20.3
> Environment: linux server
>Reporter: Zhou Sheng
>  Labels: hadoop
>
> When I load data into Hive tables via the Hive CLI or Hive server, there are 
> always some ESTABLISHED or CLOSE_WAIT status socket connections. And these 
> TCP connections won't be released unless you quit Hive or restart HiveServer.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira