[jira] Commented: (HDFS-1606) Provide a stronger data guarantee in the write pipeline

2011-02-09 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992909#comment-12992909
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-1606:
--

> 1. Find a datanode D by some means.

I have checked the code.  This is easier than I expected since 
{{BlockPlacementPolicy}} is able to find an additional datanode, given a list 
of already-chosen datanodes.  The remaining work for this part is to add a new 
method to {{ClientProtocol}} so that {{DFSClient}} can use it.
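
For illustration, a minimal sketch of what such a {{ClientProtocol}} addition could look like; the method name, signature, and parameter list below are assumptions made for this example, not the actual patch.

{code}
// Hypothetical sketch only: names and signature are illustrative assumptions.
import java.io.IOException;

import org.apache.hadoop.hdfs.protocol.Block;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.LocatedBlock;

public interface ClientProtocolAddition {
  /**
   * Ask the NameNode (which can delegate to BlockPlacementPolicy) for one
   * additional datanode for the given block, excluding the datanodes already
   * chosen for, or removed from, the write pipeline.
   */
  LocatedBlock getAdditionalDatanode(String src,
                                     Block blk,
                                     DatanodeInfo[] existing,
                                     DatanodeInfo[] excluded,
                                     String clientName) throws IOException;
}
{code}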

> Provide a stronger data guarantee in the write pipeline
> ---
>
> Key: HDFS-1606
> URL: https://issues.apache.org/jira/browse/HDFS-1606
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node, hdfs client
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>
> In the current design, if there is a datanode/network failure in the write 
> pipeline, DFSClient will try to remove the failed datanode from the pipeline 
> and then continue writing with the remaining datanodes.  As a result, the 
> number of datanodes in the pipeline is decreased.  Unfortunately, it is 
> possible that DFSClient may incorrectly remove a healthy datanode but leave 
> the failed datanode in the pipeline because failure detection may be 
> inaccurate under erroneous conditions.
> We propose to have a new mechanism for adding new datanodes to the pipeline 
> in order to provide a stronger data guarantee.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data

2011-02-09 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992907#comment-12992907
 ] 

dhruba borthakur commented on HDFS-347:
---

Thanks, Ryan, for merging that patch to the head of the 0.20-append branch. Please do 
let me know if you see any problems with it.

I agree with Allen/Todd that since Todd's patch is an optimization, we can get 
it committed even if the optimization does not work on non-Linux platforms. 
Could one of the security gurus review the security aspects of it?

> DFS read performance suboptimal when client co-located on nodes with data
> -
>
> Key: HDFS-347
> URL: https://issues.apache.org/jira/browse/HDFS-347
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: George Porter
>Assignee: Todd Lipcon
> Attachments: BlockReaderLocal1.txt, HADOOP-4801.1.patch, 
> HADOOP-4801.2.patch, HADOOP-4801.3.patch, HDFS-347-branch-20-append.txt, 
> all.tsv, hdfs-347.png, hdfs-347.txt, local-reads-doc
>
>
> One of the major strategies Hadoop uses to get scalable data processing is to 
> move the code to the data.  However, putting the DFS client on the same 
> physical node as the data blocks it acts on doesn't improve read performance 
> as much as expected.
> After looking at Hadoop and O/S traces (via HADOOP-4049), I think the problem 
> is due to the HDFS streaming protocol causing many more read I/O operations 
> (iops) than necessary.  Consider the case of a DFSClient fetching a 64 MB 
> disk block from the DataNode process (running in a separate JVM) running on 
> the same machine.  The DataNode will satisfy the single disk block request by 
> sending data back to the HDFS client in 64-KB chunks.  In BlockSender.java, 
> this is done in the sendChunk() method, relying on Java's transferTo() 
> method.  Depending on the host O/S and JVM implementation, transferTo() is 
> implemented as either a sendfilev() syscall or a pair of mmap() and write().  
> In either case, each chunk is read from the disk by issuing a separate I/O 
> operation for each chunk.  The result is that the single request for a 64-MB 
> block ends up hitting the disk as over a thousand smaller requests for 64-KB 
> each.
> Since the DFSClient runs in a different JVM and process than the DataNode, 
> shuttling data from the disk to the DFSClient also results in context 
> switches each time network packets get sent (in this case, the 64-kb chunk 
> turns into a large number of 1500 byte packet send operations).  Thus we see 
> a large number of context switches for each block send operation.
> I'd like to get some feedback on the best way to address this, but I think 
> providing a mechanism for a DFSClient to directly open data blocks that 
> happen to be on the same machine.  It could do this by examining the set of 
> LocatedBlocks returned by the NameNode, marking those that should be resident 
> on the local host.  Since the DataNode and DFSClient (probably) share the 
> same hadoop configuration, the DFSClient should be able to find the files 
> holding the block data, and it could directly open them and send data back to 
> the client.  This would avoid the context switches imposed by the network 
> layer, and would allow for much larger read buffers than 64KB, which should 
> reduce the number of iops imposed by each read block operation.
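
As a rough illustration of the chunked {{transferTo()}} pattern the description refers to (not the actual BlockSender code; the class name, method name, and channel setup are assumptions for this sketch), a send loop like the following issues one transfer call per 64 KB chunk:

{code}
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;

// Illustrative sketch: a block file is sent to the client in fixed-size
// chunks via FileChannel.transferTo(), so each chunk can become a separate
// I/O operation on the underlying disk.
public class ChunkedBlockSend {
  private static final int CHUNK_SIZE = 64 * 1024; // 64 KB, as in the description

  public static void sendBlock(FileInputStream blockFile, SocketChannel out,
                               long blockLength) throws IOException {
    FileChannel in = blockFile.getChannel();
    long position = 0;
    while (position < blockLength) {
      long count = Math.min(CHUNK_SIZE, blockLength - position);
      long sent = in.transferTo(position, count, out); // one transfer per chunk
      if (sent <= 0) {
        break; // nothing transferred; socket likely closed
      }
      position += sent;
    }
  }
}
{code}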

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1618) configure files that are generated as part of the released tarball need to have executable bit set

2011-02-09 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992877#comment-12992877
 ] 

Konstantin Boudnik commented on HDFS-1618:
--

+1, the patch looks good. Let's run it through the usual validation cycle.

> configure files that are generated as part of the released tarball need to 
> have executable bit set 
> ---
>
> Key: HDFS-1618
> URL: https://issues.apache.org/jira/browse/HDFS-1618
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Roman Shaposhnik
>Assignee: Roman Shaposhnik
> Attachments: HDFS-1618.patch
>
>
> Currently the configure files that are packaged in a tarball are -rw-rw-r--

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1619) Does libhdfs really need to depend on AC_TYPE_INT16_T, AC_TYPE_INT32_T, AC_TYPE_INT64_T and AC_TYPE_UINT16_T ?

2011-02-09 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992872#comment-12992872
 ] 

Allen Wittenauer commented on HDFS-1619:


Given that CentOS/RHEL 5.5 doesn't ship with a working Java, I don't see the 
issue with requiring a newer autoconf toolset.

> Does libhdfs really need to depend on AC_TYPE_INT16_T, AC_TYPE_INT32_T, 
> AC_TYPE_INT64_T and AC_TYPE_UINT16_T ?
> --
>
> Key: HDFS-1619
> URL: https://issues.apache.org/jira/browse/HDFS-1619
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Roman Shaposhnik
>Assignee: Konstantin Shvachko
>
> Currently configure.ac uses AC_TYPE_INT16_T, AC_TYPE_INT32_T, AC_TYPE_INT64_T 
> and AC_TYPE_UINT16_T and thus requires autoconf 2.61 or higher. 
> This prevents using it on such platforms as CentOS/RHEL 5.4 and 5.5. Given 
> that those are pretty popular, and given that it is really difficult to find 
> a platform these days that doesn't natively define intXX_t types, I'm curious 
> whether we can simply remove those macros or perhaps fail ONLY if we happen 
> to be on such a platform.
> Here's a link to GNU autoconf docs for your reference:
> http://www.gnu.org/software/hello/manual/autoconf/Particular-Types.html

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Created: (HDFS-1619) Does libhdfs really need to depend on AC_TYPE_INT16_T, AC_TYPE_INT32_T, AC_TYPE_INT64_T and AC_TYPE_UINT16_T ?

2011-02-09 Thread Roman Shaposhnik (JIRA)
Does libhdfs really need to depend on AC_TYPE_INT16_T, AC_TYPE_INT32_T, 
AC_TYPE_INT64_T and AC_TYPE_UINT16_T ?
--

 Key: HDFS-1619
 URL: https://issues.apache.org/jira/browse/HDFS-1619
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Roman Shaposhnik
Assignee: Konstantin Shvachko


Currently configure.ac uses AC_TYPE_INT16_T, AC_TYPE_INT32_T, AC_TYPE_INT64_T 
and AC_TYPE_UINT16_T and thus requires autoconf 2.61 or higher. 
This prevents using it on such platforms as CentOS/RHEL 5.4 and 5.5. Given that 
those are pretty popular, and given that it is really difficult to find a 
platform these days that doesn't natively define intXX_t types, I'm curious 
whether we can simply remove those macros or perhaps fail ONLY if we happen to 
be on such a platform.

Here's a link to GNU autoconf docs for your reference:
http://www.gnu.org/software/hello/manual/autoconf/Particular-Types.html

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HDFS-1618) configure files that are generated as part of the released tarball need to have executable bit set

2011-02-09 Thread Roman Shaposhnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Shaposhnik updated HDFS-1618:
---

Attachment: HDFS-1618.patch

Patch attached

> configure files that are generated as part of the released tarball need to 
> have executable bit set 
> ---
>
> Key: HDFS-1618
> URL: https://issues.apache.org/jira/browse/HDFS-1618
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Roman Shaposhnik
>Assignee: Roman Shaposhnik
> Attachments: HDFS-1618.patch
>
>
> Currently the configure files that are packaged in a tarball are -rw-rw-r--

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data

2011-02-09 Thread ryan rawson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ryan rawson updated HDFS-347:
-

Attachment: HDFS-347-branch-20-append.txt

Applies to the head of branch-20-append.

> DFS read performance suboptimal when client co-located on nodes with data
> -
>
> Key: HDFS-347
> URL: https://issues.apache.org/jira/browse/HDFS-347
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: George Porter
>Assignee: Todd Lipcon
> Attachments: BlockReaderLocal1.txt, HADOOP-4801.1.patch, 
> HADOOP-4801.2.patch, HADOOP-4801.3.patch, HDFS-347-branch-20-append.txt, 
> all.tsv, hdfs-347.png, hdfs-347.txt, local-reads-doc
>
>
> One of the major strategies Hadoop uses to get scalable data processing is to 
> move the code to the data.  However, putting the DFS client on the same 
> physical node as the data blocks it acts on doesn't improve read performance 
> as much as expected.
> After looking at Hadoop and O/S traces (via HADOOP-4049), I think the problem 
> is due to the HDFS streaming protocol causing many more read I/O operations 
> (iops) than necessary.  Consider the case of a DFSClient fetching a 64 MB 
> disk block from the DataNode process (running in a separate JVM) running on 
> the same machine.  The DataNode will satisfy the single disk block request by 
> sending data back to the HDFS client in 64-KB chunks.  In BlockSender.java, 
> this is done in the sendChunk() method, relying on Java's transferTo() 
> method.  Depending on the host O/S and JVM implementation, transferTo() is 
> implemented as either a sendfilev() syscall or a pair of mmap() and write().  
> In either case, each chunk is read from the disk by issuing a separate I/O 
> operation for each chunk.  The result is that the single request for a 64-MB 
> block ends up hitting the disk as over a thousand smaller requests for 64-KB 
> each.
> Since the DFSClient runs in a different JVM and process than the DataNode, 
> shuttling data from the disk to the DFSClient also results in context 
> switches each time network packets get sent (in this case, the 64-kb chunk 
> turns into a large number of 1500 byte packet send operations).  Thus we see 
> a large number of context switches for each block send operation.
> I'd like to get some feedback on the best way to address this, but I think 
> providing a mechanism for a DFSClient to directly open data blocks that 
> happen to be on the same machine.  It could do this by examining the set of 
> LocatedBlocks returned by the NameNode, marking those that should be resident 
> on the local host.  Since the DataNode and DFSClient (probably) share the 
> same hadoop configuration, the DFSClient should be able to find the files 
> holding the block data, and it could directly open them and send data back to 
> the client.  This would avoid the context switches imposed by the network 
> layer, and would allow for much larger read buffers than 64KB, which should 
> reduce the number of iops imposed by each read block operation.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data

2011-02-09 Thread ryan rawson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992801#comment-12992801
 ] 

ryan rawson commented on HDFS-347:
--

OK, this was my bad; I applied the patch wrong. The unit test passes. I'll attach a 
patch for others.

> DFS read performance suboptimal when client co-located on nodes with data
> -
>
> Key: HDFS-347
> URL: https://issues.apache.org/jira/browse/HDFS-347
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: George Porter
>Assignee: Todd Lipcon
> Attachments: BlockReaderLocal1.txt, HADOOP-4801.1.patch, 
> HADOOP-4801.2.patch, HADOOP-4801.3.patch, all.tsv, hdfs-347.png, 
> hdfs-347.txt, local-reads-doc
>
>
> One of the major strategies Hadoop uses to get scalable data processing is to 
> move the code to the data.  However, putting the DFS client on the same 
> physical node as the data blocks it acts on doesn't improve read performance 
> as much as expected.
> After looking at Hadoop and O/S traces (via HADOOP-4049), I think the problem 
> is due to the HDFS streaming protocol causing many more read I/O operations 
> (iops) than necessary.  Consider the case of a DFSClient fetching a 64 MB 
> disk block from the DataNode process (running in a separate JVM) running on 
> the same machine.  The DataNode will satisfy the single disk block request by 
> sending data back to the HDFS client in 64-KB chunks.  In BlockSender.java, 
> this is done in the sendChunk() method, relying on Java's transferTo() 
> method.  Depending on the host O/S and JVM implementation, transferTo() is 
> implemented as either a sendfilev() syscall or a pair of mmap() and write().  
> In either case, each chunk is read from the disk by issuing a separate I/O 
> operation for each chunk.  The result is that the single request for a 64-MB 
> block ends up hitting the disk as over a thousand smaller requests for 64-KB 
> each.
> Since the DFSClient runs in a different JVM and process than the DataNode, 
> shuttling data from the disk to the DFSClient also results in context 
> switches each time network packets get sent (in this case, the 64-kb chunk 
> turns into a large number of 1500 byte packet send operations).  Thus we see 
> a large number of context switches for each block send operation.
> I'd like to get some feedback on the best way to address this, but I think 
> providing a mechanism for a DFSClient to directly open data blocks that 
> happen to be on the same machine.  It could do this by examining the set of 
> LocatedBlocks returned by the NameNode, marking those that should be resident 
> on the local host.  Since the DataNode and DFSClient (probably) share the 
> same hadoop configuration, the DFSClient should be able to find the files 
> holding the block data, and it could directly open them and send data back to 
> the client.  This would avoid the context switches imposed by the network 
> layer, and would allow for much larger read buffers than 64KB, which should 
> reduce the number of iops imposed by each read block operation.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data

2011-02-09 Thread ryan rawson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992786#comment-12992786
 ] 

ryan rawson commented on HDFS-347:
--

Applying this patch to branch-20-append, the unit test passes. Still trying 
to figure out why it works in one case and not the other. The patch is 
pretty dang simple too.

> DFS read performance suboptimal when client co-located on nodes with data
> -
>
> Key: HDFS-347
> URL: https://issues.apache.org/jira/browse/HDFS-347
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: George Porter
>Assignee: Todd Lipcon
> Attachments: BlockReaderLocal1.txt, HADOOP-4801.1.patch, 
> HADOOP-4801.2.patch, HADOOP-4801.3.patch, all.tsv, hdfs-347.png, 
> hdfs-347.txt, local-reads-doc
>
>
> One of the major strategies Hadoop uses to get scalable data processing is to 
> move the code to the data.  However, putting the DFS client on the same 
> physical node as the data blocks it acts on doesn't improve read performance 
> as much as expected.
> After looking at Hadoop and O/S traces (via HADOOP-4049), I think the problem 
> is due to the HDFS streaming protocol causing many more read I/O operations 
> (iops) than necessary.  Consider the case of a DFSClient fetching a 64 MB 
> disk block from the DataNode process (running in a separate JVM) running on 
> the same machine.  The DataNode will satisfy the single disk block request by 
> sending data back to the HDFS client in 64-KB chunks.  In BlockSender.java, 
> this is done in the sendChunk() method, relying on Java's transferTo() 
> method.  Depending on the host O/S and JVM implementation, transferTo() is 
> implemented as either a sendfilev() syscall or a pair of mmap() and write().  
> In either case, each chunk is read from the disk by issuing a separate I/O 
> operation for each chunk.  The result is that the single request for a 64-MB 
> block ends up hitting the disk as over a thousand smaller requests for 64-KB 
> each.
> Since the DFSClient runs in a different JVM and process than the DataNode, 
> shuttling data from the disk to the DFSClient also results in context 
> switches each time network packets get sent (in this case, the 64-kb chunk 
> turns into a large number of 1500 byte packet send operations).  Thus we see 
> a large number of context switches for each block send operation.
> I'd like to get some feedback on the best way to address this, but I think 
> providing a mechanism for a DFSClient to directly open data blocks that 
> happen to be on the same machine.  It could do this by examining the set of 
> LocatedBlocks returned by the NameNode, marking those that should be resident 
> on the local host.  Since the DataNode and DFSClient (probably) share the 
> same hadoop configuration, the DFSClient should be able to find the files 
> holding the block data, and it could directly open them and send data back to 
> the client.  This would avoid the context switches imposed by the network 
> layer, and would allow for much larger read buffers than 64KB, which should 
> reduce the number of iops imposed by each read block operation.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Created: (HDFS-1618) configure files that are generated as part of the released tarball need to have executable bit set

2011-02-09 Thread Roman Shaposhnik (JIRA)
configure files that are generated as part of the released tarball need to have 
executable bit set 
---

 Key: HDFS-1618
 URL: https://issues.apache.org/jira/browse/HDFS-1618
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Roman Shaposhnik
Assignee: Roman Shaposhnik


Currently the configure files that are packaged in a tarball are -rw-rw-r--

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data

2011-02-09 Thread ryan rawson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992766#comment-12992766
 ] 

ryan rawson commented on HDFS-347:
--

dhruba, I am not seeing the file 
src/hdfs/org/apache/hadoop/hdfs/metrics/DFSClientMetrics.java in 
branch-20-append (nor in cdh3b2).  I also got a number of rejects; here are some 
highlights:

- ClientDatanodeProtocol: your variant has copyBlock, ours does not (hence the 
reject).
- Misc field differences in DFSClient, including the metrics object.

After resolving these, I was able to get it up and running.
I'm not able to get the unit test to pass; I'm guessing it's this:
2011-02-09 14:35:49,926 DEBUG hdfs.DFSClient 
(DFSClient.java:fetchBlockByteRange(1927)) - fetchBlockByteRange 
shortCircuitLocalReads true localhst h132.sfo.stumble.net/10.10.1.132 
targetAddr /127.0.0.1:62665

Since we don't recognize that we are 'local', we take the normal read path, which 
is failing. Any tips?
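
For reference, a minimal sketch of one common way a client can decide whether a datanode address is local; this is an assumed illustration of the general technique, not the exact check used by the HDFS-347 patch.

{code}
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.NetworkInterface;
import java.net.SocketException;

// Illustrative sketch: treat an address as "local" if it is a loopback or
// wildcard address, or if it is bound to one of this machine's interfaces.
public class LocalAddressCheck {
  public static boolean isLocalAddress(InetSocketAddress target) {
    InetAddress addr = target.getAddress();
    if (addr == null) {
      return false;                        // unresolved hostname
    }
    if (addr.isAnyLocalAddress() || addr.isLoopbackAddress()) {
      return true;                         // 0.0.0.0 / 127.0.0.1
    }
    try {
      return NetworkInterface.getByInetAddress(addr) != null;
    } catch (SocketException e) {
      return false;
    }
  }
}
{code}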

> DFS read performance suboptimal when client co-located on nodes with data
> -
>
> Key: HDFS-347
> URL: https://issues.apache.org/jira/browse/HDFS-347
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: George Porter
>Assignee: Todd Lipcon
> Attachments: BlockReaderLocal1.txt, HADOOP-4801.1.patch, 
> HADOOP-4801.2.patch, HADOOP-4801.3.patch, all.tsv, hdfs-347.png, 
> hdfs-347.txt, local-reads-doc
>
>
> One of the major strategies Hadoop uses to get scalable data processing is to 
> move the code to the data.  However, putting the DFS client on the same 
> physical node as the data blocks it acts on doesn't improve read performance 
> as much as expected.
> After looking at Hadoop and O/S traces (via HADOOP-4049), I think the problem 
> is due to the HDFS streaming protocol causing many more read I/O operations 
> (iops) than necessary.  Consider the case of a DFSClient fetching a 64 MB 
> disk block from the DataNode process (running in a separate JVM) running on 
> the same machine.  The DataNode will satisfy the single disk block request by 
> sending data back to the HDFS client in 64-KB chunks.  In BlockSender.java, 
> this is done in the sendChunk() method, relying on Java's transferTo() 
> method.  Depending on the host O/S and JVM implementation, transferTo() is 
> implemented as either a sendfilev() syscall or a pair of mmap() and write().  
> In either case, each chunk is read from the disk by issuing a separate I/O 
> operation for each chunk.  The result is that the single request for a 64-MB 
> block ends up hitting the disk as over a thousand smaller requests for 64-KB 
> each.
> Since the DFSClient runs in a different JVM and process than the DataNode, 
> shuttling data from the disk to the DFSClient also results in context 
> switches each time network packets get sent (in this case, the 64-kb chunk 
> turns into a large number of 1500 byte packet send operations).  Thus we see 
> a large number of context switches for each block send operation.
> I'd like to get some feedback on the best way to address this, but I think 
> providing a mechanism for a DFSClient to directly open data blocks that 
> happen to be on the same machine.  It could do this by examining the set of 
> LocatedBlocks returned by the NameNode, marking those that should be resident 
> on the local host.  Since the DataNode and DFSClient (probably) share the 
> same hadoop configuration, the DFSClient should be able to find the files 
> holding the block data, and it could directly open them and send data back to 
> the client.  This would avoid the context switches imposed by the network 
> layer, and would allow for much larger read buffers than 64KB, which should 
> reduce the number of iops imposed by each read block operation.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1602) Fix HADOOP-4885 as it doesn't work as expected.

2011-02-09 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992749#comment-12992749
 ] 

Todd Lipcon commented on HDFS-1602:
---

+1 on patch for 22

> Fix HADOOP-4885 as it doesn't work as expected.
> ---
>
> Key: HDFS-1602
> URL: https://issues.apache.org/jira/browse/HDFS-1602
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.21.0, 0.23.0
>Reporter: Konstantin Boudnik
>Assignee: Boris Shkolnik
> Attachments: HDFS-1602-1.patch, HDFS-1602.patch, HDFS-1602v22.patch
>
>
> NameNode storage restore functionality doesn't work (as HDFS-903 
> demonstrated). This needs to be either disabled, or removed, or fixed. This 
> feature also fails HDFS-1496

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1602) Fix HADOOP-4885 as it doesn't work as expected.

2011-02-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992740#comment-12992740
 ] 

Hadoop QA commented on HDFS-1602:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12470731/HDFS-1602v22.patch
  against trunk revision 1068968.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/158//console

This message is automatically generated.

> Fix HADOOP-4885 as it doesn't work as expected.
> ---
>
> Key: HDFS-1602
> URL: https://issues.apache.org/jira/browse/HDFS-1602
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.21.0, 0.23.0
>Reporter: Konstantin Boudnik
>Assignee: Boris Shkolnik
> Attachments: HDFS-1602-1.patch, HDFS-1602.patch, HDFS-1602v22.patch
>
>
> NameNode storage restore functionality doesn't work (as HDFS-903 
> demonstrated). This needs to be either disabled, or removed, or fixed. This 
> feature also fails HDFS-1496

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HDFS-1602) Fix HADOOP-4885 as it doesn't work as expected.

2011-02-09 Thread Boris Shkolnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boris Shkolnik updated HDFS-1602:
-

Attachment: HDFS-1602v22.patch

Patch for 0.22

> Fix HADOOP-4885 as it doesn't work as expected.
> ---
>
> Key: HDFS-1602
> URL: https://issues.apache.org/jira/browse/HDFS-1602
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.21.0, 0.23.0
>Reporter: Konstantin Boudnik
>Assignee: Boris Shkolnik
> Attachments: HDFS-1602-1.patch, HDFS-1602.patch, HDFS-1602v22.patch
>
>
> NameNode storage restore functionality doesn't work (as HDFS-903 
> demonstrated). This needs to be either disabled, or removed, or fixed. This 
> feature also fails HDFS-1496

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1603) Namenode gets sticky if one of namenode storage volumes disappears (removed, unmounted, etc.)

2011-02-09 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992729#comment-12992729
 ] 

Konstantin Boudnik commented on HDFS-1603:
--

It clearly depends on the NFS mount's timeout.

> Namenode gets sticky if one of namenode storage volumes disappears (removed, 
> unmounted, etc.)
> -
>
> Key: HDFS-1603
> URL: https://issues.apache.org/jira/browse/HDFS-1603
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.21.0
>Reporter: Konstantin Boudnik
>
> While investigating failures on HDFS-1602 it became apparent that once a 
> namenode storage volume is pulled out, the NN becomes completely "sticky" until 
> {{FSImage:processIOError: removing storage}} moves the storage out of the active 
> set. During this time none of the normal NN operations are possible (e.g. 
> creating a directory on HDFS eventually times out).
> In the case of NFS this can be worked around with the soft,intr,timeo,retrans 
> mount settings. However, better handling of the situation is apparently possible 
> and needs to be implemented.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Resolved: (HDFS-1617) CLONE to COMMON - Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it once per directory instead of once per file

2011-02-09 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley resolved HDFS-1617.
--

  Resolution: Fixed
Release Note:   (was: Batch hardlinking during "upgrade" snapshots, cutting 
time from aprx 8 minutes per volume to aprx 8 seconds.  Validated in both Linux 
and Windows.  Requires coordinated change in both COMMON and HDFS.)

No change here; this needs to be opened under COMMON.

> CLONE to COMMON - Batch the calls in DataStorage to 
> FileUtil.createHardLink(), so we call it once per directory instead of once 
> per file
> 
>
> Key: HDFS-1617
> URL: https://issues.apache.org/jira/browse/HDFS-1617
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: data-node
>Affects Versions: 0.20.2
>Reporter: Matt Foley
>Assignee: Matt Foley
> Fix For: 0.22.0
>
>
> It was a bit of a puzzle why we can do a full scan of a disk in about 30 
> seconds during FSDir() or getVolumeMap(), but the same disk took 11 minutes 
> to do Upgrade replication via hardlinks.  It turns out that the 
> org.apache.hadoop.fs.FileUtil.createHardLink() method does an outcall to 
> Runtime.getRuntime().exec(), to utilize native filesystem hardlink 
> capability.  So it is forking a full-weight external process, and we call it 
> on each individual file to be replicated.
> As a simple check on the possible cost of this approach, I built a Perl test 
> script (under Linux on a production-class datanode).  Perl also uses a 
> compiled and optimized p-code engine, and it has both native support for 
> hardlinks and the ability to do "exec".  
> -  A simple script to create 256,000 files in a directory tree organized like 
> the Datanode, took 10 seconds to run.
> -  Replicating that directory tree using hardlinks, the same way as the 
> Datanode, took 12 seconds using native hardlink support.
> -  The same replication using outcalls to exec, one per file, took 256 
> seconds!
> -  Batching the calls, and doing 'exec' once per directory instead of once 
> per file, took 16 seconds.
> Obviously, your mileage will vary based on the number of blocks per volume.  
> A volume with less than about 4000 blocks will have only 65 directories.  A 
> volume with more than 4K and less than about 250K blocks will have 4200 
> directories (more or less).  And there are two files per block (the data file 
> and the .meta file).  So the average number of files per directory may vary 
> from 2:1 to 500:1.  A node with 50K blocks and four volumes will have 25K 
> files per volume, or an average of about 6:1.  So this change may be expected 
> to take it down from, say, 12 minutes per volume to 2.
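
A minimal sketch of the batching idea described above, assuming a POSIX {{ln SRC... DIRECTORY}} invocation; the class and method names are illustrative, not the actual FileUtil/DataStorage change.

{code}
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of batching: fork one external process per directory
// ("ln src1 src2 ... dstDir") instead of one process per file.
public class BatchedHardLink {
  public static void hardLinkFilesInDir(File srcDir, File dstDir)
      throws IOException, InterruptedException {
    File[] entries = srcDir.listFiles();
    if (entries == null) {
      return;
    }
    List<String> cmd = new ArrayList<String>();
    cmd.add("ln");
    for (File f : entries) {
      if (f.isFile()) {                     // hardlinks apply to files only
        cmd.add(f.getAbsolutePath());
      }
    }
    if (cmd.size() == 1) {
      return;                               // nothing to link
    }
    cmd.add(dstDir.getAbsolutePath());      // ln SRC... DIRECTORY
    Process p = new ProcessBuilder(cmd).start();
    if (p.waitFor() != 0) {
      throw new IOException("hardlink batch failed for " + srcDir);
    }
  }
}
{code}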

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-270) DFS Upgrade should process dfs.data.dirs in parallel

2011-02-09 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992645#comment-12992645
 ] 

Hairong Kuang commented on HDFS-270:


Matt, thanks so much for sharing the patch to HDFS-1445. I will review it. 
Cutting the time from 8 min to 8 sec is so impressive! Job well done!

> DFS Upgrade should process dfs.data.dirs in parallel
> 
>
> Key: HDFS-270
> URL: https://issues.apache.org/jira/browse/HDFS-270
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: data-node
>Affects Versions: 0.20.2
>Reporter: Stu Hood
>Assignee: Hairong Kuang
>
> I just upgraded from 0.14.2 to 0.15.0, and things went very smoothly, if a 
> little slowly.
> The main reason the upgrade took so long was the block upgrades on the 
> datanodes. Each of our datanodes has 3 drives listed for the dfs.data.dir 
> parameter. From looking at the logs, it is fairly clear that the upgrade 
> procedure does not attempt to upgrade all listed dfs.data.dir's in parallel.
> I think even if all of your dfs.data.dir's are on the same physical device, 
> there would still be an advantage to performing the upgrade process in 
> parallel. The less downtime, the better: especially if it is potentially 20 
> minutes versus 60 minutes.
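
A minimal sketch of the parallelization being asked for, assuming a placeholder {{VolumeUpgrader}} hook standing in for the real per-volume work; this illustrates the idea, not the actual datanode upgrade code.

{code}
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch: upgrade each dfs.data.dir volume in its own thread so
// that three volumes take roughly the time of one, not three times as long.
public class ParallelVolumeUpgrade {
  public static void upgradeAll(List<File> dataDirs, final VolumeUpgrader upgrader)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(dataDirs.size());
    List<Future<?>> results = new ArrayList<Future<?>>();
    for (final File dir : dataDirs) {
      results.add(pool.submit(new Runnable() {
        public void run() {
          upgrader.upgradeVolume(dir);      // placeholder for per-volume work
        }
      }));
    }
    pool.shutdown();
    for (Future<?> f : results) {
      f.get();                              // propagate any per-volume failure
    }
  }

  /** Placeholder interface standing in for the real per-volume upgrade logic. */
  public interface VolumeUpgrader {
    void upgradeVolume(File dataDir);
  }
}
{code}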

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Created: (HDFS-1617) CLONE to COMMON - Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it once per directory instead of once per file

2011-02-09 Thread Matt Foley (JIRA)
CLONE to COMMON - Batch the calls in DataStorage to FileUtil.createHardLink(), 
so we call it once per directory instead of once per file


 Key: HDFS-1617
 URL: https://issues.apache.org/jira/browse/HDFS-1617
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: data-node
Affects Versions: 0.20.2
Reporter: Matt Foley
Assignee: Matt Foley
 Fix For: 0.22.0


It was a bit of a puzzle why we can do a full scan of a disk in about 30 
seconds during FSDir() or getVolumeMap(), but the same disk took 11 minutes to 
do Upgrade replication via hardlinks.  It turns out that the 
org.apache.hadoop.fs.FileUtil.createHardLink() method does an outcall to 
Runtime.getRuntime().exec(), to utilize native filesystem hardlink capability.  
So it is forking a full-weight external process, and we call it on each 
individual file to be replicated.

As a simple check on the possible cost of this approach, I built a Perl test 
script (under Linux on a production-class datanode).  Perl also uses a compiled 
and optimized p-code engine, and it has both native support for hardlinks and 
the ability to do "exec".  
-  A simple script to create 256,000 files in a directory tree organized like 
the Datanode, took 10 seconds to run.
-  Replicating that directory tree using hardlinks, the same way as the 
Datanode, took 12 seconds using native hardlink support.
-  The same replication using outcalls to exec, one per file, took 256 seconds!
-  Batching the calls, and doing 'exec' once per directory instead of once per 
file, took 16 seconds.

Obviously, your mileage will vary based on the number of blocks per volume.  A 
volume with less than about 4000 blocks will have only 65 directories.  A 
volume with more than 4K and less than about 250K blocks will have 4200 
directories (more or less).  And there are two files per block (the data file 
and the .meta file).  So the average number of files per directory may vary 
from 2:1 to 500:1.  A node with 50K blocks and four volumes will have 25K files 
per volume, or an average of about 6:1.  So this change may be expected to take 
it down from, say, 12 minutes per volume to 2.


-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1445) Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it once per directory instead of once per file

2011-02-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992642#comment-12992642
 ] 

Hadoop QA commented on HDFS-1445:
-

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12470696/HDFS-1445-trunk.v22_hdfs_2-of-2.patch
  against trunk revision 1068968.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The patch appears to cause tar ant target to fail.

-1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:


-1 contrib tests.  The patch failed contrib unit tests.

-1 system test framework.  The patch failed system test framework compile.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/157//testReport/
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/157//console

This message is automatically generated.

> Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it 
> once per directory instead of once per file
> --
>
> Key: HDFS-1445
> URL: https://issues.apache.org/jira/browse/HDFS-1445
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: data-node
>Affects Versions: 0.20.2
>Reporter: Matt Foley
>Assignee: Matt Foley
> Fix For: 0.22.0
>
> Attachments: HDFS-1445-trunk.v22_common_1-of-2.patch, 
> HDFS-1445-trunk.v22_hdfs_2-of-2.patch
>
>
> It was a bit of a puzzle why we can do a full scan of a disk in about 30 
> seconds during FSDir() or getVolumeMap(), but the same disk took 11 minutes 
> to do Upgrade replication via hardlinks.  It turns out that the 
> org.apache.hadoop.fs.FileUtil.createHardLink() method does an outcall to 
> Runtime.getRuntime().exec(), to utilize native filesystem hardlink 
> capability.  So it is forking a full-weight external process, and we call it 
> on each individual file to be replicated.
> As a simple check on the possible cost of this approach, I built a Perl test 
> script (under Linux on a production-class datanode).  Perl also uses a 
> compiled and optimized p-code engine, and it has both native support for 
> hardlinks and the ability to do "exec".  
> -  A simple script to create 256,000 files in a directory tree organized like 
> the Datanode, took 10 seconds to run.
> -  Replicating that directory tree using hardlinks, the same way as the 
> Datanode, took 12 seconds using native hardlink support.
> -  The same replication using outcalls to exec, one per file, took 256 
> seconds!
> -  Batching the calls, and doing 'exec' once per directory instead of once 
> per file, took 16 seconds.
> Obviously, your mileage will vary based on the number of blocks per volume.  
> A volume with less than about 4000 blocks will have only 65 directories.  A 
> volume with more than 4K and less than about 250K blocks will have 4200 
> directories (more or less).  And there are two files per block (the data file 
> and the .meta file).  So the average number of files per directory may vary 
> from 2:1 to 500:1.  A node with 50K blocks and four volumes will have 25K 
> files per volume, or an average of about 6:1.  So this change may be expected 
> to take it down from, say, 12 minutes per volume to 2.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-270) DFS Upgrade should process dfs.data.dirs in parallel

2011-02-09 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992639#comment-12992639
 ] 

Matt Foley commented on HDFS-270:
-

Hello Hairong, sorry I didn't see and respond to your request sooner.
You are of course welcome to take this on.

However, please first take the patch for HDFS-1445, which I have now uploaded 
to that JIRA.  It cuts the per-volume upgrade time from aprx 8 minutes to aprx 
8 seconds, for my timings of a 12,500-block (25,000-file) volume.  Even 12 
volumes won't take very long to upgrade at that rate.
Regards, --Matt

> DFS Upgrade should process dfs.data.dirs in parallel
> 
>
> Key: HDFS-270
> URL: https://issues.apache.org/jira/browse/HDFS-270
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: data-node
>Affects Versions: 0.20.2
>Reporter: Stu Hood
>Assignee: Hairong Kuang
>
> I just upgraded from 0.14.2 to 0.15.0, and things went very smoothly, if a 
> little slowly.
> The main reason the upgrade took so long was the block upgrades on the 
> datanodes. Each of our datanodes has 3 drives listed for the dfs.data.dir 
> parameter. From looking at the logs, it is fairly clear that the upgrade 
> procedure does not attempt to upgrade all listed dfs.data.dir's in parallel.
> I think even if all of your dfs.data.dir's are on the same physical device, 
> there would still be an advantage to performing the upgrade process in 
> parallel. The less downtime, the better: especially if it is potentially 20 
> minutes versus 60 minutes.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1418) DFSClient Uses Deprecated "mapred.task.id" Configuration Key Causing Unecessary Warning Messages

2011-02-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992636#comment-12992636
 ] 

Hadoop QA commented on HDFS-1418:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12455481/HDFS-1418.patch
  against trunk revision 1068968.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/156//console

This message is automatically generated.

> DFSClient Uses Deprecated "mapred.task.id" Configuration Key Causing 
> Unecessary Warning Messages
> 
>
> Key: HDFS-1418
> URL: https://issues.apache.org/jira/browse/HDFS-1418
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client
>Affects Versions: 0.22.0
>Reporter: Ranjit Mathew
>Priority: Minor
> Attachments: HDFS-1418.patch
>
>
> Every invocation of the "hadoop fs" command leads to an unnecessary warning 
> like the following:
> {noformat}
> $ $HADOOP_HOME/bin/hadoop fs -ls /
> 10/09/24 15:10:23 WARN conf.Configuration: mapred.task.id is deprecated. 
> Instead, use mapreduce.task.attempt.id
> {noformat}
> This is easily fixed by updating 
> "src/java/org/apache/hadoop/hdfs/DFSClient.java".

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HDFS-1445) Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it once per directory instead of once per file

2011-02-09 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-1445:
-

Attachment: HDFS-1445-trunk.v22_hdfs_2-of-2.patch
HDFS-1445-trunk.v22_common_1-of-2.patch

This patch requires coordinated change in both COMMON and HDFS.

> Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it 
> once per directory instead of once per file
> --
>
> Key: HDFS-1445
> URL: https://issues.apache.org/jira/browse/HDFS-1445
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: data-node
>Affects Versions: 0.20.2
>Reporter: Matt Foley
>Assignee: Matt Foley
> Fix For: 0.22.0
>
> Attachments: HDFS-1445-trunk.v22_common_1-of-2.patch, 
> HDFS-1445-trunk.v22_hdfs_2-of-2.patch
>
>
> It was a bit of a puzzle why we can do a full scan of a disk in about 30 
> seconds during FSDir() or getVolumeMap(), but the same disk took 11 minutes 
> to do Upgrade replication via hardlinks.  It turns out that the 
> org.apache.hadoop.fs.FileUtil.createHardLink() method does an outcall to 
> Runtime.getRuntime().exec(), to utilize native filesystem hardlink 
> capability.  So it is forking a full-weight external process, and we call it 
> on each individual file to be replicated.
> As a simple check on the possible cost of this approach, I built a Perl test 
> script (under Linux on a production-class datanode).  Perl also uses a 
> compiled and optimized p-code engine, and it has both native support for 
> hardlinks and the ability to do "exec".  
> -  A simple script to create 256,000 files in a directory tree organized like 
> the Datanode, took 10 seconds to run.
> -  Replicating that directory tree using hardlinks, the same way as the 
> Datanode, took 12 seconds using native hardlink support.
> -  The same replication using outcalls to exec, one per file, took 256 
> seconds!
> -  Batching the calls, and doing 'exec' once per directory instead of once 
> per file, took 16 seconds.
> Obviously, your mileage will vary based on the number of blocks per volume.  
> A volume with less than about 4000 blocks will have only 65 directories.  A 
> volume with more than 4K and less than about 250K blocks will have 4200 
> directories (more or less).  And there are two files per block (the data file 
> and the .meta file).  So the average number of files per directory may vary 
> from 2:1 to 500:1.  A node with 50K blocks and four volumes will have 25K 
> files per volume, or an average of about 6:1.  So this change may be expected 
> to take it down from, say, 12 minutes per volume to 2.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HDFS-1445) Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it once per directory instead of once per file

2011-02-09 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-1445:
-

Fix Version/s: 0.22.0
 Release Note: Batch hardlinking during "upgrade" snapshots, cutting time 
from aprx 8 minutes per volume to aprx 8 seconds.  Validated in both Linux and 
Windows.  Requires coordinated change in both COMMON and HDFS.
   Status: Patch Available  (was: Open)

Requires coordinated change in both COMMON and HDFS.

> Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it 
> once per directory instead of once per file
> --
>
> Key: HDFS-1445
> URL: https://issues.apache.org/jira/browse/HDFS-1445
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: data-node
>Affects Versions: 0.20.2
>Reporter: Matt Foley
>Assignee: Matt Foley
> Fix For: 0.22.0
>
>
> It was a bit of a puzzle why we can do a full scan of a disk in about 30 
> seconds during FSDir() or getVolumeMap(), but the same disk took 11 minutes 
> to do Upgrade replication via hardlinks.  It turns out that the 
> org.apache.hadoop.fs.FileUtil.createHardLink() method does an outcall to 
> Runtime.getRuntime().exec(), to utilize native filesystem hardlink 
> capability.  So it is forking a full-weight external process, and we call it 
> on each individual file to be replicated.
> As a simple check on the possible cost of this approach, I built a Perl test 
> script (under Linux on a production-class datanode).  Perl also uses a 
> compiled and optimized p-code engine, and it has both native support for 
> hardlinks and the ability to do "exec".  
> -  A simple script to create 256,000 files in a directory tree organized like 
> the Datanode, took 10 seconds to run.
> -  Replicating that directory tree using hardlinks, the same way as the 
> Datanode, took 12 seconds using native hardlink support.
> -  The same replication using outcalls to exec, one per file, took 256 
> seconds!
> -  Batching the calls, and doing 'exec' once per directory instead of once 
> per file, took 16 seconds.
> Obviously, your mileage will vary based on the number of blocks per volume.  
> A volume with less than about 4000 blocks will have only 65 directories.  A 
> volume with more than 4K and less than about 250K blocks will have 4200 
> directories (more or less).  And there are two files per block (the data file 
> and the .meta file).  So the average number of files per directory may vary 
> from 2:1 to 500:1.  A node with 50K blocks and four volumes will have 25K 
> files per volume, or an average of about 6:1.  So this change may be expected 
> to take it down from, say, 12 minutes per volume to 2.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1335) HDFS side of HADOOP-6904: first step towards inter-version communications between dfs client and NameNode

2011-02-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992582#comment-12992582
 ] 

Hudson commented on HDFS-1335:
--

Integrated in Hadoop-Hdfs-trunk-Commit #539 (See 
[https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/])


> HDFS side of HADOOP-6904: first step towards inter-version communications 
> between dfs client and NameNode
> -
>
> Key: HDFS-1335
> URL: https://issues.apache.org/jira/browse/HDFS-1335
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client, name-node
>Affects Versions: 0.22.0
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Fix For: 0.23.0
>
> Attachments: hdfsRPC.patch, hdfsRpcVersion.patch
>
>
> The idea is that in getProtocolVersion, the NameNode checks whether the client 
> and server versions are compatible if the server version is greater than the 
> client version. If not, it throws a VersionIncompatible exception; otherwise, 
> it returns the server version.
> On the dfs client side, when creating a NameNode proxy, the client catches the 
> VersionMismatch exception and then checks whether the client and server 
> versions are compatible if the client version is greater than the server 
> version. If not compatible, it throws a VersionIncompatible exception; 
> otherwise, it records the server version and continues.
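
A minimal sketch of the handshake the description outlines; the class names, the {{Compat}} hook, and the exception shape are assumptions taken from the wording above, not the actual implementation.

{code}
import java.io.IOException;

// Illustrative sketch: the server checks compatibility only when it is newer
// than the client; the client does the symmetric check when it is newer.
public class VersionHandshake {

  /** Thrown when the two versions are known to be incompatible (name assumed). */
  public static class VersionIncompatible extends IOException {
    public VersionIncompatible(long client, long server) {
      super("client version " + client + " incompatible with server " + server);
    }
  }

  /** Placeholder for whatever compatibility table the real code consults. */
  public interface Compat {
    boolean isCompatible(long older, long newer);
  }

  // Server side (per the description, inside getProtocolVersion).
  public static long serverSideCheck(long clientVersion, long serverVersion,
                                     Compat compat) throws VersionIncompatible {
    if (serverVersion > clientVersion
        && !compat.isCompatible(clientVersion, serverVersion)) {
      throw new VersionIncompatible(clientVersion, serverVersion);
    }
    return serverVersion;
  }

  // Client side: on a version mismatch, check compatibility if the client is newer.
  public static long clientSideCheck(long clientVersion, long serverVersion,
                                     Compat compat) throws VersionIncompatible {
    if (clientVersion > serverVersion
        && !compat.isCompatible(serverVersion, clientVersion)) {
      throw new VersionIncompatible(clientVersion, serverVersion);
    }
    return serverVersion;   // record and continue with the server's version
  }
}
{code}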

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-900) Corrupt replicas are not tracked correctly through block report from DN

2011-02-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992583#comment-12992583
 ] 

Hudson commented on HDFS-900:
-

Integrated in Hadoop-Hdfs-trunk-Commit #539 (See 
[https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/])


> Corrupt replicas are not tracked correctly through block report from DN
> ---
>
> Key: HDFS-900
> URL: https://issues.apache.org/jira/browse/HDFS-900
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: log-commented, reportCorruptBlock.patch, 
> to-reproduce.patch
>
>
> This one is tough to describe, but essentially the following order of events 
> is seen to occur:
> # A client marks one replica of a block to be corrupt by telling the NN about 
> it
> # Replication is then scheduled to make a new replica of this block
> # The replication completes, such that there are now 3 good replicas and 1 
> corrupt replica
> # The DN holding the corrupt replica sends a block report. Rather than 
> telling this DN to delete the replica, the NN instead marks this as a new *good* 
> replica of the block, and schedules deletion on one of the good replicas.
> I don't know if this is a data loss bug in the case of 1 corrupt replica with 
> dfs.replication=2, but it seems feasible. I will attach a debug log with some 
> commentary marked by '>', plus a unit test patch which I can get 
> to reproduce this behavior reliably. (it's not a proper unit test, just some 
> edits to an existing one to show it)

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-560) Proposed enhancements/tuning to hadoop-hdfs/build.xml

2011-02-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992580#comment-12992580
 ] 

Hudson commented on HDFS-560:
-

Integrated in Hadoop-Hdfs-trunk-Commit #539 (See 
[https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/])


>  Proposed enhancements/tuning to hadoop-hdfs/build.xml
> --
>
> Key: HDFS-560
> URL: https://issues.apache.org/jira/browse/HDFS-560
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 0.21.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-560.patch
>
>
> sibling list of HADOOP-6206, enhancements to the hdfs build for easier 
> single-system build/test

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1448) Create multi-format parser for edits logs file, support binary and XML formats initially

2011-02-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992581#comment-12992581
 ] 

Hudson commented on HDFS-1448:
--

Integrated in Hadoop-Hdfs-trunk-Commit #539 (See 
[https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/])


> Create multi-format parser for edits logs file, support binary and XML 
> formats initially
> 
>
> Key: HDFS-1448
> URL: https://issues.apache.org/jira/browse/HDFS-1448
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: tools
>Affects Versions: 0.22.0
>Reporter: Erik Steffl
>Assignee: Erik Steffl
> Fix For: 0.23.0
>
> Attachments: HDFS-1448-0.22-1.patch, HDFS-1448-0.22-2.patch, 
> HDFS-1448-0.22-3.patch, HDFS-1448-0.22-4.patch, HDFS-1448-0.22-5.patch, 
> HDFS-1448-0.22.patch, Viewer hierarchy.pdf, editsStored
>
>
> Create multi-format parser for edits logs file, support binary and XML 
> formats initially.
> Parsing should work from any supported format to any other supported format 
> (e.g. from binary to XML and from XML to binary).
> The binary format is the format used by FSEditLog class to read/write edits 
> file.
> Primary reason to develop this tool is to help with troubleshooting, the 
> binary format is hard to read and edit (for human troubleshooters).
> Longer term it could be used to clean up and minimize parsers for fsimage and 
> edits files. Edits parser OfflineEditsViewer is written in a very similar 
> fashion to OfflineImageViewer. Next step would be to merge OfflineImageViewer 
> and OfflineEditsViewer and use the result in both FSImage and FSEditLog. This 
> is subject to change, specifically depending on adoption of avro (which would 
> completely change how objects are serialized as well as provide ways to 
> convert files to different formats).

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1610) TestClientProtocolWithDelegationToken failing

2011-02-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992579#comment-12992579
 ] 

Hudson commented on HDFS-1610:
--

Integrated in Hadoop-Hdfs-trunk-Commit #539 (See 
[https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/])


> TestClientProtocolWithDelegationToken failing
> -
>
> Key: HDFS-1610
> URL: https://issues.apache.org/jira/browse/HDFS-1610
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Blocker
> Attachments: hdfs-1610.txt
>
>
> Another instance of the same type of failure as MAPREDUCE-2300 (a mock 
> protocol implementation isn't returning a protocol signature)

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1602) Fix HADOOP-4885 for it doesn't work as expected.

2011-02-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992584#comment-12992584
 ] 

Hudson commented on HDFS-1602:
--

Integrated in Hadoop-Hdfs-trunk-Commit #539 (See 
[https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/])


> Fix HADOOP-4885 for it doesn't work as expected.
> ---
>
> Key: HDFS-1602
> URL: https://issues.apache.org/jira/browse/HDFS-1602
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.21.0, 0.23.0
>Reporter: Konstantin Boudnik
>Assignee: Boris Shkolnik
> Attachments: HDFS-1602-1.patch, HDFS-1602.patch
>
>
> NameNode storage restore functionality doesn't work (as HDFS-903 
> demonstrated). This needs to be either disabled, or removed, or fixed. This 
> feature also fails HDFS-1496

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1529) Incorrect handling of interrupts in waitForAckedSeqno can cause deadlock

2011-02-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992591#comment-12992591
 ] 

Hudson commented on HDFS-1529:
--

Integrated in Hadoop-Hdfs-trunk-Commit #539 (See 
[https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/])


> Incorrect handling of interrupts in waitForAckedSeqno can cause deadlock
> 
>
> Key: HDFS-1529
> URL: https://issues.apache.org/jira/browse/HDFS-1529
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: Test.java, hdfs-1529.txt, hdfs-1529.txt, hdfs-1529.txt
>
>
> In HDFS-895 the handling of interrupts during hflush/close was changed to 
> preserve interrupt status. This ends up creating an infinite loop in 
> waitForAckedSeqno if the waiting thread gets interrupted, since Object.wait() 
> has the strange semantics that it doesn't give up the lock even momentarily if 
> the thread is already in the interrupted state at the beginning of the call.
> We should decide what the correct behavior is here - if a thread is 
> interrupted while it's calling hflush() or close() should we (a) throw an 
> exception, perhaps InterruptedIOException (b) ignore, or (c) wait for the 
> flush to finish but preserve interrupt status on exit?
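For illustration, here is a minimal sketch of the problematic pattern described above; the class and field names are simplified stand-ins, not the actual DFSOutputStream code. Restoring the interrupt status inside the wait loop re-arms the interrupt, so every subsequent {{Object.wait()}} throws immediately and the loop spins without ever making progress.

{code}
class AckWaiter {
  private final Object ackQueue = new Object();
  private volatile long lastAckedSeqno = -1;

  void waitForAckedSeqno(long seqno) {
    synchronized (ackQueue) {
      while (lastAckedSeqno < seqno) {
        try {
          ackQueue.wait();
        } catch (InterruptedException ie) {
          // Preserving the interrupt status here re-arms the interrupt, so
          // the very next wait() throws again right away, without the lock
          // ever being released long enough for an ack to arrive.
          Thread.currentThread().interrupt();
        }
      }
    }
  }
}
{code}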

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1601) Pipeline ACKs are sent as lots of tiny TCP packets

2011-02-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992585#comment-12992585
 ] 

Hudson commented on HDFS-1601:
--

Integrated in Hadoop-Hdfs-trunk-Commit #539 (See 
[https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/])


> Pipeline ACKs are sent as lots of tiny TCP packets
> --
>
> Key: HDFS-1601
> URL: https://issues.apache.org/jira/browse/HDFS-1601
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.23.0
>
> Attachments: hdfs-1601.txt, hdfs-1601.txt
>
>
> I noticed in an HBase benchmark that the packet counts in my network 
> monitoring seemed high, so I took a short pcap trace and found that each 
> pipeline ACK was being sent as five packets, the first four of which contained 
> only one byte. We should buffer these bytes and send the PipelineAck as 
> one TCP packet.
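A minimal sketch of the buffering idea, assuming the reply stream is simply wrapped in a {{BufferedOutputStream}}; the class and field names are illustrative, not the actual BlockReceiver/PipelineAck code.

{code}
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class AckSender {
  private final DataOutputStream replyOut;

  AckSender(OutputStream rawSocketOut) {
    // Buffer the small writes so the whole ack leaves in one flush.
    this.replyOut = new DataOutputStream(
        new BufferedOutputStream(rawSocketOut, 1024));
  }

  void sendAck(long seqno, short[] replies) throws IOException {
    replyOut.writeLong(seqno);      // buffered, not yet on the wire
    for (short r : replies) {
      replyOut.writeShort(r);       // buffered
    }
    replyOut.flush();               // one write to the socket, typically a single TCP packet
  }
}
{code}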

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1607) Fix references to misspelled method name getProtocolSigature

2011-02-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992590#comment-12992590
 ] 

Hudson commented on HDFS-1607:
--

Integrated in Hadoop-Hdfs-trunk-Commit #539 (See 
[https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/])


> Fix references to misspelled method name getProtocolSigature
> 
>
> Key: HDFS-1607
> URL: https://issues.apache.org/jira/browse/HDFS-1607
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Trivial
> Attachments: hdfs-1607.txt
>
>


-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1600) editsStored.xml cause release audit warning

2011-02-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992587#comment-12992587
 ] 

Hudson commented on HDFS-1600:
--

Integrated in Hadoop-Hdfs-trunk-Commit #539 (See 
[https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/])
HDFS-1600. Fix release audit warnings on trunk. Contributed by Todd Lipcon


> editsStored.xml cause release audit warning
> ---
>
> Key: HDFS-1600
> URL: https://issues.apache.org/jira/browse/HDFS-1600
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build, test
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Todd Lipcon
> Fix For: 0.23.0
>
> Attachments: h1600_20110126.patch, hadoop-1600.txt
>
>
> The file 
> {{src/test/hdfs/org/apache/hadoop/hdfs/tools/offlineEditsViewer/editsStored.xml}}
>  causes a release audit warning for any new patch.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-863) Potential deadlock in TestOverReplicatedBlocks

2011-02-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992588#comment-12992588
 ] 

Hudson commented on HDFS-863:
-

Integrated in Hadoop-Hdfs-trunk-Commit #539 (See 
[https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/])


> Potential deadlock in TestOverReplicatedBlocks
> --
>
> Key: HDFS-863
> URL: https://issues.apache.org/jira/browse/HDFS-863
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Todd Lipcon
>Assignee: Ken Goodhope
> Fix For: 0.23.0
>
> Attachments: HDFS-863.patch, HDFS-863.patch, HDFS-863.patch, 
> HDFS-863.patch, TestNodeCount.png, cycle.png
>
>
> TestOverReplicatedBlocks.testProcesOverReplicateBlock synchronizes on 
> namesystem.heartbeats without synchronizing on namesystem first. Other places 
> in the code synchronize on namesystem, then heartbeats. A deadlock is probably 
> unlikely to occur in this test case, but it's a simple fix.
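For illustration, a minimal sketch of the consistent lock ordering described above; the field names are stand-ins rather than the real FSNamesystem/test code.

{code}
class LockOrderingSketch {
  private final Object namesystem = new Object();
  private final Object heartbeats = new Object();

  void touchHeartbeats() {
    synchronized (namesystem) {     // outer lock first, as elsewhere in the code
      synchronized (heartbeats) {   // then the inner heartbeats lock
        // ... inspect or mutate heartbeat state ...
      }
    }
  }
}
{code}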

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1588) Add dfs.hosts.exclude to DFSConfigKeys and use constant instead of hardcoded string

2011-02-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992589#comment-12992589
 ] 

Hudson commented on HDFS-1588:
--

Integrated in Hadoop-Hdfs-trunk-Commit #539 (See 
[https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/])


> Add dfs.hosts.exclude to DFSConfigKeys and use constant instead of hardcoded 
> string
> 
>
> Key: HDFS-1588
> URL: https://issues.apache.org/jira/browse/HDFS-1588
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.23.0
>Reporter: Erik Steffl
>Assignee: Erik Steffl
> Fix For: 0.23.0
>
> Attachments: HDFS-1588-0.23.patch
>
>


-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1557) Separate Storage from FSImage

2011-02-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992592#comment-12992592
 ] 

Hudson commented on HDFS-1557:
--

Integrated in Hadoop-Hdfs-trunk-Commit #539 (See 
[https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/])


> Separate Storage from FSImage
> -
>
> Key: HDFS-1557
> URL: https://issues.apache.org/jira/browse/HDFS-1557
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 0.21.0
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Fix For: 0.23.0
>
> Attachments: 1557-suggestions.txt, HDFS-1557-branch-0.22.diff, 
> HDFS-1557-branch-0.22.diff, HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, 
> HDFS-1557-trunk.diff, HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff, 
> HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff, 
> HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff
>
>
> FSImage currently derives from Storage and FSEditLog has to call methods 
> directly on FSImage to access the filesystem. This JIRA is to separate the 
> Storage class out into NNStorage so that FSEditLog is less dependent on 
> FSImage. From this point, the other parts of the circular dependency should 
> be easy to fix.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1591) Fix javac, javadoc, findbugs warnings

2011-02-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992594#comment-12992594
 ] 

Hudson commented on HDFS-1591:
--

Integrated in Hadoop-Hdfs-trunk-Commit #539 (See 
[https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/])


> Fix javac, javadoc, findbugs warnings
> -
>
> Key: HDFS-1591
> URL: https://issues.apache.org/jira/browse/HDFS-1591
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Po Cheung
>Assignee: Po Cheung
> Fix For: 0.22.0
>
> Attachments: hdfs-1591-trunk.patch
>
>
> Split from HADOOP-6642

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1598) ListPathsServlet excludes .*.crc files

2011-02-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992598#comment-12992598
 ] 

Hudson commented on HDFS-1598:
--

Integrated in Hadoop-Hdfs-trunk-Commit #539 (See 
[https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/])


> ListPathsServlet excludes .*.crc files
> --
>
> Key: HDFS-1598
> URL: https://issues.apache.org/jira/browse/HDFS-1598
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20.2
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 0.21.1, 0.22.0, 0.23.0
>
> Attachments: h1598_20110126.patch, h1598_20110126_0.20.patch
>
>
> The {{.*.crc}} files are excluded by default.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1597) Batched edit log syncs can reset synctxid and throw assertions

2011-02-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992597#comment-12992597
 ] 

Hudson commented on HDFS-1597:
--

Integrated in Hadoop-Hdfs-trunk-Commit #539 (See 
[https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/])


> Batched edit log syncs can reset synctxid and throw assertions
> --
>
> Key: HDFS-1597
> URL: https://issues.apache.org/jira/browse/HDFS-1597
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: hdfs-1597.txt, hdfs-1597.txt, hdfs-1597.txt, 
> illustrate-test-failure.txt
>
>
> The top of FSEditLog.logSync has the following assertion:
> {code}
> assert editStreams.size() > 0 : "no editlog streams";
> {code}
> which should actually come after checking to see if the sync was already 
> batched in by another thread.
> This is related to a second bug in which the same case causes synctxid to be 
> reset to 0.
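A small self-contained sketch of the reordering being suggested; this is not the real FSEditLog, only an illustration of checking for an already-batched sync before asserting that edit streams exist.

{code}
import java.util.ArrayList;
import java.util.List;

class EditLogSketch {
  private final List<Object> editStreams = new ArrayList<Object>();
  private long synctxid = 0;   // highest transaction id synced so far

  synchronized void logSync(long mytxid) {
    // Another thread's batched sync may already have covered our txid;
    // in that case an empty editStreams list is not an error, so return
    // before asserting anything.
    if (mytxid <= synctxid) {
      return;
    }
    assert editStreams.size() > 0 : "no editlog streams";
    // ... flush the streams, then record progress ...
    synctxid = mytxid;
  }
}
{code}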

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1582) Remove auto-generated native build files

2011-02-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992599#comment-12992599
 ] 

Hudson commented on HDFS-1582:
--

Integrated in Hadoop-Hdfs-trunk-Commit #539 (See 
[https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/])


> Remove auto-generated native build files
> 
>
> Key: HDFS-1582
> URL: https://issues.apache.org/jira/browse/HDFS-1582
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: contrib/libhdfs
>Reporter: Roman Shaposhnik
>Assignee: Roman Shaposhnik
> Fix For: 0.22.0, 0.23.0
>
> Attachments: HADOOP-6436.patch, HDFS-1582.diff
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The repo currently includes the automake and autoconf generated files for the 
> native build. Per discussion on HADOOP-6421 let's remove them and use the 
> host's automake and autoconf. We should also do this for libhdfs and 
> fuse-dfs. 

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1615) seek() on closed DFS input stream throws NPE

2011-02-09 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992568#comment-12992568
 ] 

Todd Lipcon commented on HDFS-1615:
---

It should throw IOE, not NPE :)
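A minimal sketch of that behaviour, assuming an explicit closed flag; the names are illustrative, not the actual DFSInputStream code.

{code}
import java.io.IOException;

class ClosableSeekSketch {
  private volatile boolean closed = false;
  private long pos = 0;

  void close() {
    closed = true;
  }

  void seek(long targetPos) throws IOException {
    if (closed) {
      // Fail fast with a descriptive IOException instead of an NPE from
      // dereferencing state that close() has already torn down.
      throw new IOException("Stream is closed");
    }
    // ... normal seek logic against block locations ...
    pos = targetPos;
  }
}
{code}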

> seek() on closed DFS input stream throws NPE
> 
>
> Key: HDFS-1615
> URL: https://issues.apache.org/jira/browse/HDFS-1615
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>
> After closing an input stream on DFS, seeking slightly ahead of the last read 
> will throw an NPE:
> java.lang.NullPointerException
> at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:749)
> at 
> org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:42)

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HDFS-1600) editsStored.xml cause release audit warning

2011-02-09 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1600:
--

   Resolution: Fixed
Fix Version/s: 0.23.0
   Status: Resolved  (was: Patch Available)

> editsStored.xml cause release audit warning
> ---
>
> Key: HDFS-1600
> URL: https://issues.apache.org/jira/browse/HDFS-1600
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build, test
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Todd Lipcon
> Fix For: 0.23.0
>
> Attachments: h1600_20110126.patch, hadoop-1600.txt
>
>
> The file 
> {{src/test/hdfs/org/apache/hadoop/hdfs/tools/offlineEditsViewer/editsStored.xml}}
>  causes a release audit warning for any new patch.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1602) Fix HADOOP-4885 for it doesn't work as expected.

2011-02-09 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992557#comment-12992557
 ] 

Konstantin Boudnik commented on HDFS-1602:
--

bq. FWIW, TestBlockRecovery.testErrorReplicas failed (timed out)
This JIRA is about TestStorageRestore.

Boris, would you like to backport it to 0.22 at least? The ticket needs to be 
closed.

> Fix HADOOP-4885 for it doesn't work as expected.
> ---
>
> Key: HDFS-1602
> URL: https://issues.apache.org/jira/browse/HDFS-1602
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.21.0, 0.23.0
>Reporter: Konstantin Boudnik
>Assignee: Boris Shkolnik
> Attachments: HDFS-1602-1.patch, HDFS-1602.patch
>
>
> NameNode storage restore functionality doesn't work (as HDFS-903 
> demonstrated). This needs to be either disabled, or removed, or fixed. This 
> feature also fails HDFS-1496

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Assigned: (HDFS-1602) Fix HADOOP-4885 for it doesn't work as expected.

2011-02-09 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik reassigned HDFS-1602:


Assignee: Boris Shkolnik

> Fix HADOOP-4885 for it doesn't work as expected.
> ---
>
> Key: HDFS-1602
> URL: https://issues.apache.org/jira/browse/HDFS-1602
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.21.0, 0.23.0
>Reporter: Konstantin Boudnik
>Assignee: Boris Shkolnik
> Attachments: HDFS-1602-1.patch, HDFS-1602.patch
>
>
> NameNode storage restore functionality doesn't work (as HDFS-903 
> demonstrated). This needs to be either disabled, or removed, or fixed. This 
> feature also fails HDFS-1496

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1616) hadoop fs -put and -copyFromLocal do not support globs in the source path

2011-02-09 Thread Harsh J Chouraria (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992549#comment-12992549
 ] 

Harsh J Chouraria commented on HDFS-1616:
-

This issue is a HADOOP (Common) one, not HDFS.

You can get Hadoop's own globbing behaviour by using {{FsShell.copy}} 
via {{hdfs -cp}} with a command like:
{{hdfs -cp 'file:/home/test/*.jar' /destination}}

This command does the globbing on its own (not shell-driven), so it should also 
work inside Grunt etc.

To call {{fs}} sub-functions programmatically, it would be wiser to use 
Hadoop's {{FileSystem}} API directly from your language instead. There are Python 
bindings for HDFS available, for a start, apart from the Java API provided.
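A minimal sketch of that approach using the standard {{FileSystem}} API; the cluster URI and paths are only examples.

{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GlobPut {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem local = FileSystem.getLocal(conf);
    FileSystem hdfs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);

    // globStatus() expands the pattern with Hadoop's (not the shell's) rules.
    FileStatus[] matches = local.globStatus(new Path("/home/test/*.jar"));
    if (matches != null) {
      for (FileStatus stat : matches) {
        hdfs.copyFromLocalFile(stat.getPath(), new Path("/destination"));
      }
    }
  }
}
{code}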

Regarding glob differences in Shell/Hadoop, there isn't really a standard 
available to conform to (please correct me if I'm wrong). A good collection of 
cases is covered in {{test/o.a.h.fs.TestGlobPaths}}, which IMHO caters to most 
globbing requirements (there's also a subset of regular-expression support 
available).

> hadoop fs -put and -copyFromLocal do not support globs in the source path
> -
>
> Key: HDFS-1616
> URL: https://issues.apache.org/jira/browse/HDFS-1616
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20.2
> Environment: Cloudera CDH3b3
>Reporter: Jay Hacker
>Priority: Minor
>
> I'd like to be able to use Hadoop globbing with the FsShell -put command, but 
> it doesn't work:
> {noformat}
> $ ls
> file1 file2
> $ hadoop fs -put '*' .
> put: File * does not exist.
> {noformat}
> This has probably gone unnoticed because your shell usually handles it, but 
> a) I'd like to be able to call 'hadoop fs' programmatically without a shell, 
> b) it doesn't work in Pig or Grunt, where there is no shell helping you, and 
> c) Hadoop globbing differs from shell globbing and it would be nice to be 
> able to use it consistently.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HDFS-1600) editsStored.xml cause release audit warning

2011-02-09 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-1600:
-

Hadoop Flags: [Reviewed]

+1 patch looks good.

> editsStored.xml cause release audit warning
> ---
>
> Key: HDFS-1600
> URL: https://issues.apache.org/jira/browse/HDFS-1600
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build, test
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Todd Lipcon
> Attachments: h1600_20110126.patch, hadoop-1600.txt
>
>
> The file 
> {{src/test/hdfs/org/apache/hadoop/hdfs/tools/offlineEditsViewer/editsStored.xml}}
>  causes a release audit warning for any new patch.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HDFS-1616) hadoop fs -put and -copyFromLocal do not support globs in the source path

2011-02-09 Thread Jay Hacker (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Hacker updated HDFS-1616:
-

Description: 
I'd like to be able to use Hadoop globbing with the FsShell -put command, but 
it doesn't work:

{noformat}
$ ls
file1 file2
$ hadoop fs -put '*' .
put: File * does not exist.
{noformat}

This has probably gone unnoticed because your shell usually handles it, but a) 
I'd like to be able to call 'hadoop fs' programmatically without a shell, b) it 
doesn't work in Pig or Grunt, where there is no shell helping you, and c) 
Hadoop globbing differs from shell globbing and it would be nice to be able to 
use it consistently.

  was:
I'd like to be able to use Hadoop globbing with the FsShell -put command, but 
it doesn't work:

{noformat}
$ ls
file1 file2
$ hadoop fs -put '*' .
put: File * does not exist.
{noformat}

This has probably gone unnoticed because your shell usually handles it, but a) 
I'd like to be able to call 'hadoop fs' programmatically without a shell, and b) 
Hadoop globbing differs from shell globbing and it would be nice to be able to 
use it consistently.


> hadoop fs -put and -copyFromLocal do not support globs in the source path
> -
>
> Key: HDFS-1616
> URL: https://issues.apache.org/jira/browse/HDFS-1616
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20.2
> Environment: Cloudera CDH3b3
>Reporter: Jay Hacker
>Priority: Minor
>
> I'd like to be able to use Hadoop globbing with the FsShell -put command, but 
> it doesn't work:
> {noformat}
> $ ls
> file1 file2
> $ hadoop fs -put '*' .
> put: File * does not exist.
> {noformat}
> This has probably gone unnoticed because your shell usually handles it, but 
> a) I'd like to be able to call 'hadoop fs' programmatically without a shell, 
> b) it doesn't work in Pig or Grunt, where there is no shell helping you, and 
> c) Hadoop globbing differs from shell globbing and it would be nice to be 
> able to use it consistently.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Created: (HDFS-1616) hadoop fs -put and -copyFromLocal do not support globs in the source path

2011-02-09 Thread Jay Hacker (JIRA)
hadoop fs -put and -copyFromLocal do not support globs in the source path
-

 Key: HDFS-1616
 URL: https://issues.apache.org/jira/browse/HDFS-1616
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20.2
 Environment: Cloudera CDH3b3
Reporter: Jay Hacker
Priority: Minor


I'd like to be able to use Hadoop globbing with the FsShell -put command, but 
it doesn't work:

{noformat}
$ ls
file1 file2
$ hadoop fs -put '*' .
put: File * does not exist.
{noformat}

This has probably gone unnoticed because your shell usually handles it, but a) 
I'd like to be able to call 'hadoop fs' programmatically without a shell, and b) 
Hadoop globbing differs from shell globbing and it would be nice to be able to 
use it consistently.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1615) seek() on closed DFS input stream throws NPE

2011-02-09 Thread M. C. Srivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992478#comment-12992478
 ] 

M. C. Srivas commented on HDFS-1615:


>After closing an input stream on DFS, seeking slightly ahead of the last read 
>will throw an NPE:

Isn't this a good thing? It exposes bugs at the layer above DFS. I'd prefer to 
keep this behaviour rather than fix it.

> seek() on closed DFS input stream throws NPE
> 
>
> Key: HDFS-1615
> URL: https://issues.apache.org/jira/browse/HDFS-1615
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>
> After closing an input stream on DFS, seeking slightly ahead of the last read 
> will throw an NPE:
> java.lang.NullPointerException
> at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:749)
> at 
> org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:42)

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1600) editsStored.xml cause release audit warning

2011-02-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992465#comment-12992465
 ] 

Hadoop QA commented on HDFS-1600:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12470668/hadoop-1600.txt
  against trunk revision 1068725.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.hdfs.TestFileConcurrentReader

-1 contrib tests.  The patch failed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/155//testReport/
Findbugs warnings: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/155//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/155//console

This message is automatically generated.

> editsStored.xml cause release audit warning
> ---
>
> Key: HDFS-1600
> URL: https://issues.apache.org/jira/browse/HDFS-1600
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build, test
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Todd Lipcon
> Attachments: h1600_20110126.patch, hadoop-1600.txt
>
>
> The file 
> {{src/test/hdfs/org/apache/hadoop/hdfs/tools/offlineEditsViewer/editsStored.xml}}
>  causes a release audit warning for any new patch.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HDFS-1600) editsStored.xml cause release audit warning

2011-02-09 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1600:
--

Status: Patch Available  (was: Open)

> editsStored.xml cause release audit warning
> ---
>
> Key: HDFS-1600
> URL: https://issues.apache.org/jira/browse/HDFS-1600
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build, test
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Todd Lipcon
> Attachments: h1600_20110126.patch, hadoop-1600.txt
>
>
> The file 
> {{src/test/hdfs/org/apache/hadoop/hdfs/tools/offlineEditsViewer/editsStored.xml}}
>  causes a release audit warning for any new patch.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HDFS-1600) editsStored.xml cause release audit warning

2011-02-09 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1600:
--

Attachment: hadoop-1600.txt

Here's a patch which just updates the excludes for RAT.

> editsStored.xml cause release audit warning
> ---
>
> Key: HDFS-1600
> URL: https://issues.apache.org/jira/browse/HDFS-1600
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build, test
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Erik Steffl
> Attachments: h1600_20110126.patch, hadoop-1600.txt
>
>
> The file 
> {{src/test/hdfs/org/apache/hadoop/hdfs/tools/offlineEditsViewer/editsStored.xml}}
>  causes a release audit warning for any new patch.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Assigned: (HDFS-1600) editsStored.xml cause release audit warning

2011-02-09 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reassigned HDFS-1600:
-

Assignee: Todd Lipcon  (was: Erik Steffl)

> editsStored.xml cause release audit warning
> ---
>
> Key: HDFS-1600
> URL: https://issues.apache.org/jira/browse/HDFS-1600
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build, test
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Todd Lipcon
> Attachments: h1600_20110126.patch, hadoop-1600.txt
>
>
> The file 
> {{src/test/hdfs/org/apache/hadoop/hdfs/tools/offlineEditsViewer/editsStored.xml}}
>  causes a release audit warning for any new patch.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HDFS-560) Proposed enhancements/tuning to hadoop-hdfs/build.xml

2011-02-09 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HDFS-560:


   Resolution: Fixed
Fix Version/s: 0.23.0
   Status: Resolved  (was: Patch Available)

>  Proposed enhancements/tuning to hadoop-hdfs/build.xml
> --
>
> Key: HDFS-560
> URL: https://issues.apache.org/jira/browse/HDFS-560
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 0.21.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-560.patch
>
>
> sibling list of HADOOP-6206, enhancements to the hdfs build for easier 
> single-system build/test

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1603) Namenode gets sticky if one of namenode storage volumes disappears (removed, unmounted, etc.)

2011-02-09 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992397#comment-12992397
 ] 

dhruba borthakur commented on HDFS-1603:


"During this time none of normal NN operations are possible"

How long was this period?

> Namenode gets sticky if one of namenode storage volumes disappears (removed, 
> unmounted, etc.)
> -
>
> Key: HDFS-1603
> URL: https://issues.apache.org/jira/browse/HDFS-1603
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.21.0
>Reporter: Konstantin Boudnik
>
> While investigating failures on HDFS-1602 it became apparent that once a 
> namenode storage volume is pulled out NN becomes completely "sticky" until 
> {{FSImage:processIOError: removing storage}} moves the storage from the active 
> set. During this time none of the normal NN operations are possible (e.g. 
> creating a directory on HDFS eventually times out).
> In case of NFS this can be worked around with soft,intr,timeo,retrans 
> settings. However, better handling of the situation is apparently possible 
> and needs to be implemented.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1595) DFSClient may incorrectly detect datanode failure

2011-02-09 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992396#comment-12992396
 ] 

dhruba borthakur commented on HDFS-1595:


Error recovery is a pain when a datanode in a write pipeline fails. Sometimes 
it is truly difficult for the client to accurately determine which datanode 
failed. Does it make sense to change the algorithm itself? What are the 
trade-offs if we say that, when the number of datanodes in the write pipeline 
decreases to min.replication, the client streams data directly to all remaining 
(or new) datanodes instead of pipelining? If new datanodes fail, the client 
will find it easy to determine accurately which datanodes are dead.
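For illustration only, a rough sketch of such a fan-out write; this is not HDFS code and the types and names are placeholders. Each packet goes to every remaining datanode directly, so a per-stream IOException pins the failure on exactly that datanode.

{code}
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class FanOutWriter {
  /** Write one packet to every remaining datanode stream. */
  void writePacket(byte[] packet, Map<String, DataOutputStream> streams)
      throws IOException {
    List<String> failed = new ArrayList<String>();
    for (Map.Entry<String, DataOutputStream> e : streams.entrySet()) {
      try {
        e.getValue().write(packet);
        e.getValue().flush();
      } catch (IOException ioe) {
        failed.add(e.getKey());   // unambiguous: this datanode, not a downstream peer
      }
    }
    if (!failed.isEmpty()) {
      throw new IOException("Datanodes failed during fan-out write: " + failed);
    }
  }
}
{code}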


> DFSClient may incorrectly detect datanode failure
> -
>
> Key: HDFS-1595
> URL: https://issues.apache.org/jira/browse/HDFS-1595
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node, hdfs client
>Affects Versions: 0.20.4
>Reporter: Tsz Wo (Nicholas), SZE
>Priority: Critical
> Attachments: hdfs-1595-idea.txt
>
>
> Suppose a source datanode S is writing to a destination datanode D in a write 
> pipeline.  We have an implicit assumption that _if S catches an exception 
> when it is writing to D, then D is faulty and S is fine._  As a result, 
> DFSClient will take out D from the pipeline, reconstruct the write pipeline 
> with the remaining datanodes and then continue writing.
> However, we find a case that the faulty machine F is indeed S but not D.  In 
> the case we found, F has a faulty network interface (or a faulty switch port) 
> in such a way that the faulty network interface works fine when transferring 
> a small amount of data, say 1MB, but it often fails when transferring a large 
> amount of data, say 100MB.
> It is even worse if F is the first datanode in the pipeline.  Consider the 
> following:
> # DFSClient creates a pipeline with three datanodes.  The first datanode is F.
> # F catches an IOException when writing to the second datanode. Then, F 
> reports the second datanode has error.
> # DFSClient removes the second datanode from the pipeline and continue 
> writing with the remaining datanode(s).
> # The pipeline now has two datanodes but (2) and (3) repeat.
> # Now, only F remains in the pipeline.  DFSClient continues writing with one 
> replica in F.
> # The write succeeds and DFSClient is able to *close the file successfully*.
> # The block is under replicated.  The NameNode schedules replication from F 
> to some other datanode D.
> # The replication fails for the same reason.  D reports to the NameNode that 
> the replica in F is corrupted.
> # The NameNode marks the replica in F is corrupted.
> # The block is corrupted since no replica is available.
> We were able to manually divide the replicas into small files and copy them 
> out from F without fixing the hardware.  The replicas seem uncorrupted.  
> This is a *data availability problem*.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1606) Provide a stronger data guarantee in the write pipeline

2011-02-09 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992392#comment-12992392
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-1606:
--

> In fact, if we can have a system-wide config ...
Will do.

> Provide a stronger data guarantee in the write pipeline
> ---
>
> Key: HDFS-1606
> URL: https://issues.apache.org/jira/browse/HDFS-1606
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node, hdfs client
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>
> In the current design, if there is a datanode/network failure in the write 
> pipeline, DFSClient will try to remove the failed datanode from the pipeline 
> and then continue writing with the remaining datanodes.  As a result, the 
> number of datanodes in the pipeline is decreased.  Unfortunately, it is 
> possible that DFSClient may incorrectly remove a healthy datanode but leave 
> the failed datanode in the pipeline because failure detection may be 
> inaccurate under erroneous conditions.
> We propose to have a new mechanism for adding new datanodes to the pipeline 
> in order to provide a stronger data guarantee.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1606) Provide a stronger data guarantee in the write pipeline

2011-02-09 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992390#comment-12992390
 ] 

dhruba borthakur commented on HDFS-1606:


In fact, if we can have a system-wide config on whether to trigger this 
behaviour or not, that would be great.
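As an illustration of such a switch, a tiny sketch of how the client side might consult it; the property name below is hypothetical, chosen only for this example.

{code}
import org.apache.hadoop.conf.Configuration;

class PipelineRecoveryPolicy {
  // Hypothetical key, shown only to illustrate a system-wide on/off switch.
  static final String ADD_DATANODE_ON_FAILURE_KEY =
      "dfs.client.block.write.replace-datanode-on-failure.enable";

  static boolean shouldAddReplacementDatanode(Configuration conf) {
    return conf.getBoolean(ADD_DATANODE_ON_FAILURE_KEY, true);
  }
}
{code}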

> Provide a stronger data guarantee in the write pipeline
> ---
>
> Key: HDFS-1606
> URL: https://issues.apache.org/jira/browse/HDFS-1606
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: data-node, hdfs client
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>
> In the current design, if there is a datanode/network failure in the write 
> pipeline, DFSClient will try to remove the failed datanode from the pipeline 
> and then continue writing with the remaining datanodes.  As a result, the 
> number of datanodes in the pipeline is decreased.  Unfortunately, it is 
> possible that DFSClient may incorrectly remove a healthy datanode but leave 
> the failed datanode in the pipeline because failure detection may be 
> inaccurate under erroneous conditions.
> We propose to have a new mechanism for adding new datanodes to the pipeline 
> in order to provide a stronger data guarantee.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1612) HDFS Design Documentation is outdated

2011-02-09 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992386#comment-12992386
 ] 

dhruba borthakur commented on HDFS-1612:


This portion of the HDFS document is outdated. Would you like to submit a patch 
that brings it up to date?

> HDFS Design Documentation is outdated
> -
>
> Key: HDFS-1612
> URL: https://issues.apache.org/jira/browse/HDFS-1612
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.20.2, 0.21.0
> Environment: 
> http://hadoop.apache.org/hdfs/docs/current/hdfs_design.html#The+Persistence+of+File+System+Metadata
> http://hadoop.apache.org/common/docs/r0.20.2/hdfs_design.html#The+Persistence+of+File+System+Metadata
>Reporter: Joe Crobak
>Priority: Minor
>
> I was trying to discover details about the Secondary NameNode, and came 
> across the description below in the HDFS design doc.
> {quote}
> The NameNode keeps an image of the entire file system namespace and file 
> Blockmap in memory. This key metadata item is designed to be compact, such 
> that a NameNode with 4 GB of RAM is plenty to support a huge number of files 
> and directories. When the NameNode starts up, it reads the FsImage and 
> EditLog from disk, applies all the transactions from the EditLog to the 
> in-memory representation of the FsImage, and flushes out this new version 
> into a new FsImage on disk. It can then truncate the old EditLog because its 
> transactions have been applied to the persistent FsImage. This process is 
> called a checkpoint. *In the current implementation, a checkpoint only occurs 
> when the NameNode starts up. Work is in progress to support periodic 
> checkpointing in the near future.*
> {quote}
> (emphasis mine).
> Note that this directly conflicts with information in the hdfs user guide, 
> http://hadoop.apache.org/common/docs/r0.20.2/hdfs_user_guide.html#Secondary+NameNode
> and 
> http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Checkpoint+Node
> I haven't done a thorough audit of that doc-- I only noticed the above 
> inaccuracy.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1602) Fix HADOOP-4885 for it doesn't work as expected.

2011-02-09 Thread Nigel Daley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992376#comment-12992376
 ] 

Nigel Daley commented on HDFS-1602:
---

FWIW, TestBlockRecovery.testErrorReplicas failed (timed out). This is in the 
same class as the fixed test I think. Search console for failure: 
https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/537/console

Re-running build again.


> Fix HADOOP-4885 for it doesn't work as expected.
> ---
>
> Key: HDFS-1602
> URL: https://issues.apache.org/jira/browse/HDFS-1602
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.21.0, 0.23.0
>Reporter: Konstantin Boudnik
> Attachments: HDFS-1602-1.patch, HDFS-1602.patch
>
>
> NameNode storage restore functionality doesn't work (as HDFS-903 
> demonstrated). This needs to be either disabled, or removed, or fixed. This 
> feature also fails HDFS-1496

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira