[jira] Commented: (HDFS-1606) Provide a stronger data guarantee in the write pipeline
[ https://issues.apache.org/jira/browse/HDFS-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992909#comment-12992909 ]

Tsz Wo (Nicholas), SZE commented on HDFS-1606:
----------------------------------------------

> 1. Find a datanode D by some means.

I have checked the code. This is easier than I expected, since {{BlockPlacementPolicy}} can already find an additional datanode given a list of chosen datanodes. The remaining work for this part is to add a new method to {{ClientProtocol}} so that {{DFSClient}} can use it.

> Provide a stronger data guarantee in the write pipeline
> --------------------------------------------------------
>
> Key: HDFS-1606
> URL: https://issues.apache.org/jira/browse/HDFS-1606
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: data-node, hdfs client
> Reporter: Tsz Wo (Nicholas), SZE
> Assignee: Tsz Wo (Nicholas), SZE
>
> In the current design, if there is a datanode/network failure in the write pipeline, DFSClient will try to remove the failed datanode from the pipeline and then continue writing with the remaining datanodes. As a result, the number of datanodes in the pipeline is decreased. Unfortunately, it is possible that DFSClient may incorrectly remove a healthy datanode but leave the failed datanode in the pipeline, because failure detection may be inaccurate under erroneous conditions.
> We propose a new mechanism for adding new datanodes to the pipeline in order to provide a stronger data guarantee.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
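The lookup step Nicholas describes can be illustrated with a small stand-in sketch. This is not the actual BlockPlacementPolicy or ClientProtocol code; the class name, method name, and signature below are invented for the example.

```java
import java.util.*;

// Illustrative stand-in for the proposed step: given the datanodes already
// chosen for a write pipeline, pick one additional datanode from the
// cluster's live nodes. The real selection logic (rack awareness, load)
// lives in BlockPlacementPolicy; this sketch just shows the contract.
public class AdditionalDatanodePicker {
    public static String chooseAdditional(List<String> liveNodes, List<String> chosen) {
        for (String node : liveNodes) {
            if (!chosen.contains(node)) {
                return node;  // first live node not already in the pipeline
            }
        }
        return null;          // no spare datanode available
    }

    public static void main(String[] args) {
        List<String> live = Arrays.asList("dn1", "dn2", "dn3", "dn4");
        List<String> pipeline = Arrays.asList("dn1", "dn3");
        System.out.println(chooseAdditional(live, pipeline)); // prints "dn2"
    }
}
```

The new {{ClientProtocol}} method would presumably expose exactly this shape: the chosen datanodes go in, one additional datanode comes back for the client to splice into the pipeline.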
[jira] Commented: (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data
[ https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992907#comment-12992907 ]

dhruba borthakur commented on HDFS-347:
---------------------------------------

Thanks Ryan for merging that patch to the head of the 0.20-append branch. Please do let me know if you see any problems with it. I agree with Allen/Todd that since Todd's patch is an optimization, we can get it committed even if the optimization does not work on non-Linux platforms. Can a security guru review the security aspects of it?

> DFS read performance suboptimal when client co-located on nodes with data
> --------------------------------------------------------------------------
>
> Key: HDFS-347
> URL: https://issues.apache.org/jira/browse/HDFS-347
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: George Porter
> Assignee: Todd Lipcon
> Attachments: BlockReaderLocal1.txt, HADOOP-4801.1.patch, HADOOP-4801.2.patch, HADOOP-4801.3.patch, HDFS-347-branch-20-append.txt, all.tsv, hdfs-347.png, hdfs-347.txt, local-reads-doc
>
> One of the major strategies Hadoop uses to get scalable data processing is to move the code to the data. However, putting the DFS client on the same physical node as the data blocks it acts on doesn't improve read performance as much as expected.
> After looking at Hadoop and O/S traces (via HADOOP-4049), I think the problem is due to the HDFS streaming protocol causing many more read I/O operations (iops) than necessary. Consider the case of a DFSClient fetching a 64 MB disk block from the DataNode process (running in a separate JVM) on the same machine. The DataNode will satisfy the single disk-block request by sending data back to the HDFS client in 64 KB chunks. In BlockSender.java, this is done in the sendChunk() method, relying on Java's transferTo() method. Depending on the host O/S and JVM implementation, transferTo() is implemented as either a sendfilev() syscall or a pair of mmap() and write().
> In either case, each chunk is read from the disk by issuing a separate I/O operation per chunk. The result is that the single request for a 64 MB block ends up hitting the disk as over a thousand smaller requests of 64 KB each.
> Since the DFSClient runs in a different JVM and process than the DataNode, shuttling data from the disk to the DFSClient also results in context switches each time network packets get sent (in this case, each 64 KB chunk turns into a large number of 1500-byte packet send operations). Thus we see a large number of context switches for each block send operation.
> I'd like to get some feedback on the best way to address this, but I think one approach is providing a mechanism for a DFSClient to directly open data blocks that happen to be on the same machine. It could do this by examining the set of LocatedBlocks returned by the NameNode, marking those that should be resident on the local host. Since the DataNode and DFSClient (probably) share the same Hadoop configuration, the DFSClient should be able to find the files holding the block data, and it could directly open them and send data back to the client. This would avoid the context switches imposed by the network layer, and would allow for much larger read buffers than 64 KB, which should reduce the number of iops imposed by each read block operation.
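The counts cited in the description can be sanity-checked with back-of-the-envelope arithmetic (a sketch using only the constants stated above, not HDFS code):

```java
// Quick check of the iops and packet counts in the HDFS-347 description:
// a 64 MB block streamed in 64 KB chunks, each chunk sent as 1500-byte packets.
public class ChunkMath {
    static final long BLOCK = 64L * 1024 * 1024; // 64 MB block
    static final long CHUNK = 64L * 1024;        // 64 KB streaming chunk
    static final int  MTU   = 1500;              // typical Ethernet payload size

    // Separate disk reads needed to stream one block chunk by chunk.
    static long chunksPerBlock() { return BLOCK / CHUNK; }

    // 1500-byte packet sends per chunk, rounded up.
    static long packetsPerChunk() { return (CHUNK + MTU - 1) / MTU; }

    public static void main(String[] args) {
        System.out.println(chunksPerBlock());                     // 1024: "over a thousand" reads
        System.out.println(chunksPerBlock() * packetsPerChunk()); // tens of thousands of packet sends
    }
}
```

So one logical block read becomes 1024 disk operations and roughly 45,000 packet send operations, which is the overhead a local-read short circuit would avoid.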
[jira] Commented: (HDFS-1618) configure files that are generated as part of the released tarball need to have executable bit set
[ https://issues.apache.org/jira/browse/HDFS-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992877#comment-12992877 ]

Konstantin Boudnik commented on HDFS-1618:
------------------------------------------

+1, patch looks good. Let's run it through the usual validation cycle.

> configure files that are generated as part of the released tarball need to have executable bit set
> ---------------------------------------------------------------------------------------------------
>
> Key: HDFS-1618
> URL: https://issues.apache.org/jira/browse/HDFS-1618
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Roman Shaposhnik
> Assignee: Roman Shaposhnik
> Attachments: HDFS-1618.patch
>
> Currently the configure files that are packaged in a tarball are -rw-rw-r--
[jira] Commented: (HDFS-1619) Does libhdfs really need to depend on AC_TYPE_INT16_T, AC_TYPE_INT32_T, AC_TYPE_INT64_T and AC_TYPE_UINT16_T ?
[ https://issues.apache.org/jira/browse/HDFS-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992872#comment-12992872 ]

Allen Wittenauer commented on HDFS-1619:
----------------------------------------

Given that CentOS/RHEL 5.5 doesn't ship with a working Java, I don't see the issue with requiring a newer autoconf toolset.

> Does libhdfs really need to depend on AC_TYPE_INT16_T, AC_TYPE_INT32_T, AC_TYPE_INT64_T and AC_TYPE_UINT16_T ?
> ---------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-1619
> URL: https://issues.apache.org/jira/browse/HDFS-1619
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Roman Shaposhnik
> Assignee: Konstantin Shvachko
>
> Currently configure.ac uses AC_TYPE_INT16_T, AC_TYPE_INT32_T, AC_TYPE_INT64_T and AC_TYPE_UINT16_T, and thus requires autoconf 2.61 or higher. This prevents using it on platforms such as CentOS/RHEL 5.4 and 5.5. Given that those are pretty popular, and that it is really difficult to find a platform these days that doesn't natively define the intXX_t types, I'm curious whether we can simply remove those macros, or perhaps fail ONLY if we happen to be on such a platform. Here's a link to the GNU autoconf docs for reference: http://www.gnu.org/software/hello/manual/autoconf/Particular-Types.html
[jira] Created: (HDFS-1619) Does libhdfs really need to depend on AC_TYPE_INT16_T, AC_TYPE_INT32_T, AC_TYPE_INT64_T and AC_TYPE_UINT16_T ?
Does libhdfs really need to depend on AC_TYPE_INT16_T, AC_TYPE_INT32_T, AC_TYPE_INT64_T and AC_TYPE_UINT16_T ?
---------------------------------------------------------------------------------------------------------------

Key: HDFS-1619
URL: https://issues.apache.org/jira/browse/HDFS-1619
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Roman Shaposhnik
Assignee: Konstantin Shvachko

Currently configure.ac uses AC_TYPE_INT16_T, AC_TYPE_INT32_T, AC_TYPE_INT64_T and AC_TYPE_UINT16_T, and thus requires autoconf 2.61 or higher. This prevents using it on platforms such as CentOS/RHEL 5.4 and 5.5. Given that those are pretty popular, and that it is really difficult to find a platform these days that doesn't natively define the intXX_t types, I'm curious whether we can simply remove those macros, or perhaps fail ONLY if we happen to be on such a platform. Here's a link to the GNU autoconf docs for reference: http://www.gnu.org/software/hello/manual/autoconf/Particular-Types.html
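One way to honor the reporter's "fail ONLY on such a platform" suggestion without raising the autoconf floor might be to guard the macros in configure.ac. The following is an untested sketch, and the fallback macro choice is an assumption, not something proposed on the issue:

```m4
# Use the AC_TYPE_*_T macros only when the running autoconf defines them
# (autoconf >= 2.61); otherwise fall back to AC_CHECK_TYPES, which is
# available in the autoconf 2.59 shipped with CentOS/RHEL 5.x.
m4_ifdef([AC_TYPE_INT16_T],
  [AC_TYPE_INT16_T
   AC_TYPE_INT32_T
   AC_TYPE_INT64_T
   AC_TYPE_UINT16_T],
  [AC_CHECK_TYPES([int16_t, int32_t, int64_t, uint16_t])])
```

With this guard, a build on a platform lacking the intXX_t types would still fail at the AC_CHECK_TYPES stage rather than silently miscompiling.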
[jira] Updated: (HDFS-1618) configure files that are generated as part of the released tarball need to have executable bit set
[ https://issues.apache.org/jira/browse/HDFS-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roman Shaposhnik updated HDFS-1618:
-----------------------------------

Attachment: HDFS-1618.patch

Patch attached.

> configure files that are generated as part of the released tarball need to have executable bit set
> ---------------------------------------------------------------------------------------------------
>
> Key: HDFS-1618
> URL: https://issues.apache.org/jira/browse/HDFS-1618
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Roman Shaposhnik
> Assignee: Roman Shaposhnik
> Attachments: HDFS-1618.patch
>
> Currently the configure files that are packaged in a tarball are -rw-rw-r--
[jira] Updated: (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data
[ https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ryan rawson updated HDFS-347:
-----------------------------

Attachment: HDFS-347-branch-20-append.txt

applies to head of branch-20-append

> DFS read performance suboptimal when client co-located on nodes with data
> --------------------------------------------------------------------------
>
> Key: HDFS-347
> URL: https://issues.apache.org/jira/browse/HDFS-347
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: George Porter
> Assignee: Todd Lipcon
> Attachments: BlockReaderLocal1.txt, HADOOP-4801.1.patch, HADOOP-4801.2.patch, HADOOP-4801.3.patch, HDFS-347-branch-20-append.txt, all.tsv, hdfs-347.png, hdfs-347.txt, local-reads-doc
[jira] Commented: (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data
[ https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992801#comment-12992801 ]

ryan rawson commented on HDFS-347:
----------------------------------

OK, this was my bad: I applied the patch wrong. The unit test passes. I'll attach a patch for others.

> DFS read performance suboptimal when client co-located on nodes with data
> --------------------------------------------------------------------------
>
> Key: HDFS-347
> URL: https://issues.apache.org/jira/browse/HDFS-347
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: George Porter
> Assignee: Todd Lipcon
> Attachments: BlockReaderLocal1.txt, HADOOP-4801.1.patch, HADOOP-4801.2.patch, HADOOP-4801.3.patch, all.tsv, hdfs-347.png, hdfs-347.txt, local-reads-doc
[jira] Commented: (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data
[ https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992786#comment-12992786 ]

ryan rawson commented on HDFS-347:
----------------------------------

Applying this patch to branch-20-append, the unit test passes. Still trying to figure out why it works on one branch and not the other; the patch is pretty dang simple, too.

> DFS read performance suboptimal when client co-located on nodes with data
> --------------------------------------------------------------------------
>
> Key: HDFS-347
> URL: https://issues.apache.org/jira/browse/HDFS-347
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: George Porter
> Assignee: Todd Lipcon
> Attachments: BlockReaderLocal1.txt, HADOOP-4801.1.patch, HADOOP-4801.2.patch, HADOOP-4801.3.patch, all.tsv, hdfs-347.png, hdfs-347.txt, local-reads-doc
[jira] Created: (HDFS-1618) configure files that are generated as part of the released tarball need to have executable bit set
configure files that are generated as part of the released tarball need to have executable bit set
---------------------------------------------------------------------------------------------------

Key: HDFS-1618
URL: https://issues.apache.org/jira/browse/HDFS-1618
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Roman Shaposhnik
Assignee: Roman Shaposhnik

Currently the configure files that are packaged in a tarball are -rw-rw-r--
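A fix along these lines could be as small as restoring the executable bit during packaging. The Ant fragment below is a hypothetical sketch, not the attached patch; the ${dist.dir} property and file layout are assumptions:

```xml
<!-- Hypothetical packaging step: restore the executable bit on generated
     configure scripts before they are bundled into the release tarball.
     ${dist.dir} and the include pattern are illustrative only. -->
<chmod perm="ugo+x">
  <fileset dir="${dist.dir}">
    <include name="**/configure"/>
  </fileset>
</chmod>
```

Note that plain Ant tar tasks can also drop permission bits unless the tarfileset declares a filemode, so the actual patch may address the tar step instead.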
[jira] Commented: (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data
[ https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992766#comment-12992766 ]

ryan rawson commented on HDFS-347:
----------------------------------

dhruba, I am not seeing the file src/hdfs/org/apache/hadoop/hdfs/metrics/DFSClientMetrics.java in branch-20-append (nor in cdh3b2). I also got a number of rejects; here are some highlights:

- ClientDatanodeProtocol: your variant has copyBlock, ours does not (hence the reject).
- Miscellaneous field differences in DFSClient, including the metrics object.

After resolving them I was able to get it up and going. I'm not able to get the unit test to pass; I'm guessing it's this:

2011-02-09 14:35:49,926 DEBUG hdfs.DFSClient (DFSClient.java:fetchBlockByteRange(1927)) - fetchBlockByteRange shortCircuitLocalReads true localhst h132.sfo.stumble.net/10.10.1.132 targetAddr /127.0.0.1:62665

Since we don't recognize that we are 'local', we take the normal read path, which is failing. Any tips?

> DFS read performance suboptimal when client co-located on nodes with data
> --------------------------------------------------------------------------
>
> Key: HDFS-347
> URL: https://issues.apache.org/jira/browse/HDFS-347
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: George Porter
> Assignee: Todd Lipcon
> Attachments: BlockReaderLocal1.txt, HADOOP-4801.1.patch, HADOOP-4801.2.patch, HADOOP-4801.3.patch, all.tsv, hdfs-347.png, hdfs-347.txt, local-reads-doc
[jira] Commented: (HDFS-1602) Fix HADOOP-4885 as it doesn't work as expected.
[ https://issues.apache.org/jira/browse/HDFS-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992749#comment-12992749 ]

Todd Lipcon commented on HDFS-1602:
-----------------------------------

+1 on the patch for 22.

> Fix HADOOP-4885 as it doesn't work as expected.
> ------------------------------------------------
>
> Key: HDFS-1602
> URL: https://issues.apache.org/jira/browse/HDFS-1602
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.21.0, 0.23.0
> Reporter: Konstantin Boudnik
> Assignee: Boris Shkolnik
> Attachments: HDFS-1602-1.patch, HDFS-1602.patch, HDFS-1602v22.patch
>
> NameNode storage restore functionality doesn't work (as HDFS-903 demonstrated). It needs to be either disabled, removed, or fixed. This feature also fails HDFS-1496.
[jira] Commented: (HDFS-1602) Fix HADOOP-4885 as it doesn't work as expected.
[ https://issues.apache.org/jira/browse/HDFS-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992740#comment-12992740 ]

Hadoop QA commented on HDFS-1602:
---------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12470731/HDFS-1602v22.patch
against trunk revision 1068968.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

-1 patch. The patch command could not apply the patch.

Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/158//console

This message is automatically generated.

> Fix HADOOP-4885 as it doesn't work as expected.
> ------------------------------------------------
>
> Key: HDFS-1602
> URL: https://issues.apache.org/jira/browse/HDFS-1602
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.21.0, 0.23.0
> Reporter: Konstantin Boudnik
> Assignee: Boris Shkolnik
> Attachments: HDFS-1602-1.patch, HDFS-1602.patch, HDFS-1602v22.patch
[jira] Updated: (HDFS-1602) Fix HADOOP-4885 as it doesn't work as expected.
[ https://issues.apache.org/jira/browse/HDFS-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik updated HDFS-1602:
---------------------------------

Attachment: HDFS-1602v22.patch

Patch for 0.22.

> Fix HADOOP-4885 as it doesn't work as expected.
> ------------------------------------------------
>
> Key: HDFS-1602
> URL: https://issues.apache.org/jira/browse/HDFS-1602
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.21.0, 0.23.0
> Reporter: Konstantin Boudnik
> Assignee: Boris Shkolnik
> Attachments: HDFS-1602-1.patch, HDFS-1602.patch, HDFS-1602v22.patch
[jira] Commented: (HDFS-1603) Namenode gets sticky if one of namenode storage volumes disappears (removed, unmounted, etc.)
[ https://issues.apache.org/jira/browse/HDFS-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992729#comment-12992729 ]

Konstantin Boudnik commented on HDFS-1603:
------------------------------------------

It clearly depends on the NFS mount's timeout.

> Namenode gets sticky if one of namenode storage volumes disappears (removed, unmounted, etc.)
> ----------------------------------------------------------------------------------------------
>
> Key: HDFS-1603
> URL: https://issues.apache.org/jira/browse/HDFS-1603
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.21.0
> Reporter: Konstantin Boudnik
>
> While investigating failures on HDFS-1602 it became apparent that once a namenode storage volume is pulled out, the NN becomes completely "sticky" until {{FSImage:processIOError: removing storage}} moves the storage from the active set. During this time none of the normal NN operations are possible (e.g. creating a directory on HDFS eventually times out).
> In the case of NFS this can be worked around with soft,intr,timeo,retrans settings. However, better handling of the situation is apparently possible and needs to be implemented.
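The mount-option workaround named in the description might look like the following fstab sketch. The server name, export path, and timeout values are illustrative assumptions, not recommendations from the issue:

```
# Hypothetical fstab entry for an NFS-backed NameNode storage directory.
# soft + intr let I/O calls fail instead of hanging forever; timeo (in
# tenths of a second) and retrans bound how long a dead mount can block
# the NameNode before an error is returned.
filer:/export/nn-storage  /mnt/nn-nfs  nfs  soft,intr,timeo=30,retrans=3  0 0
```

The trade-off is that soft mounts can surface transient NFS hiccups as write errors, which is why the comment notes that better in-NameNode handling is still needed.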
[jira] Resolved: (HDFS-1617) CLONE to COMMON - Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it once per directory instead of once per file
[ https://issues.apache.org/jira/browse/HDFS-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Foley resolved HDFS-1617.
------------------------------

Resolution: Fixed
Release Note: (was: Batch hardlinking during "upgrade" snapshots, cutting time from approx. 8 minutes per volume to approx. 8 seconds. Validated on both Linux and Windows. Requires a coordinated change in both COMMON and HDFS.)

No change; this needs to be opened under COMMON.

> CLONE to COMMON - Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it once per directory instead of once per file
> -----------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-1617
> URL: https://issues.apache.org/jira/browse/HDFS-1617
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: data-node
> Affects Versions: 0.20.2
> Reporter: Matt Foley
> Assignee: Matt Foley
> Fix For: 0.22.0
>
> It was a bit of a puzzle why we can do a full scan of a disk in about 30 seconds during FSDir() or getVolumeMap(), while the same disk took 11 minutes to do upgrade replication via hardlinks. It turns out that the org.apache.hadoop.fs.FileUtil.createHardLink() method does an outcall to Runtime.getRuntime().exec() to utilize the native filesystem's hardlink capability. So it forks a full-weight external process, and we call it on each individual file to be replicated.
> As a simple check on the possible cost of this approach, I built a Perl test script (under Linux on a production-class datanode). Perl also uses a compiled and optimized p-code engine, and it has both native support for hardlinks and the ability to do "exec".
> - A simple script to create 256,000 files in a directory tree organized like the Datanode took 10 seconds to run.
> - Replicating that directory tree using hardlinks, the same way as the Datanode, took 12 seconds using native hardlink support.
> - The same replication using outcalls to exec, one per file, took 256 seconds!
> - Batching the calls, and doing 'exec' once per directory instead of once per file, took 16 seconds.
> Obviously, your mileage will vary with the number of blocks per volume. A volume with fewer than about 4000 blocks will have only 65 directories. A volume with more than 4K and fewer than about 250K blocks will have 4200 directories (more or less). And there are two files per block (the data file and the .meta file), so the average number of files per directory may vary from 2:1 to 500:1. A node with 50K blocks and four volumes will have 25K files per volume, or an average of about 6:1. So this change may be expected to take it down from, say, 12 minutes per volume to 2.
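The batching idea in the description can be illustrated with a grouping sketch. The file names below are invented, and this is not the actual DataStorage/FileUtil.createHardLink() change; it only shows why grouping by directory collapses the number of forked processes:

```java
import java.util.*;

// Sketch of the batching optimization: instead of forking one external
// "ln" process per file, group the files by parent directory and fork
// once per directory, passing all of that directory's files at once.
public class HardLinkBatcher {
    // Group file paths by their parent directory.
    public static Map<String, List<String>> groupByDir(List<String> paths) {
        Map<String, List<String>> byDir = new TreeMap<>();
        for (String p : paths) {
            int slash = p.lastIndexOf('/');
            String dir = (slash < 0) ? "." : p.substring(0, slash);
            byDir.computeIfAbsent(dir, d -> new ArrayList<>()).add(p);
        }
        return byDir;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
            "subdir0/blk_1", "subdir0/blk_1.meta",
            "subdir1/blk_2", "subdir1/blk_2.meta");
        // Per-file exec would fork 4 times; batched, it forks once per directory.
        System.out.println(groupByDir(files).size()); // prints 2
    }
}
```

With the 6:1 files-per-directory average estimated above, this cuts the number of forked processes by roughly that same factor, which matches the Perl measurement (256 seconds down to 16).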
[jira] Commented: (HDFS-270) DFS Upgrade should process dfs.data.dirs in parallel
[ https://issues.apache.org/jira/browse/HDFS-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992645#comment-12992645 ]

Hairong Kuang commented on HDFS-270:
------------------------------------

Matt, thanks so much for sharing the patch to HDFS-1445. I will review it. Cutting the time from 8 minutes to 8 seconds is impressive. Job well done!

> DFS Upgrade should process dfs.data.dirs in parallel
> -----------------------------------------------------
>
> Key: HDFS-270
> URL: https://issues.apache.org/jira/browse/HDFS-270
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: data-node
> Affects Versions: 0.20.2
> Reporter: Stu Hood
> Assignee: Hairong Kuang
>
> I just upgraded from 0.14.2 to 0.15.0, and things went very smoothly, if a little slowly.
> The main reason the upgrade took so long was the block upgrades on the datanodes. Each of our datanodes has 3 drives listed for the dfs.data.dir parameter. From looking at the logs, it is fairly clear that the upgrade procedure does not attempt to upgrade all listed dfs.data.dirs in parallel.
> I think even if all of your dfs.data.dirs are on the same physical device, there would still be an advantage to performing the upgrade process in parallel. The less downtime, the better: especially if it is potentially 20 minutes versus 60 minutes.
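The per-volume parallelism requested here could be sketched with a thread pool, one worker per dfs.data.dir volume. This is an illustrative stand-in, not the actual DataStorage upgrade code; doUpgrade() is a placeholder for the real per-volume work:

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch: upgrade all dfs.data.dir volumes concurrently instead of one
// after another, and wait for every volume to finish before proceeding.
public class ParallelVolumeUpgrade {
    // Placeholder for the real per-volume upgrade work.
    static String doUpgrade(String volume) {
        return volume + " upgraded";
    }

    public static List<String> upgradeAll(List<String> volumes) {
        ExecutorService pool = Executors.newFixedThreadPool(volumes.size());
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String v : volumes) {
                futures.add(pool.submit(() -> doUpgrade(v)));  // one task per volume
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());  // block until this volume's upgrade completes
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(upgradeAll(Arrays.asList("/data1", "/data2", "/data3")));
    }
}
```

Since each volume is typically a separate physical disk, the wall-clock upgrade time should approach that of the slowest single volume rather than the sum of all volumes.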
[jira] Created: (HDFS-1617) CLONE to COMMON - Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it once per directory instead of once per file
CLONE to COMMON - Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it once per directory instead of once per file Key: HDFS-1617 URL: https://issues.apache.org/jira/browse/HDFS-1617 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node Affects Versions: 0.20.2 Reporter: Matt Foley Assignee: Matt Foley Fix For: 0.22.0 It was a bit of a puzzle why we can do a full scan of a disk in about 30 seconds during FSDir() or getVolumeMap(), but the same disk took 11 minutes to do Upgrade replication via hardlinks. It turns out that the org.apache.hadoop.fs.FileUtil.createHardLink() method does an outcall to Runtime.getRuntime().exec(), to utilize native filesystem hardlink capability. So it is forking a full-weight external process, and we call it on each individual file to be replicated. As a simple check on the possible cost of this approach, I built a Perl test script (under Linux on a production-class datanode). Perl also uses a compiled and optimized p-code engine, and it has both native support for hardlinks and the ability to do "exec". - A simple script to create 256,000 files in a directory tree organized like the Datanode, took 10 seconds to run. - Replicating that directory tree using hardlinks, the same way as the Datanode, took 12 seconds using native hardlink support. - The same replication using outcalls to exec, one per file, took 256 seconds! - Batching the calls, and doing 'exec' once per directory instead of once per file, took 16 seconds. Obviously, your mileage will vary based on the number of blocks per volume. A volume with less than about 4000 blocks will have only 65 directories. A volume with more than 4K and less than about 250K blocks will have 4200 directories (more or less). And there are two files per block (the data file and the .meta file). So the average number of files per directory may vary from 2:1 to 500:1. A node with 50K blocks and four volumes will have 25K files per volume, or an average of about 6:1. 
So this change may be expected to take it down from, say, 12 minutes per volume to 2. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
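The batching idea in this issue, one forked `ln` process per directory instead of one per file, can be sketched as follows. The class and the directory-map layout are hypothetical, not the actual DataStorage/FileUtil code; it relies only on the standard multi-source form of `ln`, which links all listed sources into a target directory.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: batch hard-link creation so we fork one process per directory. */
public class BatchHardLinker {

    /** Build one command per source directory:
     *  ln <src1> <src2> ... <destDir>   (links every source into destDir). */
    public static List<List<String>> buildCommands(
            Map<String, List<String>> filesBySrcDir,
            Map<String, String> destDirBySrcDir) {
        List<List<String>> commands = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : filesBySrcDir.entrySet()) {
            List<String> cmd = new ArrayList<>();
            cmd.add("ln");
            for (String f : e.getValue()) {
                cmd.add(e.getKey() + "/" + f); // full source path
            }
            cmd.add(destDirBySrcDir.get(e.getKey())); // link target directory
            commands.add(cmd);
        }
        return commands;
    }

    /** Fork one external process per directory-level command. */
    public static void run(List<List<String>> commands)
            throws IOException, InterruptedException {
        for (List<String> cmd : commands) {
            Process p = new ProcessBuilder(cmd).inheritIO().start();
            if (p.waitFor() != 0) {
                throw new IOException("ln failed: " + cmd);
            }
        }
    }
}
```

With the Datanode layout described above (a few thousand blocks spread over a few thousand directories), this turns hundreds of thousands of `exec` calls into a few thousand, which matches the 256-second vs 16-second Perl measurement.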
[jira] Commented: (HDFS-1445) Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it once per directory instead of once per file
[ https://issues.apache.org/jira/browse/HDFS-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992642#comment-12992642 ] Hadoop QA commented on HDFS-1445: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12470696/HDFS-1445-trunk.v22_hdfs_2-of-2.patch against trunk revision 1068968. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause tar ant target to fail. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: -1 contrib tests. The patch failed contrib unit tests. -1 system test framework. The patch failed system test framework compile. Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/157//testReport/ Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/157//console This message is automatically generated. > Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it > once per directory instead of once per file > -- > > Key: HDFS-1445 > URL: https://issues.apache.org/jira/browse/HDFS-1445 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: data-node >Affects Versions: 0.20.2 >Reporter: Matt Foley >Assignee: Matt Foley > Fix For: 0.22.0 > > Attachments: HDFS-1445-trunk.v22_common_1-of-2.patch, > HDFS-1445-trunk.v22_hdfs_2-of-2.patch > > > It was a bit of a puzzle why we can do a full scan of a disk in about 30 > seconds during FSDir() or getVolumeMap(), but the same disk took 11 minutes > to do Upgrade replication via hardlinks. 
It turns out that the > org.apache.hadoop.fs.FileUtil.createHardLink() method does an outcall to > Runtime.getRuntime().exec(), to utilize native filesystem hardlink > capability. So it is forking a full-weight external process, and we call it > on each individual file to be replicated. > As a simple check on the possible cost of this approach, I built a Perl test > script (under Linux on a production-class datanode). Perl also uses a > compiled and optimized p-code engine, and it has both native support for > hardlinks and the ability to do "exec". > - A simple script to create 256,000 files in a directory tree organized like > the Datanode, took 10 seconds to run. > - Replicating that directory tree using hardlinks, the same way as the > Datanode, took 12 seconds using native hardlink support. > - The same replication using outcalls to exec, one per file, took 256 > seconds! > - Batching the calls, and doing 'exec' once per directory instead of once > per file, took 16 seconds. > Obviously, your mileage will vary based on the number of blocks per volume. > A volume with less than about 4000 blocks will have only 65 directories. A > volume with more than 4K and less than about 250K blocks will have 4200 > directories (more or less). And there are two files per block (the data file > and the .meta file). So the average number of files per directory may vary > from 2:1 to 500:1. A node with 50K blocks and four volumes will have 25K > files per volume, or an average of about 6:1. So this change may be expected > to take it down from, say, 12 minutes per volume to 2. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HDFS-270) DFS Upgrade should process dfs.data.dirs in parallel
[ https://issues.apache.org/jira/browse/HDFS-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992639#comment-12992639 ] Matt Foley commented on HDFS-270: - Hello Hairong, sorry I didn't see and respond to your request in a timely manner. You are of course welcome to take this on. However, please first take the patch for HDFS-1445, which I have now uploaded to that JIRA. It cuts the per-volume upgrade time from approx. 8 minutes to approx. 8 seconds, for my timings of a 12,500-block (25,000-file) volume. Even 12 volumes won't take very long to upgrade at that rate. Regards, --Matt > DFS Upgrade should process dfs.data.dirs in parallel > > > Key: HDFS-270 > URL: https://issues.apache.org/jira/browse/HDFS-270 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: data-node >Affects Versions: 0.20.2 >Reporter: Stu Hood >Assignee: Hairong Kuang > > I just upgraded from 0.14.2 to 0.15.0, and things went very smoothly, if a > little slowly. > The main reason the upgrade took so long was the block upgrades on the > datanodes. Each of our datanodes has 3 drives listed for the dfs.data.dir > parameter. From looking at the logs, it is fairly clear that the upgrade > procedure does not attempt to upgrade all listed dfs.data.dir's in parallel. > I think even if all of your dfs.data.dir's are on the same physical device, > there would still be an advantage to performing the upgrade process in > parallel. The less downtime, the better: especially if it is potentially 20 > minutes versus 60 minutes. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HDFS-1418) DFSClient Uses Deprecated "mapred.task.id" Configuration Key Causing Unnecessary Warning Messages
[ https://issues.apache.org/jira/browse/HDFS-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992636#comment-12992636 ] Hadoop QA commented on HDFS-1418: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12455481/HDFS-1418.patch against trunk revision 1068968. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/156//console This message is automatically generated. > DFSClient Uses Deprecated "mapred.task.id" Configuration Key Causing > Unnecessary Warning Messages > > > Key: HDFS-1418 > URL: https://issues.apache.org/jira/browse/HDFS-1418 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client >Affects Versions: 0.22.0 >Reporter: Ranjit Mathew >Priority: Minor > Attachments: HDFS-1418.patch > > > Every invocation of the "hadoop fs" command leads to an unnecessary warning > like the following: > {noformat} > $ $HADOOP_HOME/bin/hadoop fs -ls / > 10/09/24 15:10:23 WARN conf.Configuration: mapred.task.id is deprecated. > Instead, use mapreduce.task.attempt.id > {noformat} > This is easily fixed by updating > "src/java/org/apache/hadoop/hdfs/DFSClient.java". -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
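The warning above comes from a deprecated-key translation step: a lookup through the old key succeeds, but only after logging that the new key should be used. A minimal sketch of that mechanism (the class, map, and warning list are hypothetical, not Hadoop's actual Configuration internals):

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Sketch: resolve a possibly-deprecated key, warning when the old name is used. */
public class DeprecatedKeyLookup {

    /** One entry mirrors the deprecation named in the JIRA report. */
    static final Map<String, String> DEPRECATIONS =
        Collections.singletonMap("mapred.task.id", "mapreduce.task.attempt.id");

    /** Look up `key`; if it is deprecated, record a warning and use the
     *  replacement key instead. */
    public static String get(Map<String, String> conf, String key,
                             List<String> warnings) {
        String replacement = DEPRECATIONS.get(key);
        if (replacement != null) {
            warnings.add(key + " is deprecated. Instead, use " + replacement);
            key = replacement;
        }
        return conf.get(key);
    }
}
```

The fix in the patch is simply to make the caller ask for the new key directly, so the deprecated branch (and its warning) is never taken.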
[jira] Updated: (HDFS-1445) Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it once per directory instead of once per file
[ https://issues.apache.org/jira/browse/HDFS-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Foley updated HDFS-1445: - Attachment: HDFS-1445-trunk.v22_hdfs_2-of-2.patch HDFS-1445-trunk.v22_common_1-of-2.patch This patch requires coordinated change in both COMMON and HDFS. > Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it > once per directory instead of once per file > -- > > Key: HDFS-1445 > URL: https://issues.apache.org/jira/browse/HDFS-1445 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: data-node >Affects Versions: 0.20.2 >Reporter: Matt Foley >Assignee: Matt Foley > Fix For: 0.22.0 > > Attachments: HDFS-1445-trunk.v22_common_1-of-2.patch, > HDFS-1445-trunk.v22_hdfs_2-of-2.patch > > > It was a bit of a puzzle why we can do a full scan of a disk in about 30 > seconds during FSDir() or getVolumeMap(), but the same disk took 11 minutes > to do Upgrade replication via hardlinks. It turns out that the > org.apache.hadoop.fs.FileUtil.createHardLink() method does an outcall to > Runtime.getRuntime().exec(), to utilize native filesystem hardlink > capability. So it is forking a full-weight external process, and we call it > on each individual file to be replicated. > As a simple check on the possible cost of this approach, I built a Perl test > script (under Linux on a production-class datanode). Perl also uses a > compiled and optimized p-code engine, and it has both native support for > hardlinks and the ability to do "exec". > - A simple script to create 256,000 files in a directory tree organized like > the Datanode, took 10 seconds to run. > - Replicating that directory tree using hardlinks, the same way as the > Datanode, took 12 seconds using native hardlink support. > - The same replication using outcalls to exec, one per file, took 256 > seconds! > - Batching the calls, and doing 'exec' once per directory instead of once > per file, took 16 seconds. 
> Obviously, your mileage will vary based on the number of blocks per volume. > A volume with less than about 4000 blocks will have only 65 directories. A > volume with more than 4K and less than about 250K blocks will have 4200 > directories (more or less). And there are two files per block (the data file > and the .meta file). So the average number of files per directory may vary > from 2:1 to 500:1. A node with 50K blocks and four volumes will have 25K > files per volume, or an average of about 6:1. So this change may be expected > to take it down from, say, 12 minutes per volume to 2. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HDFS-1445) Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it once per directory instead of once per file
[ https://issues.apache.org/jira/browse/HDFS-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Foley updated HDFS-1445: - Fix Version/s: 0.22.0 Release Note: Batch hardlinking during "upgrade" snapshots, cutting time from approx. 8 minutes per volume to approx. 8 seconds. Validated in both Linux and Windows. Requires coordinated change in both COMMON and HDFS. Status: Patch Available (was: Open) Requires coordinated change in both COMMON and HDFS. > Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it > once per directory instead of once per file > -- > > Key: HDFS-1445 > URL: https://issues.apache.org/jira/browse/HDFS-1445 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: data-node >Affects Versions: 0.20.2 >Reporter: Matt Foley >Assignee: Matt Foley > Fix For: 0.22.0 > > > It was a bit of a puzzle why we can do a full scan of a disk in about 30 > seconds during FSDir() or getVolumeMap(), but the same disk took 11 minutes > to do Upgrade replication via hardlinks. It turns out that the > org.apache.hadoop.fs.FileUtil.createHardLink() method does an outcall to > Runtime.getRuntime().exec(), to utilize native filesystem hardlink > capability. So it is forking a full-weight external process, and we call it > on each individual file to be replicated. > As a simple check on the possible cost of this approach, I built a Perl test > script (under Linux on a production-class datanode). Perl also uses a > compiled and optimized p-code engine, and it has both native support for > hardlinks and the ability to do "exec". > - A simple script to create 256,000 files in a directory tree organized like > the Datanode, took 10 seconds to run. > - Replicating that directory tree using hardlinks, the same way as the > Datanode, took 12 seconds using native hardlink support. > - The same replication using outcalls to exec, one per file, took 256 > seconds! 
> - Batching the calls, and doing 'exec' once per directory instead of once > per file, took 16 seconds. > Obviously, your mileage will vary based on the number of blocks per volume. > A volume with less than about 4000 blocks will have only 65 directories. A > volume with more than 4K and less than about 250K blocks will have 4200 > directories (more or less). And there are two files per block (the data file > and the .meta file). So the average number of files per directory may vary > from 2:1 to 500:1. A node with 50K blocks and four volumes will have 25K > files per volume, or an average of about 6:1. So this change may be expected > to take it down from, say, 12 minutes per volume to 2. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HDFS-1335) HDFS side of HADOOP-6904: first step towards inter-version communications between dfs client and NameNode
[ https://issues.apache.org/jira/browse/HDFS-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992582#comment-12992582 ] Hudson commented on HDFS-1335: -- Integrated in Hadoop-Hdfs-trunk-Commit #539 (See [https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/]) > HDFS side of HADOOP-6904: first step towards inter-version communications > between dfs client and NameNode > - > > Key: HDFS-1335 > URL: https://issues.apache.org/jira/browse/HDFS-1335 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs client, name-node >Affects Versions: 0.22.0 >Reporter: Hairong Kuang >Assignee: Hairong Kuang > Fix For: 0.23.0 > > Attachments: hdfsRPC.patch, hdfsRpcVersion.patch > > > The idea is that for getProtocolVersion, NameNode checks if the client and > server versions are compatible if the server version is greater than the > client version. If no, throws a VersionIncompatible exception; otherwise, > returns the server version. > On the dfs client side, when creating a NameNode proxy, catches the > VersionMismatch exception and then checks if the client version and the > server version are compatible if the client version is greater than the > server version. If not compatible, throws exception VersionIncompatible; > otherwise, records the server version and continues. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
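The negotiation described in this issue is symmetric: whichever side has the higher version is responsible for the compatibility check. A minimal sketch of that control flow, with hypothetical names (not the actual Hadoop RPC API) and compatibility reduced to a boolean parameter:

```java
/** Sketch of the HDFS-1335 version negotiation described above. */
public class VersionNegotiation {

    static class VersionIncompatible extends RuntimeException {}

    /** Server side: if the server is newer, it must verify compatibility
     *  before answering; otherwise it just reports its version. */
    static long serverGetProtocolVersion(long serverVersion, long clientVersion,
                                         boolean compatible) {
        if (serverVersion > clientVersion && !compatible) {
            throw new VersionIncompatible();
        }
        return serverVersion;
    }

    /** Client side: if the client is newer, it performs the mirror-image
     *  check, then records the server version and continues. */
    static long clientNegotiate(long clientVersion, long serverVersion,
                                boolean compatible) {
        if (clientVersion > serverVersion && !compatible) {
            throw new VersionIncompatible();
        }
        return serverVersion;
    }
}
```

Splitting the check this way means an old client never has to understand a new server's compatibility rules, and vice versa: the newer party always decides.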
[jira] Commented: (HDFS-900) Corrupt replicas are not tracked correctly through block report from DN
[ https://issues.apache.org/jira/browse/HDFS-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992583#comment-12992583 ] Hudson commented on HDFS-900: - Integrated in Hadoop-Hdfs-trunk-Commit #539 (See [https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/]) > Corrupt replicas are not tracked correctly through block report from DN > --- > > Key: HDFS-900 > URL: https://issues.apache.org/jira/browse/HDFS-900 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.22.0 >Reporter: Todd Lipcon >Assignee: Konstantin Shvachko >Priority: Blocker > Fix For: 0.22.0 > > Attachments: log-commented, reportCorruptBlock.patch, > to-reproduce.patch > > > This one is tough to describe, but essentially the following order of events > is seen to occur: > # A client marks one replica of a block to be corrupt by telling the NN about > it > # Replication is then scheduled to make a new replica of this node > # The replication completes, such that there are now 3 good replicas and 1 > corrupt replica > # The DN holding the corrupt replica sends a block report. Rather than > telling this DN to delete the node, the NN instead marks this as a new *good* > replica of the block, and schedules deletion on one of the good replicas. > I don't know if this is a dataloss bug in the case of 1 corrupt replica with > dfs.replication=2, but it seems feasible. I will attach a debug log with some > commentary marked by '>', plus a unit test patch which I can get > to reproduce this behavior reliably. (it's not a proper unit test, just some > edits to an existing one to show it) -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HDFS-560) Proposed enhancements/tuning to hadoop-hdfs/build.xml
[ https://issues.apache.org/jira/browse/HDFS-560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992580#comment-12992580 ] Hudson commented on HDFS-560: - Integrated in Hadoop-Hdfs-trunk-Commit #539 (See [https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/]) > Proposed enhancements/tuning to hadoop-hdfs/build.xml > -- > > Key: HDFS-560 > URL: https://issues.apache.org/jira/browse/HDFS-560 > Project: Hadoop HDFS > Issue Type: Improvement > Components: build >Affects Versions: 0.21.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Fix For: 0.23.0 > > Attachments: HDFS-560.patch > > > sibling list of HADOOP-6206, enhancements to the hdfs build for easier > single-system build/test -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HDFS-1448) Create multi-format parser for edits logs file, support binary and XML formats initially
[ https://issues.apache.org/jira/browse/HDFS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992581#comment-12992581 ] Hudson commented on HDFS-1448: -- Integrated in Hadoop-Hdfs-trunk-Commit #539 (See [https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/]) > Create multi-format parser for edits logs file, support binary and XML > formats initially > > > Key: HDFS-1448 > URL: https://issues.apache.org/jira/browse/HDFS-1448 > Project: Hadoop HDFS > Issue Type: New Feature > Components: tools >Affects Versions: 0.22.0 >Reporter: Erik Steffl >Assignee: Erik Steffl > Fix For: 0.23.0 > > Attachments: HDFS-1448-0.22-1.patch, HDFS-1448-0.22-2.patch, > HDFS-1448-0.22-3.patch, HDFS-1448-0.22-4.patch, HDFS-1448-0.22-5.patch, > HDFS-1448-0.22.patch, Viewer hierarchy.pdf, editsStored > > > Create multi-format parser for edits logs file, support binary and XML > formats initially. > Parsing should work from any supported format to any other supported format > (e.g. from binary to XML and from XML to binary). > The binary format is the format used by FSEditLog class to read/write edits > file. > Primary reason to develop this tool is to help with troubleshooting, the > binary format is hard to read and edit (for human troubleshooters). > Longer term it could be used to clean up and minimize parsers for fsimage and > edits files. Edits parser OfflineEditsViewer is written in a very similar > fashion to OfflineImageViewer. Next step would be to merge OfflineImageViewer > and OfflineEditsViewer and use the result in both FSImage and FSEditLog. This > is subject to change, specifically depending on adoption of avro (which would > completely change how objects are serialized as well as provide ways to > convert files to different formats). -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HDFS-1610) TestClientProtocolWithDelegationToken failing
[ https://issues.apache.org/jira/browse/HDFS-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992579#comment-12992579 ] Hudson commented on HDFS-1610: -- Integrated in Hadoop-Hdfs-trunk-Commit #539 (See [https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/]) > TestClientProtocolWithDelegationToken failing > - > > Key: HDFS-1610 > URL: https://issues.apache.org/jira/browse/HDFS-1610 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Blocker > Attachments: hdfs-1610.txt > > > Another instance of the same type of failure as MAPREDUCE-2300 (a mock > protocol implementation isn't returning a protocol signature) -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HDFS-1602) Fix HADOOP-4885 as it doesn't work as expected.
[ https://issues.apache.org/jira/browse/HDFS-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992584#comment-12992584 ] Hudson commented on HDFS-1602: -- Integrated in Hadoop-Hdfs-trunk-Commit #539 (See [https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/]) > Fix HADOOP-4885 as it doesn't work as expected. > --- > > Key: HDFS-1602 > URL: https://issues.apache.org/jira/browse/HDFS-1602 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.21.0, 0.23.0 >Reporter: Konstantin Boudnik >Assignee: Boris Shkolnik > Attachments: HDFS-1602-1.patch, HDFS-1602.patch > > > NameNode storage restore functionality doesn't work (as HDFS-903 > demonstrated). This needs to be either disabled, or removed, or fixed. This > feature also fails HDFS-1496 -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HDFS-1529) Incorrect handling of interrupts in waitForAckedSeqno can cause deadlock
[ https://issues.apache.org/jira/browse/HDFS-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992591#comment-12992591 ] Hudson commented on HDFS-1529: -- Integrated in Hadoop-Hdfs-trunk-Commit #539 (See [https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/]) > Incorrect handling of interrupts in waitForAckedSeqno can cause deadlock > > > Key: HDFS-1529 > URL: https://issues.apache.org/jira/browse/HDFS-1529 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client >Affects Versions: 0.22.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Blocker > Fix For: 0.22.0 > > Attachments: Test.java, hdfs-1529.txt, hdfs-1529.txt, hdfs-1529.txt > > > In HDFS-895 the handling of interrupts during hflush/close was changed to > preserve interrupt status. This ends up creating an infinite loop in > waitForAckedSeqno if the waiting thread gets interrupted, since Object.wait() > has a strange semantic that it doesn't give up the lock even momentarily if > the thread is already in interrupted state at the beginning of the call. > We should decide what the correct behavior is here - if a thread is > interrupted while it's calling hflush() or close() should we (a) throw an > exception, perhaps InterruptedIOException (b) ignore, or (c) wait for the > flush to finish but preserve interrupt status on exit? -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
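The spin described in this issue happens because Object.wait() throws InterruptedException immediately when the calling thread's interrupt status is already set, so re-asserting the status inside the wait loop makes every subsequent wait() return at once. A common fix pattern, sketched below with hypothetical names (this is not the actual DFSOutputStream code), is option (c): remember the interrupt, keep waiting with the status cleared, and restore it on exit.

```java
/** Sketch: wait for an ack without spinning when the waiter is interrupted. */
public class WaitForAck {
    private final Object lock = new Object();
    private boolean acked = false;

    /** Called by the responder thread when the ack arrives. */
    public void ack() {
        synchronized (lock) {
            acked = true;
            lock.notifyAll();
        }
    }

    /** Wait for the ack, preserving interrupt status on exit. */
    public void waitForAcked() {
        boolean interrupted = false;
        synchronized (lock) {
            while (!acked) {
                try {
                    lock.wait();
                } catch (InterruptedException e) {
                    // Remember, but do NOT re-interrupt yet: re-interrupting
                    // here would make the next wait() throw immediately and
                    // the loop would spin without ever releasing the lock.
                    interrupted = true;
                }
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt(); // restore status for the caller
        }
    }
}
```

The alternative answers the JIRA poses, (a) throwing InterruptedIOException or (b) ignoring the interrupt entirely, differ only in what happens after the loop; the key point in all of them is not to re-set the interrupt flag while still inside the wait loop.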
[jira] Commented: (HDFS-1601) Pipeline ACKs are sent as lots of tiny TCP packets
[ https://issues.apache.org/jira/browse/HDFS-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992585#comment-12992585 ] Hudson commented on HDFS-1601: -- Integrated in Hadoop-Hdfs-trunk-Commit #539 (See [https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/]) > Pipeline ACKs are sent as lots of tiny TCP packets > -- > > Key: HDFS-1601 > URL: https://issues.apache.org/jira/browse/HDFS-1601 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Affects Versions: 0.22.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Fix For: 0.23.0 > > Attachments: hdfs-1601.txt, hdfs-1601.txt > > > I noticed in an hbase benchmark that the packet counts in my network > monitoring seemed high, so took a short pcap trace and found that each > pipeline ACK was being sent as five packets, the first four of which only > contain one byte. We should buffer these bytes and send the PipelineAck as > one TCP packet. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
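The fix direction described above, buffering the small writes so the whole ack leaves as one TCP packet, can be sketched as follows. The record layout here is illustrative, not the real PipelineAck wire format, and the in-memory sink stands in for the DataNode's socket stream:

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Sketch: coalesce an ack's small field writes into a single flush. */
public class BufferedAck {

    /** Write an ack-like record (seqno + per-datanode reply codes) through
     *  a buffer, so the underlying stream sees exactly one write. */
    public static byte[] writeAck(long seqno, short[] replies) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // socket stand-in
        try {
            DataOutputStream out =
                new DataOutputStream(new BufferedOutputStream(sink, 1024));
            out.writeLong(seqno);            // buffered, not yet "on the wire"
            out.writeShort(replies.length);  // still buffered
            for (short r : replies) {
                out.writeShort(r);           // no per-field packet
            }
            out.flush();                     // one write to the underlying stream
        } catch (IOException e) {
            throw new AssertionError(e);     // cannot happen with an in-memory sink
        }
        return sink.toByteArray();
    }
}
```

Without the BufferedOutputStream, each writeLong/writeShort would hit the socket separately, which is exactly the five-packets-per-ack pattern the pcap trace showed.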
[jira] Commented: (HDFS-1607) Fix references to misspelled method name getProtocolSigature
[ https://issues.apache.org/jira/browse/HDFS-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992590#comment-12992590 ] Hudson commented on HDFS-1607: -- Integrated in Hadoop-Hdfs-trunk-Commit #539 (See [https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/]) > Fix references to misspelled method name getProtocolSigature > > > Key: HDFS-1607 > URL: https://issues.apache.org/jira/browse/HDFS-1607 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Trivial > Attachments: hdfs-1607.txt > > -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HDFS-1600) editsStored.xml causes release audit warning
[ https://issues.apache.org/jira/browse/HDFS-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992587#comment-12992587 ] Hudson commented on HDFS-1600: -- Integrated in Hadoop-Hdfs-trunk-Commit #539 (See [https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/]) HDFS-1600. Fix release audit warnings on trunk. Contributed by Todd Lipcon > editsStored.xml causes release audit warning > --- > > Key: HDFS-1600 > URL: https://issues.apache.org/jira/browse/HDFS-1600 > Project: Hadoop HDFS > Issue Type: Bug > Components: build, test >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Todd Lipcon > Fix For: 0.23.0 > > Attachments: h1600_20110126.patch, hadoop-1600.txt > > > The file > {{src/test/hdfs/org/apache/hadoop/hdfs/tools/offlineEditsViewer/editsStored.xml}} > causes a release audit warning for any new patch. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HDFS-863) Potential deadlock in TestOverReplicatedBlocks
[ https://issues.apache.org/jira/browse/HDFS-863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992588#comment-12992588 ] Hudson commented on HDFS-863: - Integrated in Hadoop-Hdfs-trunk-Commit #539 (See [https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/]) > Potential deadlock in TestOverReplicatedBlocks > -- > > Key: HDFS-863 > URL: https://issues.apache.org/jira/browse/HDFS-863 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Todd Lipcon >Assignee: Ken Goodhope > Fix For: 0.23.0 > > Attachments: HDFS-863.patch, HDFS-863.patch, HDFS-863.patch, > HDFS-863.patch, TestNodeCount.png, cycle.png > > > TestOverReplicatedBlocks.testProcesOverReplicateBlock synchronizes on > namesystem.heartbeats without synchronizing on namesystem first. Other places > in the code synchronize namesystem, then heartbeats. It's probably unlikely > to occur in this test case, but it's a simple fix. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HDFS-1588) Add dfs.hosts.exclude to DFSConfigKeys and use constant in stead of hardcoded string
[ https://issues.apache.org/jira/browse/HDFS-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992589#comment-12992589 ] Hudson commented on HDFS-1588: -- Integrated in Hadoop-Hdfs-trunk-Commit #539 (See [https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/]) > Add dfs.hosts.exclude to DFSConfigKeys and use constant in stead of hardcoded > string > > > Key: HDFS-1588 > URL: https://issues.apache.org/jira/browse/HDFS-1588 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 0.23.0 >Reporter: Erik Steffl >Assignee: Erik Steffl > Fix For: 0.23.0 > > Attachments: HDFS-1588-0.23.patch > > -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HDFS-1557) Separate Storage from FSImage
[ https://issues.apache.org/jira/browse/HDFS-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992592#comment-12992592 ] Hudson commented on HDFS-1557: -- Integrated in Hadoop-Hdfs-trunk-Commit #539 (See [https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/]) > Separate Storage from FSImage > - > > Key: HDFS-1557 > URL: https://issues.apache.org/jira/browse/HDFS-1557 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Affects Versions: 0.21.0 >Reporter: Ivan Kelly >Assignee: Ivan Kelly > Fix For: 0.23.0 > > Attachments: 1557-suggestions.txt, HDFS-1557-branch-0.22.diff, > HDFS-1557-branch-0.22.diff, HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, > HDFS-1557-trunk.diff, HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff, > HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff, > HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff > > > FSImage currently derives from Storage and FSEditLog has to call methods > directly on FSImage to access the filesystem. This JIRA is to separate the > Storage class out into NNStorage so that FSEditLog is less dependent on > FSImage. From this point, the other parts of the circular dependency should > be easy to fix. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HDFS-1591) Fix javac, javadoc, findbugs warnings
[ https://issues.apache.org/jira/browse/HDFS-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992594#comment-12992594 ] Hudson commented on HDFS-1591: -- Integrated in Hadoop-Hdfs-trunk-Commit #539 (See [https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/]) > Fix javac, javadoc, findbugs warnings > - > > Key: HDFS-1591 > URL: https://issues.apache.org/jira/browse/HDFS-1591 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.22.0 >Reporter: Po Cheung >Assignee: Po Cheung > Fix For: 0.22.0 > > Attachments: hdfs-1591-trunk.patch > > > Split from HADOOP-6642 -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HDFS-1598) ListPathsServlet excludes .*.crc files
[ https://issues.apache.org/jira/browse/HDFS-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992598#comment-12992598 ] Hudson commented on HDFS-1598: -- Integrated in Hadoop-Hdfs-trunk-Commit #539 (See [https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/]) > ListPathsServlet excludes .*.crc files > -- > > Key: HDFS-1598 > URL: https://issues.apache.org/jira/browse/HDFS-1598 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.20.2 >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Fix For: 0.21.1, 0.22.0, 0.23.0 > > Attachments: h1598_20110126.patch, h1598_20110126_0.20.patch > > > The {{.*.crc}} files are excluded by default. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HDFS-1597) Batched edit log syncs can reset synctxid throw assertions
[ https://issues.apache.org/jira/browse/HDFS-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992597#comment-12992597 ] Hudson commented on HDFS-1597: -- Integrated in Hadoop-Hdfs-trunk-Commit #539 (See [https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/]) > Batched edit log syncs can reset synctxid throw assertions > -- > > Key: HDFS-1597 > URL: https://issues.apache.org/jira/browse/HDFS-1597 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.22.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Blocker > Fix For: 0.22.0 > > Attachments: hdfs-1597.txt, hdfs-1597.txt, hdfs-1597.txt, > illustrate-test-failure.txt > > > The top of FSEditLog.logSync has the following assertion: > {code} > assert editStreams.size() > 0 : "no editlog streams"; > {code} > which should actually come after checking to see if the sync was already > batched in by another thread. > This is related to a second bug in which the same case causes synctxid to be > reset to 0 -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
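The fix described above — checking whether another thread's sync already covered this transaction *before* asserting on the stream list — can be sketched as a toy model. This is not the actual FSEditLog code; the field names and method shape below are simplified stand-ins:

```java
// Toy model of the reordered logSync: the "already batched" check comes
// first, so a thread whose transactions were synced by another thread
// returns early and never trips the "no editlog streams" assertion,
// and synctxid only ever moves forward (never resets to 0).
public class LogSyncSketch {
    private long synctxid;             // highest txid known to be synced
    private final int editStreamCount; // stand-in for editStreams.size()

    public LogSyncSketch(int editStreamCount) {
        this.editStreamCount = editStreamCount;
    }

    /** Returns true only if this call actually performed a sync. */
    public boolean logSync(long mytxid) {
        if (mytxid <= synctxid) {
            return false; // another thread's batched sync already covered us
        }
        assert editStreamCount > 0 : "no editlog streams";
        synctxid = mytxid; // advance, never reset backwards
        return true;
    }
}
```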
[jira] Commented: (HDFS-1582) Remove auto-generated native build files
[ https://issues.apache.org/jira/browse/HDFS-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992599#comment-12992599 ] Hudson commented on HDFS-1582: -- Integrated in Hadoop-Hdfs-trunk-Commit #539 (See [https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/]) > Remove auto-generated native build files > > > Key: HDFS-1582 > URL: https://issues.apache.org/jira/browse/HDFS-1582 > Project: Hadoop HDFS > Issue Type: Improvement > Components: contrib/libhdfs >Reporter: Roman Shaposhnik >Assignee: Roman Shaposhnik > Fix For: 0.22.0, 0.23.0 > > Attachments: HADOOP-6436.patch, HDFS-1582.diff > > Original Estimate: 24h > Remaining Estimate: 24h > > The repo currently includes the automake and autoconf generated files for the > native build. Per discussion on HADOOP-6421 let's remove them and use the > host's automake and autoconf. We should also do this for libhdfs and > fuse-dfs. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HDFS-1615) seek() on closed DFS input stream throws NPE
[ https://issues.apache.org/jira/browse/HDFS-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992568#comment-12992568 ] Todd Lipcon commented on HDFS-1615: --- It should throw IOE, not NPE :) > seek() on closed DFS input stream throws NPE > > > Key: HDFS-1615 > URL: https://issues.apache.org/jira/browse/HDFS-1615 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Todd Lipcon >Assignee: Todd Lipcon > > After closing an input stream on DFS, seeking slightly ahead of the last read > will throw an NPE: > java.lang.NullPointerException > at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:749) > at > org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:42) -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
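A minimal sketch of the behaviour Todd is suggesting — failing a post-close seek with a descriptive IOException rather than an NPE. The class below is illustrative only, not the actual DFSInputStream:

```java
import java.io.IOException;

// Illustrative stream with a closed-state guard: seek() on a closed
// stream throws IOException up front instead of hitting an NPE on a
// field that close() has torn down.
public class GuardedSeekStream {
    private boolean closed;
    private long pos;

    public void close() { closed = true; }

    public void seek(long targetPos) throws IOException {
        if (closed) {
            throw new IOException("Stream is closed");
        }
        pos = targetPos;
    }

    public long getPos() { return pos; }
}
```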
[jira] Updated: (HDFS-1600) editsStored.xml cause release audit warning
[ https://issues.apache.org/jira/browse/HDFS-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-1600: -- Resolution: Fixed Fix Version/s: 0.23.0 Status: Resolved (was: Patch Available) > editsStored.xml cause release audit warning > --- > > Key: HDFS-1600 > URL: https://issues.apache.org/jira/browse/HDFS-1600 > Project: Hadoop HDFS > Issue Type: Bug > Components: build, test >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Todd Lipcon > Fix For: 0.23.0 > > Attachments: h1600_20110126.patch, hadoop-1600.txt > > > The file > {{src/test/hdfs/org/apache/hadoop/hdfs/tools/offlineEditsViewer/editsStored.xml}} > for any new patch. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HDFS-1602) Fix HADOOP-4885 for it is doesn't work as expected.
[ https://issues.apache.org/jira/browse/HDFS-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992557#comment-12992557 ] Konstantin Boudnik commented on HDFS-1602: -- bq. FWIW, TestBlockRecovery.testErrorReplicas failed (timed out) This JIRA is about TestStorageRestore. Boris, would you like to backport it to 0.22 at least? The ticket needs to be closed. > Fix HADOOP-4885 for it is doesn't work as expected. > --- > > Key: HDFS-1602 > URL: https://issues.apache.org/jira/browse/HDFS-1602 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.21.0, 0.23.0 >Reporter: Konstantin Boudnik >Assignee: Boris Shkolnik > Attachments: HDFS-1602-1.patch, HDFS-1602.patch > > > NameNode storage restore functionality doesn't work (as HDFS-903 > demonstrated). This needs to be either disabled, or removed, or fixed. This > feature also fails HDFS-1496 -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Assigned: (HDFS-1602) Fix HADOOP-4885 for it is doesn't work as expected.
[ https://issues.apache.org/jira/browse/HDFS-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Boudnik reassigned HDFS-1602: Assignee: Boris Shkolnik > Fix HADOOP-4885 for it is doesn't work as expected. > --- > > Key: HDFS-1602 > URL: https://issues.apache.org/jira/browse/HDFS-1602 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.21.0, 0.23.0 >Reporter: Konstantin Boudnik >Assignee: Boris Shkolnik > Attachments: HDFS-1602-1.patch, HDFS-1602.patch > > > NameNode storage restore functionality doesn't work (as HDFS-903 > demonstrated). This needs to be either disabled, or removed, or fixed. This > feature also fails HDFS-1496 -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HDFS-1616) hadoop fs -put and -copyFromLocal do not support globs in the source path
[ https://issues.apache.org/jira/browse/HDFS-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992549#comment-12992549 ] Harsh J Chouraria commented on HDFS-1616: - This issue is a HADOOP (Common) one, not HDFS. You can get Hadoop-side globbing by using {{FsShell.copy}} via {{hdfs -cp}} with a command like: {{hdfs -cp 'file:/home/test/*.jar' /destination}} This command does the globbing on its own (not shell-driven), so it should also work inside Grunt/etc. To call {{fs}} sub-functions programmatically, it would be wise to use Hadoop's {{FileSystem}} API directly in your language instead. There are Python bindings for HDFS available, to start with, apart from the provided Java one. Regarding glob differences between the shell and Hadoop, there isn't really a standard to conform to (please correct me if I'm wrong). A good collection of cases is covered in {{test/o.a.h.fs.TestGlobPaths}}, which IMHO covers most globbing requirements (there is also a subset of regular-expression support available). > hadoop fs -put and -copyFromLocal do not support globs in the source path > - > > Key: HDFS-1616 > URL: https://issues.apache.org/jira/browse/HDFS-1616 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.20.2 > Environment: Cloudera CDH3b3 >Reporter: Jay Hacker >Priority: Minor > > I'd like to be able to use Hadoop globbing with the FsShell -put command, but > it doesn't work: > {noformat} > $ ls > file1 file2 > $ hadoop fs -put '*' . > put: File * does not exist. > {noformat} > This has probably gone unnoticed because your shell usually handles it, but > a) I'd like to be able to call 'hadoop fs' programatically without a shell, > b) it doesn't work in Pig or Grunt, where there is no shell helping you, and > c) Hadoop globbing differs from shell globbing and it would be nice to be > able to use it consistently. -- This message is automatically generated by JIRA. 
- For more information on JIRA, see: http://www.atlassian.com/software/jira
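The program-side glob expansion discussed above (the pattern is expanded by the program itself, not by an interactive shell) can be illustrated with plain-Java glob matching. Hadoop does the real expansion through its own {{FileSystem.globStatus}}; the sketch below deliberately uses only the JDK so it stands alone:

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Stand-alone illustration of program-side globbing: because the program
// expands the pattern itself, a quoted '*' still works under Grunt, Pig,
// or a direct API call where no shell is helping.
public class GlobDemo {
    public static boolean matches(String glob, String candidate) {
        PathMatcher m =
            FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return m.matches(Paths.get(candidate));
    }
}
```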
[jira] Updated: (HDFS-1600) editsStored.xml cause release audit warning
[ https://issues.apache.org/jira/browse/HDFS-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-1600: - Hadoop Flags: [Reviewed] +1 patch looks good. > editsStored.xml cause release audit warning > --- > > Key: HDFS-1600 > URL: https://issues.apache.org/jira/browse/HDFS-1600 > Project: Hadoop HDFS > Issue Type: Bug > Components: build, test >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Todd Lipcon > Attachments: h1600_20110126.patch, hadoop-1600.txt > > > The file > {{src/test/hdfs/org/apache/hadoop/hdfs/tools/offlineEditsViewer/editsStored.xml}} > for any new patch. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HDFS-1616) hadoop fs -put and -copyFromLocal do not support globs in the source path
[ https://issues.apache.org/jira/browse/HDFS-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Hacker updated HDFS-1616: - Description: I'd like to be able to use Hadoop globbing with the FsShell -put command, but it doesn't work: {noformat} $ ls file1 file2 $ hadoop fs -put '*' . put: File * does not exist. {noformat} This has probably gone unnoticed because your shell usually handles it, but a) I'd like to be able to call 'hadoop fs' programatically without a shell, b) it doesn't work in Pig or Grunt, where there is no shell helping you, and c) Hadoop globbing differs from shell globbing and it would be nice to be able to use it consistently. was: I'd like to be able to use Hadoop globbing with the FsShell -put command, but it doesn't work: {noformat} $ ls file1 file2 $ hadoop fs -put '*' . put: File * does not exist. {noformat} This has probably gone unnoticed because your shell usually handles it, but a) I'd like to be able to call 'hadoop fs' programatically without a shell, and b) Hadoop globbing differs from shell globbing and it would be nice to be able to use it consistently. > hadoop fs -put and -copyFromLocal do not support globs in the source path > - > > Key: HDFS-1616 > URL: https://issues.apache.org/jira/browse/HDFS-1616 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.20.2 > Environment: Cloudera CDH3b3 >Reporter: Jay Hacker >Priority: Minor > > I'd like to be able to use Hadoop globbing with the FsShell -put command, but > it doesn't work: > {noformat} > $ ls > file1 file2 > $ hadoop fs -put '*' . > put: File * does not exist. > {noformat} > This has probably gone unnoticed because your shell usually handles it, but > a) I'd like to be able to call 'hadoop fs' programatically without a shell, > b) it doesn't work in Pig or Grunt, where there is no shell helping you, and > c) Hadoop globbing differs from shell globbing and it would be nice to be > able to use it consistently. 
-- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Created: (HDFS-1616) hadoop fs -put and -copyFromLocal do not support globs in the source path
hadoop fs -put and -copyFromLocal do not support globs in the source path - Key: HDFS-1616 URL: https://issues.apache.org/jira/browse/HDFS-1616 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20.2 Environment: Cloudera CDH3b3 Reporter: Jay Hacker Priority: Minor I'd like to be able to use Hadoop globbing with the FsShell -put command, but it doesn't work: {noformat} $ ls file1 file2 $ hadoop fs -put '*' . put: File * does not exist. {noformat} This has probably gone unnoticed because your shell usually handles it, but a) I'd like to be able to call 'hadoop fs' programatically without a shell, and b) Hadoop globbing differs from shell globbing and it would be nice to be able to use it consistently. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HDFS-1615) seek() on closed DFS input stream throws NPE
[ https://issues.apache.org/jira/browse/HDFS-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992478#comment-12992478 ] M. C. Srivas commented on HDFS-1615: >After closing an input stream on DFS, seeking slightly ahead of the last read >will throw an NPE: Isn't this a good thing? It exposes bugs at the layer above DFS. I'd prefer to keep this behaviour rather than fix it. > seek() on closed DFS input stream throws NPE > > > Key: HDFS-1615 > URL: https://issues.apache.org/jira/browse/HDFS-1615 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Todd Lipcon >Assignee: Todd Lipcon > > After closing an input stream on DFS, seeking slightly ahead of the last read > will throw an NPE: > java.lang.NullPointerException > at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:749) > at > org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:42) -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HDFS-1600) editsStored.xml cause release audit warning
[ https://issues.apache.org/jira/browse/HDFS-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992465#comment-12992465 ] Hadoop QA commented on HDFS-1600: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12470668/hadoop-1600.txt against trunk revision 1068725. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.hdfs.TestFileConcurrentReader -1 contrib tests. The patch failed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/155//testReport/ Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/155//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/155//console This message is automatically generated. > editsStored.xml cause release audit warning > --- > > Key: HDFS-1600 > URL: https://issues.apache.org/jira/browse/HDFS-1600 > Project: Hadoop HDFS > Issue Type: Bug > Components: build, test >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Todd Lipcon > Attachments: h1600_20110126.patch, hadoop-1600.txt > > > The file > {{src/test/hdfs/org/apache/hadoop/hdfs/tools/offlineEditsViewer/editsStored.xml}} > for any new patch. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HDFS-1600) editsStored.xml cause release audit warning
[ https://issues.apache.org/jira/browse/HDFS-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-1600: -- Status: Patch Available (was: Open) > editsStored.xml cause release audit warning > --- > > Key: HDFS-1600 > URL: https://issues.apache.org/jira/browse/HDFS-1600 > Project: Hadoop HDFS > Issue Type: Bug > Components: build, test >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Todd Lipcon > Attachments: h1600_20110126.patch, hadoop-1600.txt > > > The file > {{src/test/hdfs/org/apache/hadoop/hdfs/tools/offlineEditsViewer/editsStored.xml}} > for any new patch. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HDFS-1600) editsStored.xml cause release audit warning
[ https://issues.apache.org/jira/browse/HDFS-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-1600: -- Attachment: hadoop-1600.txt here's a patch which just updates the excludes for rat > editsStored.xml cause release audit warning > --- > > Key: HDFS-1600 > URL: https://issues.apache.org/jira/browse/HDFS-1600 > Project: Hadoop HDFS > Issue Type: Bug > Components: build, test >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Erik Steffl > Attachments: h1600_20110126.patch, hadoop-1600.txt > > > The file > {{src/test/hdfs/org/apache/hadoop/hdfs/tools/offlineEditsViewer/editsStored.xml}} > for any new patch. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Assigned: (HDFS-1600) editsStored.xml cause release audit warning
[ https://issues.apache.org/jira/browse/HDFS-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon reassigned HDFS-1600: - Assignee: Todd Lipcon (was: Erik Steffl) > editsStored.xml cause release audit warning > --- > > Key: HDFS-1600 > URL: https://issues.apache.org/jira/browse/HDFS-1600 > Project: Hadoop HDFS > Issue Type: Bug > Components: build, test >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Todd Lipcon > Attachments: h1600_20110126.patch, hadoop-1600.txt > > > The file > {{src/test/hdfs/org/apache/hadoop/hdfs/tools/offlineEditsViewer/editsStored.xml}} > for any new patch. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HDFS-560) Proposed enhancements/tuning to hadoop-hdfs/build.xml
[ https://issues.apache.org/jira/browse/HDFS-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HDFS-560: Resolution: Fixed Fix Version/s: 0.23.0 Status: Resolved (was: Patch Available) > Proposed enhancements/tuning to hadoop-hdfs/build.xml > -- > > Key: HDFS-560 > URL: https://issues.apache.org/jira/browse/HDFS-560 > Project: Hadoop HDFS > Issue Type: Improvement > Components: build >Affects Versions: 0.21.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Fix For: 0.23.0 > > Attachments: HDFS-560.patch > > > sibling list of HADOOP-6206, enhancements to the hdfs build for easier > single-system build/test -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HDFS-1603) Namenode gets sticky if one of namenode storage volumes disappears (removed, unmounted, etc.)
[ https://issues.apache.org/jira/browse/HDFS-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992397#comment-12992397 ] dhruba borthakur commented on HDFS-1603: "During this time none of normal NN operations are possible" how long was this period? > Namenode gets sticky if one of namenode storage volumes disappears (removed, > unmounted, etc.) > - > > Key: HDFS-1603 > URL: https://issues.apache.org/jira/browse/HDFS-1603 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.21.0 >Reporter: Konstantin Boudnik > > While investigating failures on HDFS-1602 it became apparent that once a > namenode storage volume is pulled out NN becomes completely "sticky" until > {{FSImage:processIOError: removing storage}} move the storage from the active > set. During this time none of normal NN operations are possible (e.g. > creating a directory on HDFS timeouts eventually). > In case of NFS this can be workaround'd with soft,intr,timeo,retrans > settings. However, a better handling of the situation is apparently possible > and needs to be implemented. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
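For the NFS workaround mentioned in the description, an illustrative /etc/fstab entry (hostname, export, and mount point are placeholders, and the exact values are deployment-specific) would look like the line below: {{soft}} makes I/O return an error instead of retrying forever, {{intr}} allows the hung operation to be interrupted, and {{timeo}}/{{retrans}} bound the retry window.

```
nfshost:/export/namedir  /mnt/namedir  nfs  soft,intr,timeo=30,retrans=3  0 0
```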
[jira] Commented: (HDFS-1595) DFSClient may incorrectly detect datanode failure
[ https://issues.apache.org/jira/browse/HDFS-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992396#comment-12992396 ] dhruba borthakur commented on HDFS-1595: Error recovery is a pain when a datanode in a write pipeline fails. Sometimes it is truly difficult for the client to accurately determine which datanode failed. Does it make sense to change the algorithm itself: what are the tradeoffs if we say that when the number of datanodes in the write pipeline decreases to min.replication, the client streams data directly to all remaining (or new) datanodes, instead of pipelining? If any of those datanodes then fail, the client will find it easy to determine accurately which datanodes are dead. > DFSClient may incorrectly detect datanode failure > - > > Key: HDFS-1595 > URL: https://issues.apache.org/jira/browse/HDFS-1595 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 0.20.4 >Reporter: Tsz Wo (Nicholas), SZE >Priority: Critical > Attachments: hdfs-1595-idea.txt > > > Suppose a source datanode S is writing to a destination datanode D in a write > pipeline. We have an implicit assumption that _if S catches an exception > when it is writing to D, then D is faulty and S is fine._ As a result, > DFSClient will take out D from the pipeline, reconstruct the write pipeline > with the remaining datanodes and then continue writing. > However, we found a case where the faulty machine F is indeed S but not D. In > the case we found, F has a faulty network interface (or a faulty switch port) > that works fine when transferring a small amount of data, say 1MB, but often > fails when transferring a large amount of data, say 100MB. > It is even worse if F is the first datanode in the pipeline. Consider the > following: > # DFSClient creates a pipeline with three datanodes. The first datanode is F. 
> # F catches an IOException when writing to the second datanode. Then, F > reports the second datanode has error. > # DFSClient removes the second datanode from the pipeline and continue > writing with the remaining datanode(s). > # The pipeline now has two datanodes but (2) and (3) repeat. > # Now, only F remains in the pipeline. DFSClient continues writing with one > replica in F. > # The write succeeds and DFSClient is able to *close the file successfully*. > # The block is under replicated. The NameNode schedules replication from F > to some other datanode D. > # The replication fails for the same reason. D reports to the NameNode that > the replica in F is corrupted. > # The NameNode marks the replica in F is corrupted. > # The block is corrupted since no replica is available. > We were able to manually divide the replicas into small files and copy them > out from F without fixing the hardware. The replicas seems uncorrupted. > This is a *data availability problem*. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
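The cascading removal in the numbered steps above can be captured in a toy model (this is not DFSClient code; it only mimics the bookkeeping): a faulty first node F that blames its downstream neighbour on every large transfer ends up as the sole member of the pipeline.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the failure mode: on each failed write, F (index 0)
// reports its downstream neighbour as bad, so the client removes a
// healthy node and retries — until only F itself remains.
public class PipelineShrinkDemo {
    public static List<String> shrink(List<String> pipeline) {
        List<String> p = new ArrayList<>(pipeline);
        while (p.size() > 1) {
            p.remove(1); // the node after F gets the blame
        }
        return p;
    }
}
```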
[jira] Commented: (HDFS-1606) Provide a stronger data guarantee in the write pipeline
[ https://issues.apache.org/jira/browse/HDFS-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992392#comment-12992392 ] Tsz Wo (Nicholas), SZE commented on HDFS-1606: -- > In fact, if we can have a system-wide config ... Will do. > Provide a stronger data guarantee in the write pipeline > --- > > Key: HDFS-1606 > URL: https://issues.apache.org/jira/browse/HDFS-1606 > Project: Hadoop HDFS > Issue Type: New Feature > Components: data-node, hdfs client >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > > In the current design, if there is a datanode/network failure in the write > pipeline, DFSClient will try to remove the failed datanode from the pipeline > and then continue writing with the remaining datanodes. As a result, the > number of datanodes in the pipeline is decreased. Unfortunately, it is > possible that DFSClient may incorrectly remove a healthy datanode but leave > the failed datanode in the pipeline because failure detection may be > inaccurate under erroneous conditions. > We propose to have a new mechanism for adding new datanodes to the pipeline > in order to provide a stronger data guarantee. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HDFS-1606) Provide a stronger data guarantee in the write pipeline
[ https://issues.apache.org/jira/browse/HDFS-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992390#comment-12992390 ] dhruba borthakur commented on HDFS-1606: In fact, if we can have a system-wide config on whether to trigger this behaviour or not, that will be great. > Provide a stronger data guarantee in the write pipeline > --- > > Key: HDFS-1606 > URL: https://issues.apache.org/jira/browse/HDFS-1606 > Project: Hadoop HDFS > Issue Type: New Feature > Components: data-node, hdfs client >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > > In the current design, if there is a datanode/network failure in the write > pipeline, DFSClient will try to remove the failed datanode from the pipeline > and then continue writing with the remaining datanodes. As a result, the > number of datanodes in the pipeline is decreased. Unfortunately, it is > possible that DFSClient may incorrectly remove a healthy datanode but leave > the failed datanode in the pipeline because failure detection may be > inaccurate under erroneous conditions. > We propose to have a new mechanism for adding new datanodes to the pipeline > in order to provide a stronger data guarantee. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
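A system-wide switch of the kind Dhruba suggests would typically be an hdfs-site.xml property. The property name below is illustrative — the issue had not settled on one at this point:

```
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>true</value>
</property>
```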
[jira] Commented: (HDFS-1612) HDFS Design Documentation is outdated
[ https://issues.apache.org/jira/browse/HDFS-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992386#comment-12992386 ] dhruba borthakur commented on HDFS-1612: This portion of the HDFS document is outdated. Would you like to submit a patch that brings it up to date? > HDFS Design Documentation is outdated > - > > Key: HDFS-1612 > URL: https://issues.apache.org/jira/browse/HDFS-1612 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 0.20.2, 0.21.0 > Environment: > http://hadoop.apache.org/hdfs/docs/current/hdfs_design.html#The+Persistence+of+File+System+Metadata > http://hadoop.apache.org/common/docs/r0.20.2/hdfs_design.html#The+Persistence+of+File+System+Metadata >Reporter: Joe Crobak >Priority: Minor > > I was trying to discover details about the Secondary NameNode, and came > across the description below in the HDFS design doc. > {quote} > The NameNode keeps an image of the entire file system namespace and file > Blockmap in memory. This key metadata item is designed to be compact, such > that a NameNode with 4 GB of RAM is plenty to support a huge number of files > and directories. When the NameNode starts up, it reads the FsImage and > EditLog from disk, applies all the transactions from the EditLog to the > in-memory representation of the FsImage, and flushes out this new version > into a new FsImage on disk. It can then truncate the old EditLog because its > transactions have been applied to the persistent FsImage. This process is > called a checkpoint. *In the current implementation, a checkpoint only occurs > when the NameNode starts up. Work is in progress to support periodic > checkpointing in the near future.* > {quote} > (emphasis mine). 
> Note that this directly conflicts with information in the hdfs user guide, > http://hadoop.apache.org/common/docs/r0.20.2/hdfs_user_guide.html#Secondary+NameNode > and > http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Checkpoint+Node > I haven't done a thorough audit of that doc-- I only noticed the above > inaccuracy. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HDFS-1602) Fix HADOOP-4885 for it is doesn't work as expected.
[ https://issues.apache.org/jira/browse/HDFS-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992376#comment-12992376 ] Nigel Daley commented on HDFS-1602: --- FWIW, TestBlockRecovery.testErrorReplicas failed (timed out). This is in the same class as the fixed test I think. Search console for failure: https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/537/console Re-running build again. > Fix HADOOP-4885 for it is doesn't work as expected. > --- > > Key: HDFS-1602 > URL: https://issues.apache.org/jira/browse/HDFS-1602 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.21.0, 0.23.0 >Reporter: Konstantin Boudnik > Attachments: HDFS-1602-1.patch, HDFS-1602.patch > > > NameNode storage restore functionality doesn't work (as HDFS-903 > demonstrated). This needs to be either disabled, or removed, or fixed. This > feature also fails HDFS-1496 -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira