[jira] [Commented] (MAPREDUCE-6597) Distcp should move the path to trash when delete missing path from source
[ https://issues.apache.org/jira/browse/MAPREDUCE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082546#comment-15082546 ] Tsz Wo Nicholas Sze commented on MAPREDUCE-6597: > ... We should add the option skipTrash to control the behavior. if skipTrash is > missing, we will move the path to the trash first rather than delete it > directly. This is an incompatible change. We probably should do it the other way -- add a useTrash option. > Distcp should move the path to trash when delete missing path from source > - > > Key: MAPREDUCE-6597 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6597 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp >Reporter: jeanlyn >Assignee: jeanlyn >Priority: Minor > Attachments: MAPREDUCE-6597.001.patch, MAPREDUCE-6597.002.patch > > > For now, when we use the *distcp* with the delete option, the path will be > deleted when missing in the source. We should add the option *skipTrash* to > control the behavior. if *skipTrash* is missing, we will move the path to the > trash first rather than delete it directly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
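The trash-or-delete behavior being debated above can be sketched in a few lines. This is an illustrative Python stand-in, not distcp's actual code: the function and flag names (delete_missing, use_trash, trash_root) are hypothetical, and it models the compatibility point of the review comment by defaulting to a plain delete, so trash is strictly opt-in.

```python
# Hypothetical sketch of the deletion step discussed above: with a
# use_trash option (default off, preserving current distcp -delete
# behavior), paths missing from the source are moved into a trash
# directory instead of being removed outright.
import os
import shutil

def delete_missing(target_path, use_trash=False, trash_root="/tmp/.Trash"):
    """Delete a target path, or move it to trash when use_trash is set.

    Returns the trash location when the path was moved, else None.
    """
    if use_trash:
        os.makedirs(trash_root, exist_ok=True)
        # Keep the original name inside the trash directory.
        dest = os.path.join(trash_root, os.path.basename(target_path.rstrip("/")))
        shutil.move(target_path, dest)
        return dest
    if os.path.isdir(target_path):
        shutil.rmtree(target_path)
    else:
        os.remove(target_path)
    return None
```

With use_trash=False the sketch matches today's behavior; flipping the default would be the incompatible change the comment warns about.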
[jira] [Commented] (MAPREDUCE-6564) distcp creates missing perent directories which is inconsistent with fs -cp
[ https://issues.apache.org/jira/browse/MAPREDUCE-6564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15045533#comment-15045533 ] Tsz Wo Nicholas Sze commented on MAPREDUCE-6564: Sure, let's discuss how to fix it. BTW, there is a second problem of distcp - The missing directory created somehow inherits the permission of its parent directory but not using umask. {code} $hadoop fs -ls /dst/ drwx------ - szetszwo hdfs 0 2015-12-04 16:24 /dst/non-existing {code} (The permission will be drwxr-xr-x if it is created using umask.) > distcp creates missing perent directories which is inconsistent with fs -cp > --- > > Key: MAPREDUCE-6564 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6564 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp >Reporter: Tsz Wo Nicholas Sze > > fs -cp will fail if the destination parent directory does not exist. > {code} > $hadoop fs -cp /a.sh /dst/non-existing/a.sh > cp: `/dst/non-existing/a.sh': No such file or directory > {code} > However, distcp will not fail. It creates the missing parent directory. > {code} > $hadoop distcp /a.sh /dst/non-existing/a.sh > ... > $hadoop fs -ls /dst/non-existing > Found 1 items > -rw-r--r-- 3 szetszwo hdfs531 2015-12-04 16:24 > /dst/non-existing/a.sh > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
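The parenthetical about umask can be checked with a quick calculation: a directory created through the normal path gets mode 777 & ~umask, which is 755 (drwxr-xr-x) under HDFS's default umask of 022. A small sketch (helper names are illustrative, not Hadoop APIs):

```python
# Sketch of the umask arithmetic behind the comment above: permission
# bits of a freshly created directory are the base mode masked by the
# process umask, not a copy of the parent's permissions.
import stat

def mode_with_umask(umask=0o022, base=0o777):
    """Permission bits a new directory would get under the given umask."""
    return base & ~umask

def mode_string(mode):
    """Render permission bits ls-style, e.g. 0o755 -> 'rwxr-xr-x'."""
    # stat.filemode prepends the file-type character; drop it.
    return stat.filemode(stat.S_IFDIR | mode)[1:]
```

With the default umask of 022 this yields rwxr-xr-x, matching the expected permission in the comment; the drwx------ shown by distcp is what falls out when the parent's bits are inherited instead.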
[jira] [Updated] (MAPREDUCE-6564) distcp creates missing perent directories which is inconsistent with fs -cp
[ https://issues.apache.org/jira/browse/MAPREDUCE-6564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated MAPREDUCE-6564: --- Description: fs -cp will fail if the destination parent directory does not exist. {code} $hadoop fs -cp /a.sh /dst/non-existing/a.sh cp: `/dst/non-existing/a.sh': No such file or directory {code} However, distcp will not fail. It creates the missing parent directory. {code} $hadoop distcp /a.sh /dst/non-existing/a.sh ... $hadoop fs -ls /dst/non-existing Found 1 items -rw-r--r-- 3 szetszwo hdfs531 2015-12-04 16:24 /dst/non-existing/a.sh {code} was: fs -cp will fail if the destination parent directory does not exist. {code} $hadoop fs -cp /a.sh /dst/non-existing/a.sh cp: `/dst/non-existing/a.sh': No such file or directory {code} However, distcp will not fail. It creates it {code} $hadoop distcp /a.sh /dst/non-existing/a.sh ... $hadoop fs -ls /dst/non-existing Found 1 items -rw-r--r-- 3 szetszwo hdfs531 2015-12-04 16:24 /dst/non-existing/a.sh {code} > distcp creates missing perent directories which is inconsistent with fs -cp > --- > > Key: MAPREDUCE-6564 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6564 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp >Reporter: Tsz Wo Nicholas Sze > > fs -cp will fail if the destination parent directory does not exist. > {code} > $hadoop fs -cp /a.sh /dst/non-existing/a.sh > cp: `/dst/non-existing/a.sh': No such file or directory > {code} > However, distcp will not fail. It creates the missing parent directory. > {code} > $hadoop distcp /a.sh /dst/non-existing/a.sh > ... > $hadoop fs -ls /dst/non-existing > Found 1 items > -rw-r--r-- 3 szetszwo hdfs531 2015-12-04 16:24 > /dst/non-existing/a.sh > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MAPREDUCE-6564) distcp creates missing perent directories which is inconsistent with fs -cp
Tsz Wo Nicholas Sze created MAPREDUCE-6564: -- Summary: distcp creates missing perent directories which is inconsistent with fs -cp Key: MAPREDUCE-6564 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6564 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp Reporter: Tsz Wo Nicholas Sze fs -cp will fail if the destination parent directory does not exist. {code} $hadoop fs -cp /a.sh /dst/non-existing/a.sh cp: `/dst/non-existing/a.sh': No such file or directory {code} However, distcp will not fail. It creates it {code} $hadoop distcp /a.sh /dst/non-existing/a.sh ... $hadoop fs -ls /dst/non-existing Found 1 items -rw-r--r-- 3 szetszwo hdfs531 2015-12-04 16:24 /dst/non-existing/a.sh {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MAPREDUCE-5010) use multithreading to speed up mergeParts and try MapPartitionsCompleteEvent to schedule fetch in reduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze reassigned MAPREDUCE-5010: -- Assignee: (was: Tsz Wo Nicholas Sze) > use multithreading to speed up mergeParts and try MapPartitionsCompleteEvent > to schedule fetch in reduce > -- > > Key: MAPREDUCE-5010 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5010 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv1 >Affects Versions: 1.0.1 >Reporter: Li Junjun > Attachments: MAPREDUCE-5010.jpg > > > Use multithreading to speed up Merger and try MapPartitionsCompleteEvent to > schedule fetch in reduce. > This is for multicore CPUs; the performance will depend on your hardware and > config. > In the map task: > > for (int parts = 0; parts < partitions; parts++) { > // do the merge, append to the final output file (file.out) > } > > it only uses one thread! > So, I think: we can use more threads (conf: mapred.map.mergerthreads) to do > the merge, if you have many cores or CPUs. > Currently, only after a map task completes will the reduce tasks fetch its > output; that means > when map x completes, all the reduces will fetch the output concurrently, > even though we use > > // Randomize the map output locations to prevent > // all reduce-tasks swamping the same tasktracker > List<String> hostList = new ArrayList<String>(); > hostList.addAll(mapLocations.keySet()); > Collections.shuffle(hostList, this.random); > > in the reduce task. > For example, 100 reduces wait for 2 maps to complete, because the cluster's > map task capacity is 98 but the job has > 100 map tasks. > So, I think: during the threaded merging, for example if a map has 8 > partitions and uses 3 threads to merge, > when one of the threads completes one part we can inform the reduces to fetch > that partition file immediately, > or we can wait until 3 parts complete and then send the event (conf: > mapred.map.parts.inform) to reduce the JT's stress, > instead of waiting for all the map tasks to complete. 
By doing this, it will prevent all > reduce-tasks from swamping the same tasktracker > more effectively, and speed up the reduce process. > Is it acceptable? > Any other good ideas? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
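The single-threaded merge loop quoted in the description, and the proposed thread-pool replacement, can be sketched outside Hadoop like this. The real mergeParts lives in Hadoop's Java MapTask; this is only an illustrative Python model of the scheduling idea, with hypothetical names (merge_partition, merge_all_partitions).

```python
# Illustrative sketch of the proposal above: merge the map output's
# partitions with a pool of worker threads rather than one sequential
# loop, so a machine with spare cores finishes the merge phase sooner.
from concurrent.futures import ThreadPoolExecutor

def merge_partition(runs):
    """Merge the sorted runs of one partition (stand-in for Hadoop's Merger)."""
    return sorted(x for run in runs for x in run)

def merge_all_partitions(partitions, num_threads=3):
    """Merge every partition concurrently; results come back in partition order.

    num_threads plays the role of the suggested mapred.map.mergerthreads conf.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(merge_partition, partitions))
```

Because pool.map preserves input order, per-partition completion events (the MapPartitionsCompleteEvent idea) could be fired as each future resolves, rather than after the whole loop.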
[jira] [Updated] (MAPREDUCE-6123) TestCombineFileInputFormat incorrectly starts 2 MiniDFSCluster instances.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated MAPREDUCE-6123: --- Hadoop Flags: Reviewed +1 patch looks good. > TestCombineFileInputFormat incorrectly starts 2 MiniDFSCluster instances. > - > > Key: MAPREDUCE-6123 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6123 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Reporter: Chris Nauroth >Assignee: Chris Nauroth >Priority: Trivial > Attachments: MAPREDUCE-6123.1.patch > > > {{TestCombineFileInputFormat#testGetSplitsWithDirectory}} starts 2 > {{MiniDFSCluster}} instances, one right after the other, using the exact same > configuration. There is no need for 2 clusters in this test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6122) TestLineRecordReader may fail due to test data files checked out of git with incorrect line endings.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated MAPREDUCE-6122: --- Hadoop Flags: Reviewed +1 patch looks good. > TestLineRecordReader may fail due to test data files checked out of git with > incorrect line endings. > > > Key: MAPREDUCE-6122 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6122 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Reporter: Chris Nauroth >Assignee: Chris Nauroth >Priority: Trivial > Attachments: MAPREDUCE-6122.1.patch > > > {{TestLineRecordReader}} uses several test input files at > src/test/resources/*.txt. Some of the tests expect a specific length for the > files, such as dealing with a record that spans multiple splits. If they get > checked out of git with CRLF line endings by mistake, then the test > assertions will fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5899) Support incremental data copy in DistCp
[ https://issues.apache.org/jira/browse/MAPREDUCE-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated MAPREDUCE-5899: --- +1 the new patch looks good. > Support incremental data copy in DistCp > --- > > Key: MAPREDUCE-5899 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5899 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HADOOP-10608.000.patch, HADOOP-10608.001.patch, > MAPREDUCE-5899.002.patch > > > Currently when doing distcp with -update option, for two files with the same > file names but with different file length or checksum, we overwrite the whole > file. It will be good if we can detect the case where (sourceFile = > targetFile + appended_data), and only transfer the appended data segment to > the target. This will be very useful if we're doing incremental distcp. -- This message was sent by Atlassian JIRA (v6.2#6252)
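The incremental case described above (sourceFile = targetFile + appended_data) can be sketched as follows. This is an assumption-laden Python model, not the actual patch: the real implementation works against HDFS files and file checksums, while this sketch uses in-memory bytes and MD5 purely to show the decision between shipping only the appended tail and falling back to a full copy.

```python
# Sketch of append detection for incremental copy: if the target is a
# verified prefix of the source, only the appended segment needs to be
# transferred; otherwise fall back to re-copying the whole file, which
# is what -update does for any length/checksum mismatch today.
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

def incremental_copy(source: bytes, target: bytes) -> bytes:
    """Return the bytes that must be transferred to bring target up to source."""
    n = len(target)
    if n <= len(source) and checksum(source[:n]) == checksum(target):
        return source[n:]   # append-only case: ship just the new segment
    return source           # diverged: full copy, as before
```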
[jira] [Updated] (MAPREDUCE-5899) Support incremental data copy in DistCp
[ https://issues.apache.org/jira/browse/MAPREDUCE-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated MAPREDUCE-5899: --- Component/s: distcp Traditionally, distcp is a MapReduce component. Moving this to MapReduce. > Support incremental data copy in DistCp > --- > > Key: MAPREDUCE-5899 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5899 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HADOOP-10608.000.patch, HADOOP-10608.001.patch > > > Currently when doing distcp with -update option, for two files with the same > file names but with different file length or checksum, we overwrite the whole > file. It will be good if we can detect the case where (sourceFile = > targetFile + appended_data), and only transfer the appended data segment to > the target. This will be very useful if we're doing incremental distcp. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Moved] (MAPREDUCE-5899) Support incremental data copy in DistCp
[ https://issues.apache.org/jira/browse/MAPREDUCE-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze moved HADOOP-10608 to MAPREDUCE-5899: - Key: MAPREDUCE-5899 (was: HADOOP-10608) Project: Hadoop Map/Reduce (was: Hadoop Common) > Support incremental data copy in DistCp > --- > > Key: MAPREDUCE-5899 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5899 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HADOOP-10608.000.patch, HADOOP-10608.001.patch > > > Currently when doing distcp with -update option, for two files with the same > file names but with different file length or checksum, we overwrite the whole > file. It will be good if we can detect the case where (sourceFile = > targetFile + appended_data), and only transfer the appended data segment to > the target. This will be very useful if we're doing incremental distcp. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5809) Enhance distcp to support preserving HDFS ACLs.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated MAPREDUCE-5809: --- Hadoop Flags: Reviewed +1 patch looks good. > Enhance distcp to support preserving HDFS ACLs. > --- > > Key: MAPREDUCE-5809 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5809 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp >Affects Versions: 2.4.0 >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Attachments: MAPREDUCE-5809.1.patch, MAPREDUCE-5809.2.patch, > MAPREDUCE-5809.3.patch, MAPREDUCE-5809.4.patch, MAPREDUCE-5809.5.patch > > > This issue tracks enhancing distcp to add a new command-line argument for > preserving HDFS ACLs from the source at the copy destination. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996487#comment-13996487 ] Tsz Wo Nicholas Sze commented on MAPREDUCE-5081: By hdfs2, do you mean hdfs in branch-2? The DistCpV2 here is for branch-1. You may try it in your setup. > Backport DistCpV2 and the related JIRAs to branch-1 > --- > > Key: MAPREDUCE-5081 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5081 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: distcp >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze > Fix For: 1.2.0 > > Attachments: DistCp.java.diff, m5081_20130328.patch, > m5081_20130328b.patch, m5981_20130321.patch, m5981_20130321b.patch, > m5981_20130323.patch > > > Here is a list of DistCpV2 JIRAs: > - MAPREDUCE-2765: DistCpV2 main jira > - HADOOP-8703: turn CRC checking off for 0 byte size > - HDFS-3054: distcp -skipcrccheck has no effect. > - HADOOP-8431: Running distcp without args throws IllegalArgumentException > - HADOOP-8775: non-positive value to -bandwidth > - MAPREDUCE-4654: TestDistCp is ignored > - HADOOP-9022: distcp fails to copy file if -m 0 specified > - HADOOP-9025: TestCopyListing failing > - MAPREDUCE-5075: DistCp leaks input file handles > - distcp part of HADOOP-8341: Fix findbugs issues in hadoop-tools > - MAPREDUCE-5014: custom CopyListing -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5809) Enhance distcp to support preserving HDFS ACLs.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13992522#comment-13992522 ] Tsz Wo Nicholas Sze commented on MAPREDUCE-5809: Thanks for the update. Adding CopyListingFileStatus looks good. One minor comment: - SimpleCopyListing.getFileStatus(..) is no longer needed. It was used to convert FileStatus subclass objects to FileStatus. CopyListingFileStatus does not have subclasses. We may make CopyListingFileStatus final as well. > Enhance distcp to support preserving HDFS ACLs. > --- > > Key: MAPREDUCE-5809 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5809 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp >Affects Versions: 2.4.0 >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Attachments: MAPREDUCE-5809.1.patch, MAPREDUCE-5809.2.patch, > MAPREDUCE-5809.3.patch, MAPREDUCE-5809.4.patch > > > This issue tracks enhancing distcp to add a new command-line argument for > preserving HDFS ACLs from the source at the copy destination. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5809) Enhance distcp to support preserving HDFS ACLs.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13991301#comment-13991301 ] Tsz Wo Nicholas Sze commented on MAPREDUCE-5809: If we cannot change FileStatus for backwards-compatibility, how about adding new FileSystem methods such as listStatusWithACL(..) that return FileStatusWithACL? > Enhance distcp to support preserving HDFS ACLs. > --- > > Key: MAPREDUCE-5809 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5809 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp >Affects Versions: 2.4.0 >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Attachments: MAPREDUCE-5809.1.patch, MAPREDUCE-5809.2.patch, > MAPREDUCE-5809.3.patch > > > This issue tracks enhancing distcp to add a new command-line argument for > preserving HDFS ACLs from the source at the copy destination. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5402) DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE
[ https://issues.apache.org/jira/browse/MAPREDUCE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated MAPREDUCE-5402: --- Resolution: Fixed Fix Version/s: 2.5.0 Status: Resolved (was: Patch Available) I have committed this. Thanks, Tsuyoshi! > DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE > -- > > Key: MAPREDUCE-5402 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5402 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp, mrv2 >Reporter: David Rosenstrauch >Assignee: Tsuyoshi OZAWA > Fix For: 2.5.0 > > Attachments: MAPREDUCE-5402.1.patch, MAPREDUCE-5402.2.patch, > MAPREDUCE-5402.3.patch, MAPREDUCE-5402.4-2.patch, MAPREDUCE-5402.4.patch, > MAPREDUCE-5402.5.patch > > > In MAPREDUCE-2765, which provided the design spec for DistCpV2, the author > describes the implementation of DynamicInputFormat, with one of the main > motivations cited being to reduce the chance of long-tails where a few > leftover mappers run much longer than the rest. > However, I today ran into a situation where I experienced exactly such a long > tail using DistCpV2 and DynamicInputFormat. And when I tried to alleviate > the problem by overriding the number of mappers and the split ratio used by > the DynamicInputFormat, I was prevented from doing so by the hard-coded limit > set in the code by the MAX_CHUNKS_TOLERABLE constant. (Currently set to 400.) > This constant is actually set quite low for production use. (See a > description of my use case below.) And although MAPREDUCE-2765 states that > this is an "overridable maximum", when reading through the code there does > not actually appear to be any mechanism available to override it. > This should be changed. It should be possible to expand the maximum # of > chunks beyond this arbitrary limit. > For example, here is the situation I ran into today: > I ran a distcpv2 job on a cluster with 8 machines containing 128 map slots. 
> The job consisted of copying ~2800 files from HDFS to Amazon S3. I overrode > the number of mappers for the job from the default of 20 to 128, so as to > more properly parallelize the copy across the cluster. The number of chunk > files created was calculated as 241, and mapred.num.entries.per.chunk was > calculated as 12. > As the job ran on, it reached a point where there were only 4 remaining map > tasks, which had each been running for over 2 hours. The reason for this was > that each of the 12 files that those mappers were copying were quite large > (several hundred megabytes in size) and took ~20 minutes each. However, > during this time, all the other 124 mappers sat idle. > In theory I should be able to alleviate this problem with DynamicInputFormat. > If I were able to, say, quadruple the number of chunk files created, that > would have made each chunk contain only 3 files, and these large files would > have gotten distributed better around the cluster and copied in parallel. > However, when I tried to do that - by overriding mapred.listing.split.ratio > to, say, 10 - DynamicInputFormat responded with an exception ("Too many > chunks created with splitRatio:10, numMaps:128. Reduce numMaps or decrease > split-ratio to proceed.") - presumably because I exceeded the > MAX_CHUNKS_TOLERABLE value of 400. > Is there any particular logic behind this MAX_CHUNKS_TOLERABLE limit? I > can't personally see any. > If this limit has no particular logic behind it, then it should be > overridable - or even better: removed altogether. After all, I'm not sure I > see any need for it. Even if numMaps * splitRatio resulted in an > extraordinarily large number, if the code were modified so that the number of > chunks got calculated as Math.min( numMaps * splitRatio, numFiles), then > there would be no need for MAX_CHUNKS_TOLERABLE. 
In this worst-case scenario > where the product of numMaps and splitRatio is large, capping the number of > chunks at the number of files (numberOfChunks = numberOfFiles) would result > in 1 file per chunk - the maximum parallelization possible. That may not be > the best-tuned solution for some users, but I would think that it should be > left up to the user to deal with the potential consequence of not having > tuned their job properly. Certainly that would be better than having an > arbitrary hard-coded limit that *prevents* proper parallelization when > dealing with large files and/or large numbers of mappers. -- This message was sent by Atlassian JIRA (v6.2#6252)
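The reporter's suggested replacement for the hard-coded limit is a one-line computation: cap the chunk count at the number of files instead of at MAX_CHUNKS_TOLERABLE. Sketched below with the numbers from this issue (the function name is illustrative, not DynamicInputFormat's API):

```python
# Sketch of the capping rule proposed in the report: the chunk count is
# min(numMaps * splitRatio, numFiles). At worst every file gets its own
# chunk -- the maximum possible parallelism -- so no separate
# MAX_CHUNKS_TOLERABLE-style hard limit is needed.
def num_chunks(num_maps: int, split_ratio: int, num_files: int) -> int:
    return min(num_maps * split_ratio, num_files)

# Reporter's scenario: 128 maps, split ratio 10, ~2800 files. That asks
# for 1280 chunks -- well under the file count, so the cap never binds --
# yet the current code rejects it for exceeding the hard limit of 400.
```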
[jira] [Updated] (MAPREDUCE-5402) DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE
[ https://issues.apache.org/jira/browse/MAPREDUCE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated MAPREDUCE-5402: --- Hadoop Flags: Reviewed +1 the new patch looks good. > DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE > -- > > Key: MAPREDUCE-5402 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5402 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp, mrv2 >Reporter: David Rosenstrauch >Assignee: Tsuyoshi OZAWA > Attachments: MAPREDUCE-5402.1.patch, MAPREDUCE-5402.2.patch, > MAPREDUCE-5402.3.patch, MAPREDUCE-5402.4-2.patch, MAPREDUCE-5402.4.patch, > MAPREDUCE-5402.5.patch > > > In MAPREDUCE-2765, which provided the design spec for DistCpV2, the author > describes the implementation of DynamicInputFormat, with one of the main > motivations cited being to reduce the chance of long-tails where a few > leftover mappers run much longer than the rest. > However, I today ran into a situation where I experienced exactly such a long > tail using DistCpV2 and DynamicInputFormat. And when I tried to alleviate > the problem by overriding the number of mappers and the split ratio used by > the DynamicInputFormat, I was prevented from doing so by the hard-coded limit > set in the code by the MAX_CHUNKS_TOLERABLE constant. (Currently set to 400.) > This constant is actually set quite low for production use. (See a > description of my use case below.) And although MAPREDUCE-2765 states that > this is an "overridable maximum", when reading through the code there does > not actually appear to be any mechanism available to override it. > This should be changed. It should be possible to expand the maximum # of > chunks beyond this arbitrary limit. > For example, here is the situation I ran into today: > I ran a distcpv2 job on a cluster with 8 machines containing 128 map slots. > The job consisted of copying ~2800 files from HDFS to Amazon S3. 
I overrode > the number of mappers for the job from the default of 20 to 128, so as to > more properly parallelize the copy across the cluster. The number of chunk > files created was calculated as 241, and mapred.num.entries.per.chunk was > calculated as 12. > As the job ran on, it reached a point where there were only 4 remaining map > tasks, which had each been running for over 2 hours. The reason for this was > that each of the 12 files that those mappers were copying were quite large > (several hundred megabytes in size) and took ~20 minutes each. However, > during this time, all the other 124 mappers sat idle. > In theory I should be able to alleviate this problem with DynamicInputFormat. > If I were able to, say, quadruple the number of chunk files created, that > would have made each chunk contain only 3 files, and these large files would > have gotten distributed better around the cluster and copied in parallel. > However, when I tried to do that - by overriding mapred.listing.split.ratio > to, say, 10 - DynamicInputFormat responded with an exception ("Too many > chunks created with splitRatio:10, numMaps:128. Reduce numMaps or decrease > split-ratio to proceed.") - presumably because I exceeded the > MAX_CHUNKS_TOLERABLE value of 400. > Is there any particular logic behind this MAX_CHUNKS_TOLERABLE limit? I > can't personally see any. > If this limit has no particular logic behind it, then it should be > overridable - or even better: removed altogether. After all, I'm not sure I > see any need for it. Even if numMaps * splitRatio resulted in an > extraordinarily large number, if the code were modified so that the number of > chunks got calculated as Math.min( numMaps * splitRatio, numFiles), then > there would be no need for MAX_CHUNKS_TOLERABLE. 
In this worst-case scenario > where the product of numMaps and splitRatio is large, capping the number of > chunks at the number of files (numberOfChunks = numberOfFiles) would result > in 1 file per chunk - the maximum parallelization possible. That may not be > the best-tuned solution for some users, but I would think that it should be > left up to the user to deal with the potential consequence of not having > tuned their job properly. Certainly that would be better than having an > arbitrary hard-coded limit that *prevents* proper parallelization when > dealing with large files and/or large numbers of mappers. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5809) Enhance distcp to support preserving HDFS ACLs.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990222#comment-13990222 ] Tsz Wo Nicholas Sze commented on MAPREDUCE-5809: > If we do that, then we'll lose the parallelism benefit we get from doing the > RPC calls inside the MR tasks. ... You are right that we'll lose the parallelism. However, we have to build the source listing anyway. If FileSystem.listStatus(..) also returned the ACL, then we would definitely put the ACL in the listing SequenceFile. (Question: why does listStatus(..) not return the ACL, and does it make sense to add it in the future?) Now, we need an additional getAclStatus(..) call. If the two clusters are close in distance, calling getAclStatus(..) in parallel is probably faster. However, if the clusters are far away (a common case), calling getAclStatus(..) from the destination cluster may take a long round-trip time. It also takes more bandwidth, which is usually limited. Running the distcp command in the source cluster is probably better. > I chose RuntimeException for consistency with the existing exceptions like > CopyListing#DuplicateFileException and CopyListing#InvalidInputException. ... I see. Let's keep extending RuntimeException for the moment. We could change all of them later. > Enhance distcp to support preserving HDFS ACLs. > --- > > Key: MAPREDUCE-5809 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5809 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp >Affects Versions: 2.4.0 >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Attachments: MAPREDUCE-5809.1.patch, MAPREDUCE-5809.2.patch, > MAPREDUCE-5809.3.patch > > > This issue tracks enhancing distcp to add a new command-line argument for > preserving HDFS ACLs from the source at the copy destination. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5402) DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE
[ https://issues.apache.org/jira/browse/MAPREDUCE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989242#comment-13989242 ] Tsz Wo Nicholas Sze commented on MAPREDUCE-5402: {code} + public static final String CONF_LABEL_MAX_CHUNKS_TOLERABLE = "distcp.max.chunks.tolerable"; + public static final String CONF_LABEL_MAX_CHUNKS_IDEAL = "distcp.max.chunks.ideal"; + public static final String CONF_LABEL_MIN_RECORDS_PER_CHUNK = "distcp.min.records_per_chunk"; + public static final String CONF_LABEL_SPLIT_RATIO = "distcp.split.ratio"; {code} Since these conf are used only if "-strategy dynamic" is specified, let's use the prefix "distcp.dynamic." for them. The patch looks good other than that. > DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE > -- > > Key: MAPREDUCE-5402 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5402 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp, mrv2 >Reporter: David Rosenstrauch >Assignee: Tsuyoshi OZAWA > Attachments: MAPREDUCE-5402.1.patch, MAPREDUCE-5402.2.patch, > MAPREDUCE-5402.3.patch, MAPREDUCE-5402.4-2.patch, MAPREDUCE-5402.4.patch > > > In MAPREDUCE-2765, which provided the design spec for DistCpV2, the author > describes the implementation of DynamicInputFormat, with one of the main > motivations cited being to reduce the chance of long-tails where a few > leftover mappers run much longer than the rest. > However, I today ran into a situation where I experienced exactly such a long > tail using DistCpV2 and DynamicInputFormat. And when I tried to alleviate > the problem by overriding the number of mappers and the split ratio used by > the DynamicInputFormat, I was prevented from doing so by the hard-coded limit > set in the code by the MAX_CHUNKS_TOLERABLE constant. (Currently set to 400.) > This constant is actually set quite low for production use. (See a > description of my use case below.) 
And although MAPREDUCE-2765 states that > this is an "overridable maximum", when reading through the code there does > not actually appear to be any mechanism available to override it. > This should be changed. It should be possible to expand the maximum # of > chunks beyond this arbitrary limit. > For example, here is the situation I ran into today: > I ran a distcpv2 job on a cluster with 8 machines containing 128 map slots. > The job consisted of copying ~2800 files from HDFS to Amazon S3. I overrode > the number of mappers for the job from the default of 20 to 128, so as to > more properly parallelize the copy across the cluster. The number of chunk > files created was calculated as 241, and mapred.num.entries.per.chunk was > calculated as 12. > As the job ran on, it reached a point where there were only 4 remaining map > tasks, which had each been running for over 2 hours. The reason for this was > that each of the 12 files that those mappers were copying were quite large > (several hundred megabytes in size) and took ~20 minutes each. However, > during this time, all the other 124 mappers sat idle. > In theory I should be able to alleviate this problem with DynamicInputFormat. > If I were able to, say, quadruple the number of chunk files created, that > would have made each chunk contain only 3 files, and these large files would > have gotten distributed better around the cluster and copied in parallel. > However, when I tried to do that - by overriding mapred.listing.split.ratio > to, say, 10 - DynamicInputFormat responded with an exception ("Too many > chunks created with splitRatio:10, numMaps:128. Reduce numMaps or decrease > split-ratio to proceed.") - presumably because I exceeded the > MAX_CHUNKS_TOLERABLE value of 400. > Is there any particular logic behind this MAX_CHUNKS_TOLERABLE limit? I > can't personally see any. > If this limit has no particular logic behind it, then it should be > overridable - or even better: removed altogether. 
After all, I'm not sure I > see any need for it. Even if numMaps * splitRatio resulted in an > extraordinarily large number, if the code were modified so that the number of > chunks got calculated as Math.min( numMaps * splitRatio, numFiles), then > there would be no need for MAX_CHUNKS_TOLERABLE. In this worst-case scenario > where the product of numMaps and splitRatio is large, capping the number of > chunks at the number of files (numberOfChunks = numberOfFiles) would result > in 1 file per chunk - the maximum parallelization possible. That may not be > the best-tuned solution for some users, but I would think that it should be > left up to the user to deal with the potential consequence of not having > tuned their job properly. Certainly that would be better than having an > arbitrary hard-coded limit that *prevents* proper parallelization when > dealing with large files and/or large numbers of mappers. -- This message was sent by Atlassian JIRA (v6.2#6252)
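The renaming the review asks for can be sketched as follows. This is a hypothetical layout with an illustrative class name; the actual constants live elsewhere in the DistCp source, and the final property names were settled in the committed patch:

```java
// Sketch of the suggested "distcp.dynamic." prefix for the properties that
// only take effect when "-strategy dynamic" is specified.
public final class DynamicStrategyKeys {
    public static final String CONF_LABEL_MAX_CHUNKS_TOLERABLE =
        "distcp.dynamic.max.chunks.tolerable";
    public static final String CONF_LABEL_MAX_CHUNKS_IDEAL =
        "distcp.dynamic.max.chunks.ideal";
    public static final String CONF_LABEL_MIN_RECORDS_PER_CHUNK =
        "distcp.dynamic.min.records_per_chunk";
    public static final String CONF_LABEL_SPLIT_RATIO =
        "distcp.dynamic.split.ratio";

    private DynamicStrategyKeys() {} // constants holder, not instantiable
}
```

The shared prefix makes it obvious from a job configuration dump which knobs belong to the dynamic strategy.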
[jira] [Commented] (MAPREDUCE-5809) Enhance distcp to support preserving HDFS ACLs.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989235#comment-13989235 ] Tsz Wo Nicholas Sze commented on MAPREDUCE-5809: - CopyCommitter should not get and use source FileSystem since it will be much slower. We should change listing SequenceFile value to something like FileStatusWithACL (a new class). Then, CopyCommitter could read ACL from it. - Should AclsNotSupportedException extend IOException instead of RuntimeException? - Let's move AclsNotSupportedException and DistCpUtils.checkFileSystemAclSupport(..) to Common. They are also useful for other cases. > Enhance distcp to support preserving HDFS ACLs. > --- > > Key: MAPREDUCE-5809 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5809 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp >Affects Versions: 2.4.0 >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Attachments: MAPREDUCE-5809.1.patch, MAPREDUCE-5809.2.patch, > MAPREDUCE-5809.3.patch > > > This issue tracks enhancing distcp to add a new command-line argument for > preserving HDFS ACLs from the source at the copy destination. -- This message was sent by Atlassian JIRA (v6.2#6252)
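A minimal sketch of the suggested listing value follows. The class name, fields, and ACL-entry string format here are all illustrative assumptions, not the shape of the committed patch; the point is only that the copy-listing SequenceFile record carries the ACL entries with it, so CopyCommitter can restore ACLs without re-opening the (slower) source FileSystem:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical FileStatusWithACL-style record: path plus its ACL entries,
// serialized into the listing so the commit phase never touches the source.
public class FileStatusWithAcl {
    private final String path;
    private final List<String> aclEntries; // e.g. "user:alice:rwx" (illustrative)

    public FileStatusWithAcl(String path, List<String> aclEntries) {
        this.path = path;
        this.aclEntries = new ArrayList<>(aclEntries); // defensive copy
    }

    public String getPath() { return path; }
    public List<String> getAclEntries() { return aclEntries; }
}
```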
[jira] [Commented] (MAPREDUCE-5402) DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE
[ https://issues.apache.org/jira/browse/MAPREDUCE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984168#comment-13984168 ] Tsz Wo Nicholas Sze commented on MAPREDUCE-5402: - In createSplits(..), should we get min records per chunk from conf? - Similarly, in the new getSplitRatio(..) method, should we get split ratio from conf? - Let's validate the conf values in getMaxChunksTolerable, getMaxChunksIdeal and getMinRecordsPerChunk. > DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE > -- > > Key: MAPREDUCE-5402 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5402 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp, mrv2 >Reporter: David Rosenstrauch >Assignee: Tsuyoshi OZAWA > Attachments: MAPREDUCE-5402.1.patch, MAPREDUCE-5402.2.patch, > MAPREDUCE-5402.3.patch > > > In MAPREDUCE-2765, which provided the design spec for DistCpV2, the author > describes the implementation of DynamicInputFormat, with one of the main > motivations cited being to reduce the chance of long-tails where a few > leftover mappers run much longer than the rest. > However, I today ran into a situation where I experienced exactly such a long > tail using DistCpV2 and DynamicInputFormat. And when I tried to alleviate > the problem by overriding the number of mappers and the split ratio used by > the DynamicInputFormat, I was prevented from doing so by the hard-coded limit > set in the code by the MAX_CHUNKS_TOLERABLE constant. (Currently set to 400.) > This constant is actually set quite low for production use. (See a > description of my use case below.) And although MAPREDUCE-2765 states that > this is an "overridable maximum", when reading through the code there does > not actually appear to be any mechanism available to override it. > This should be changed. It should be possible to expand the maximum # of > chunks beyond this arbitrary limit. 
> For example, here is the situation I ran into today: > I ran a distcpv2 job on a cluster with 8 machines containing 128 map slots. > The job consisted of copying ~2800 files from HDFS to Amazon S3. I overrode > the number of mappers for the job from the default of 20 to 128, so as to > more properly parallelize the copy across the cluster. The number of chunk > files created was calculated as 241, and mapred.num.entries.per.chunk was > calculated as 12. > As the job ran on, it reached a point where there were only 4 remaining map > tasks, which had each been running for over 2 hours. The reason for this was > that each of the 12 files that those mappers were copying were quite large > (several hundred megabytes in size) and took ~20 minutes each. However, > during this time, all the other 124 mappers sat idle. > In theory I should be able to alleviate this problem with DynamicInputFormat. > If I were able to, say, quadruple the number of chunk files created, that > would have made each chunk contain only 3 files, and these large files would > have gotten distributed better around the cluster and copied in parallel. > However, when I tried to do that - by overriding mapred.listing.split.ratio > to, say, 10 - DynamicInputFormat responded with an exception ("Too many > chunks created with splitRatio:10, numMaps:128. Reduce numMaps or decrease > split-ratio to proceed.") - presumably because I exceeded the > MAX_CHUNKS_TOLERABLE value of 400. > Is there any particular logic behind this MAX_CHUNKS_TOLERABLE limit? I > can't personally see any. > If this limit has no particular logic behind it, then it should be > overridable - or even better: removed altogether. After all, I'm not sure I > see any need for it. Even if numMaps * splitRatio resulted in an > extraordinarily large number, if the code were modified so that the number of > chunks got calculated as Math.min( numMaps * splitRatio, numFiles), then > there would be no need for MAX_CHUNKS_TOLERABLE. 
In this worst-case scenario > where the product of numMaps and splitRatio is large, capping the number of > chunks at the number of files (numberOfChunks = numberOfFiles) would result > in 1 file per chunk - the maximum parallelization possible. That may not be > the best-tuned solution for some users, but I would think that it should be > left up to the user to deal with the potential consequence of not having > tuned their job properly. Certainly that would be better than having an > arbitrary hard-coded limit that *prevents* proper parallelization when > dealing with large files and/or large numbers of mappers. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5402) DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE
[ https://issues.apache.org/jira/browse/MAPREDUCE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983862#comment-13983862 ] Tsz Wo Nicholas Sze commented on MAPREDUCE-5402: Sure, I should be able to review this later this week. > DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE > -- > > Key: MAPREDUCE-5402 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5402 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp, mrv2 >Reporter: David Rosenstrauch >Assignee: Tsuyoshi OZAWA > Attachments: MAPREDUCE-5402.1.patch, MAPREDUCE-5402.2.patch, > MAPREDUCE-5402.3.patch > > > In MAPREDUCE-2765, which provided the design spec for DistCpV2, the author > describes the implementation of DynamicInputFormat, with one of the main > motivations cited being to reduce the chance of long-tails where a few > leftover mappers run much longer than the rest. > However, I today ran into a situation where I experienced exactly such a long > tail using DistCpV2 and DynamicInputFormat. And when I tried to alleviate > the problem by overriding the number of mappers and the split ratio used by > the DynamicInputFormat, I was prevented from doing so by the hard-coded limit > set in the code by the MAX_CHUNKS_TOLERABLE constant. (Currently set to 400.) > This constant is actually set quite low for production use. (See a > description of my use case below.) And although MAPREDUCE-2765 states that > this is an "overridable maximum", when reading through the code there does > not actually appear to be any mechanism available to override it. > This should be changed. It should be possible to expand the maximum # of > chunks beyond this arbitrary limit. > For example, here is the situation I ran into today: > I ran a distcpv2 job on a cluster with 8 machines containing 128 map slots. > The job consisted of copying ~2800 files from HDFS to Amazon S3. 
I overrode > the number of mappers for the job from the default of 20 to 128, so as to > more properly parallelize the copy across the cluster. The number of chunk > files created was calculated as 241, and mapred.num.entries.per.chunk was > calculated as 12. > As the job ran on, it reached a point where there were only 4 remaining map > tasks, which had each been running for over 2 hours. The reason for this was > that each of the 12 files that those mappers were copying were quite large > (several hundred megabytes in size) and took ~20 minutes each. However, > during this time, all the other 124 mappers sat idle. > In theory I should be able to alleviate this problem with DynamicInputFormat. > If I were able to, say, quadruple the number of chunk files created, that > would have made each chunk contain only 3 files, and these large files would > have gotten distributed better around the cluster and copied in parallel. > However, when I tried to do that - by overriding mapred.listing.split.ratio > to, say, 10 - DynamicInputFormat responded with an exception ("Too many > chunks created with splitRatio:10, numMaps:128. Reduce numMaps or decrease > split-ratio to proceed.") - presumably because I exceeded the > MAX_CHUNKS_TOLERABLE value of 400. > Is there any particular logic behind this MAX_CHUNKS_TOLERABLE limit? I > can't personally see any. > If this limit has no particular logic behind it, then it should be > overridable - or even better: removed altogether. After all, I'm not sure I > see any need for it. Even if numMaps * splitRatio resulted in an > extraordinarily large number, if the code were modified so that the number of > chunks got calculated as Math.min( numMaps * splitRatio, numFiles), then > there would be no need for MAX_CHUNKS_TOLERABLE. 
In this worst-case scenario > where the product of numMaps and splitRatio is large, capping the number of > chunks at the number of files (numberOfChunks = numberOfFiles) would result > in 1 file per chunk - the maximum parallelization possible. That may not be > the best-tuned solution for some users, but I would think that it should be > left up to the user to deal with the potential consequence of not having > tuned their job properly. Certainly that would be better than having an > arbitrary hard-coded limit that *prevents* proper parallelization when > dealing with large files and/or large numbers of mappers. -- This message was sent by Atlassian JIRA (v6.2#6252)
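The reporter's proposed cap can be written out directly. This is a sketch of the suggested formula only, not the code that shipped:

```java
public final class ChunkMath {
    // Cap the chunk count at the number of files: the worst case then
    // degrades to one file per chunk, i.e. the maximum useful parallelism,
    // with no need for a MAX_CHUNKS_TOLERABLE limit.
    static int numberOfChunks(int numMaps, int splitRatio, int numFiles) {
        return Math.min(numMaps * splitRatio, numFiles);
    }

    private ChunkMath() {}
}
```

With the numbers from the report (128 maps, split ratio 10, ~2800 files), this yields 1280 chunks instead of an exception.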
[jira] [Commented] (MAPREDUCE-5830) HostUtil.getTaskLogUrl is not backwards binary compatible with 2.3
[ https://issues.apache.org/jira/browse/MAPREDUCE-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967571#comment-13967571 ] Tsz Wo Nicholas Sze commented on MAPREDUCE-5830: Should we fix it in Hive? > HostUtil.getTaskLogUrl is not backwards binary compatible with 2.3 > -- > > Key: MAPREDUCE-5830 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5830 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Jason Lowe >Priority: Blocker > > HostUtil.getTaskLogUrl used to have a signature like this in Hadoop 2.3.0 and > earlier: > public static String getTaskLogUrl(String taskTrackerHostName, String > httpPort, String taskAttemptID) > but now has a signature like this: > public static String getTaskLogUrl(String scheme, String taskTrackerHostName, > String httpPort, String taskAttemptID) > This breaks source and binary backwards-compatibility. MapReduce and Hive > both have references to this, so their jars compiled against 2.3 or earlier > do not work on 2.4. -- This message was sent by Atlassian JIRA (v6.2#6252)
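Whichever side fixes it, the conventional shape of the fix is to restore the 2.3 overload and have it delegate to the new one. The URL format below is purely illustrative, not the real HostUtil output:

```java
public class HostUtil {
    // New 2.4-style signature taking an explicit scheme.
    public static String getTaskLogUrl(String scheme, String taskTrackerHostName,
                                       String httpPort, String taskAttemptID) {
        return scheme + taskTrackerHostName + ":" + httpPort
            + "/tasklog?attemptid=" + taskAttemptID;
    }

    // Old 2.3-style signature restored for source and binary compatibility;
    // callers compiled against 2.3 resolve to this and get http by default.
    @Deprecated
    public static String getTaskLogUrl(String taskTrackerHostName,
                                       String httpPort, String taskAttemptID) {
        return getTaskLogUrl("http://", taskTrackerHostName, httpPort, taskAttemptID);
    }
}
```

Because the old three-argument descriptor still exists in the class, jars compiled against 2.3 link without NoSuchMethodError.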
[jira] [Resolved] (MAPREDUCE-4976) Use the new StringUtils methods added by HADOOP-9252
[ https://issues.apache.org/jira/browse/MAPREDUCE-4976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved MAPREDUCE-4976. Resolution: Not A Problem This is actually not a problem. > Use the new StringUtils methods added by HADOOP-9252 > > > Key: MAPREDUCE-4976 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4976 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Minor > > HADOOP-9252 slightly changed the format of some StringUtils outputs. Some > methods were deprecated by HADOOP-9252. The use of them should be replaced > with the new methods. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5794) SliveMapper always uses default FileSystem.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated MAPREDUCE-5794: --- Attachment: m5794_20140311.patch m5794_20140311.patch: gets path from base dir and removes @SuppressWarnings("deprecation"). > SliveMapper always uses default FileSystem. > --- > > Key: MAPREDUCE-5794 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5794 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Minor > Attachments: m5794_20140311.patch > > > Similar to MAPREDUCE-5780, SliveMapper should use the test path to get > FileSystem. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5794) SliveMapper always uses default FileSystem.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated MAPREDUCE-5794: --- Status: Patch Available (was: Open) > SliveMapper always uses default FileSystem. > --- > > Key: MAPREDUCE-5794 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5794 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Minor > Attachments: m5794_20140311.patch > > > Similar to MAPREDUCE-5780, SliveMapper should use the test path to get > FileSystem. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (MAPREDUCE-5794) SliveMapper always uses default FileSystem.
Tsz Wo Nicholas Sze created MAPREDUCE-5794: -- Summary: SliveMapper always uses default FileSystem. Key: MAPREDUCE-5794 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5794 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Similar to MAPREDUCE-5780, SliveMapper should use the test path to get FileSystem. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5780) SliveTest always uses default FileSystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated MAPREDUCE-5780: --- Resolution: Fixed Fix Version/s: 2.4.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks Arpit for reviewing the patch. I have committed this. > SliveTest always uses default FileSystem > > > Key: MAPREDUCE-5780 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5780 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Minor > Fix For: 2.4.0 > > Attachments: m5780_20140305.patch > > > It should use the specified path to get FileSystem. Otherwise, it won't work > if the FileSystem is different. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5780) SliveTest always uses default FileSystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated MAPREDUCE-5780: --- Attachment: m5780_20140305.patch m5780_20140305.patch: uses Path.getFileSystem(..). > SliveTest always uses default FileSystem > > > Key: MAPREDUCE-5780 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5780 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Minor > Attachments: m5780_20140305.patch > > > It should use the specified path to get FileSystem. Otherwise, it won't work > if the FileSystem is different. -- This message was sent by Atlassian JIRA (v6.2#6252)
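The fix pattern, Path.getFileSystem(conf) instead of FileSystem.get(conf), amounts to resolving the filesystem from the path's own URI and only falling back to the default for scheme-less paths. A plain-Java analogue, with java.net.URI standing in for Hadoop's Path (names are illustrative):

```java
import java.net.URI;

public final class FsResolution {
    // A path that carries its own scheme (e.g. s3://...) must resolve to
    // that filesystem; only a scheme-less path should use the default.
    static String schemeFor(String path, String defaultScheme) {
        String scheme = URI.create(path).getScheme();
        return (scheme != null) ? scheme : defaultScheme;
    }

    private FsResolution() {}
}
```

Always calling the equivalent of FileSystem.get(conf) is the bug: a test path on a non-default filesystem silently runs against the wrong one.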
[jira] [Updated] (MAPREDUCE-5780) SliveTest always uses default FileSystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated MAPREDUCE-5780: --- Status: Patch Available (was: Open) > SliveTest always uses default FileSystem > > > Key: MAPREDUCE-5780 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5780 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Minor > Attachments: m5780_20140305.patch > > > It should use the specified path to get FileSystem. Otherwise, it won't work > if the FileSystem is different. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (MAPREDUCE-5780) SliveTest always uses default FileSystem
Tsz Wo (Nicholas), SZE created MAPREDUCE-5780: - Summary: SliveTest always uses default FileSystem Key: MAPREDUCE-5780 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5780 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Priority: Minor It should use the specified path to get FileSystem. Otherwise, it won't work if the FileSystem is different. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5715) ProcfsBasedProcessTree#constructProcessInfo() can still throw NumberFormatException
[ https://issues.apache.org/jira/browse/MAPREDUCE-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867535#comment-13867535 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-5715: --- It looks like that the exception was from parsing the utime below. {code} // Set (name) (ppid) (pgrpId) (session) (utime) (stime) (vsize) (rss) pinfo.updateProcessInfo(m.group(2), m.group(3), Integer.parseInt(m.group(4)), Integer.parseInt(m.group(5)), Long.parseLong(m.group(7)), new BigInteger(m.group(8)), Long.parseLong(m.group(10)), Long.parseLong(m.group(11))); {code} > ProcfsBasedProcessTree#constructProcessInfo() can still throw > NumberFormatException > --- > > Key: MAPREDUCE-5715 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5715 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: trunk, 2.2.0 > Environment: Ubuntu 13.04 (OS Kernel 3.9.0), Armv71Exynos5440 >Reporter: German Florez-Larrahondo >Priority: Minor > Attachments: constructprocessfailing.jpg > > > For long running jobs I have hit an issue that seems to be to be similar to > the bug reported in https://issues.apache.org/jira/browse/MAPREDUCE-3583 > Unfortunately I do not have the OS logs for this issue, but the utime for the > application was read by Hadoop as "184467440737095551615" which does not fit > into a Long. In MAPREDUCE-3583 a change was made to > ProcfsBasedProcessTree.java > in order to support larger values for stime. Perhaps we need to support > larger values for utime (although this could increase the complexity of the > math that is being performed on those numbers) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
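The reported value makes the overflow concrete: Long.MAX_VALUE is 9223372036854775807, so a utime of 184467440737095551615 cannot go through Long.parseLong. MAPREDUCE-3583 switched stime to BigInteger; the same treatment would apply to utime. A sketch of the parsing side only, not the committed change:

```java
import java.math.BigInteger;

public final class JiffiesParsing {
    // BigInteger accepts arbitrarily large /proc counters, whereas
    // Long.parseLong throws NumberFormatException past 2^63 - 1.
    static BigInteger parseJiffies(String raw) {
        return new BigInteger(raw.trim());
    }

    private JiffiesParsing() {}
}
```

As the report notes, the cost is that downstream arithmetic on these values also has to move to BigInteger.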
[jira] [Updated] (MAPREDUCE-5604) TestMRAMWithNonNormalizedCapabilities fails on Windows due to exceeding max path length
[ https://issues.apache.org/jira/browse/MAPREDUCE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-5604: -- Component/s: (was: client) Hadoop Flags: Reviewed +1 patch looks good. Since this only changes a test, the build failure is obviously unrelated. > TestMRAMWithNonNormalizedCapabilities fails on Windows due to exceeding max > path length > --- > > Key: MAPREDUCE-5604 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5604 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Affects Versions: 3.0.0, 2.2.0 >Reporter: Chris Nauroth >Assignee: Chris Nauroth >Priority: Minor > Attachments: MAPREDUCE-5604.1.patch > > > The test uses the full class name as a component of the > {{yarn.nodemanager.local-dirs}} setting for a {{MiniMRYarnCluster}}. This > causes container launch to fail when trying to access files at a path longer > than the maximum of 260 characters. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5131) Provide better handling of job status related apis during JT restart
[ https://issues.apache.org/jira/browse/MAPREDUCE-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623800#comment-13623800 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-5131: --- +1 the new patch looks good. > Provide better handling of job status related apis during JT restart > > > Key: MAPREDUCE-5131 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5131 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Arun C Murthy >Assignee: Arun C Murthy > Attachments: MAPREDUCE-5131.patch, MAPREDUCE-5131.patch > > > I've seen pig/hive applications bork during JT restart since they get NPEs - > this is due to fact that jobs are not really inited, but are submitted. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5131) Provide better handling of job status related apis during JT restart
[ https://issues.apache.org/jira/browse/MAPREDUCE-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623783#comment-13623783 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-5131: --- The defaultPolicy is to retry on JobTrackerNotYetInitializedException. For SafeModeException, defaultPolicy is equivalent to TRY_ONCE_THEN_FAIL since SafeModeException will be thrown as a RemoteException. So remoteExceptionToPolicyMap.put(SafeModeException.class, defaultPolicy) is the same as remoteExceptionToPolicyMap.put(SafeModeException.class, TRY_ONCE_THEN_FAIL). I think it is simpler to change RetryUtils.getDefaultRetryPolicy to support multiple exceptions. > Provide better handling of job status related apis during JT restart > > > Key: MAPREDUCE-5131 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5131 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Arun C Murthy >Assignee: Arun C Murthy > Attachments: MAPREDUCE-5131.patch > > > I've seen pig/hive applications bork during JT restart since they get NPEs - > this is due to fact that jobs are not really inited, but are submitted. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
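The comment's point can be illustrated with a toy policy lookup (all names hypothetical, keyed by exception name rather than class to keep the sketch simple; this is not Hadoop's RetryPolicies API): an exception class mapped to the default policy behaves exactly as if it were not in the map at all, so such an entry is redundant.

```java
import java.util.Map;

public final class RetryLookup {
    enum Policy { TRY_ONCE_THEN_FAIL, RETRY }

    // Unmapped exceptions fall through to the default policy, so mapping
    // an exception to the default changes nothing.
    static Policy policyFor(String exceptionName, Map<String, Policy> map, Policy dflt) {
        return map.getOrDefault(exceptionName, dflt);
    }

    private RetryLookup() {}
}
```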
[jira] [Resolved] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE resolved MAPREDUCE-5081. --- Resolution: Fixed Fix Version/s: 1.2.0 Hadoop Flags: Reviewed Thanks Suresh for reviewing the patch. I have committed this. > Backport DistCpV2 and the related JIRAs to branch-1 > --- > > Key: MAPREDUCE-5081 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5081 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: distcp >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Fix For: 1.2.0 > > Attachments: DistCp.java.diff, m5081_20130328b.patch, > m5081_20130328.patch, m5981_20130321b.patch, m5981_20130321.patch, > m5981_20130323.patch > > > Here is a list of DistCpV2 JIRAs: > - MAPREDUCE-2765: DistCpV2 main jira > - HADOOP-8703: turn CRC checking off for 0 byte size > - HDFS-3054: distcp -skipcrccheck has no effect. > - HADOOP-8431: Running distcp without args throws IllegalArgumentException > - HADOOP-8775: non-positive value to -bandwidth > - MAPREDUCE-4654: TestDistCp is ignored > - HADOOP-9022: distcp fails to copy file if -m 0 specified > - HADOOP-9025: TestCopyListing failing > - MAPREDUCE-5075: DistCp leaks input file handles > - distcp part of HADOOP-8341: Fix findbugs issues in hadoop-tools > - MAPREDUCE-5014: custom CopyListing -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-5081: -- Attachment: m5081_20130328b.patch Oops, I accidentally added a new vaidya entry to site.xml. {code} + + {code} m5081_20130328b.patch: removes the new vaidya entry. > Backport DistCpV2 and the related JIRAs to branch-1 > --- > > Key: MAPREDUCE-5081 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5081 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: distcp >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Attachments: DistCp.java.diff, m5081_20130328b.patch, > m5081_20130328.patch, m5981_20130321b.patch, m5981_20130321.patch, > m5981_20130323.patch > > > Here is a list of DistCpV2 JIRAs: > - MAPREDUCE-2765: DistCpV2 main jira > - HADOOP-8703: turn CRC checking off for 0 byte size > - HDFS-3054: distcp -skipcrccheck has no effect. > - HADOOP-8431: Running distcp without args throws IllegalArgumentException > - HADOOP-8775: non-positive value to -bandwidth > - MAPREDUCE-4654: TestDistCp is ignored > - HADOOP-9022: distcp fails to copy file if -m 0 specified > - HADOOP-9025: TestCopyListing failing > - MAPREDUCE-5075: DistCp leaks input file handles > - distcp part of HADOOP-8341: Fix findbugs issues in hadoop-tools > - MAPREDUCE-5014: custom CopyListing -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-5081: -- Attachment: m5081_20130328.patch m5081_20130328.patch: - keeps distcp (version 1) unchanged and adds a new distcp2 command; - also updates the doc. > Backport DistCpV2 and the related JIRAs to branch-1 > --- > > Key: MAPREDUCE-5081 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5081 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: distcp >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Attachments: DistCp.java.diff, m5081_20130328.patch, > m5981_20130321b.patch, m5981_20130321.patch, m5981_20130323.patch > > > Here is a list of DistCpV2 JIRAs: > - MAPREDUCE-2765: DistCpV2 main jira > - HADOOP-8703: turn CRC checking off for 0 byte size > - HDFS-3054: distcp -skipcrccheck has no effect. > - HADOOP-8431: Running distcp without args throws IllegalArgumentException > - HADOOP-8775: non-positive value to -bandwidth > - MAPREDUCE-4654: TestDistCp is ignored > - HADOOP-9022: distcp fails to copy file if -m 0 specified > - HADOOP-9025: TestCopyListing failing > - MAPREDUCE-5075: DistCp leaks input file handles > - distcp part of HADOOP-8341: Fix findbugs issues in hadoop-tools > - MAPREDUCE-5014: custom CopyListing -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615978#comment-13615978 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-5081: --- {quote} > 1. sslConfig.xml, distcp-default.xml is missing Apache license header. Will do. {quote} In branch-1, other xml files like core-default.xml and hdfs-default.xml also do not have a license header. So let's not fix the new xml here; we can fix them together with the other xml files later. > Backport DistCpV2 and the related JIRAs to branch-1 > --- > > Key: MAPREDUCE-5081 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5081 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: distcp >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Attachments: DistCp.java.diff, m5981_20130321b.patch, > m5981_20130321.patch, m5981_20130323.patch > > > Here is a list of DistCpV2 JIRAs: > - MAPREDUCE-2765: DistCpV2 main jira > - HADOOP-8703: turn CRC checking off for 0 byte size > - HDFS-3054: distcp -skipcrccheck has no effect. > - HADOOP-8431: Running distcp without args throws IllegalArgumentException > - HADOOP-8775: non-positive value to -bandwidth > - MAPREDUCE-4654: TestDistCp is ignored > - HADOOP-9022: distcp fails to copy file if -m 0 specified > - HADOOP-9025: TestCopyListing failing > - MAPREDUCE-5075: DistCp leaks input file handles > - distcp part of HADOOP-8341: Fix findbugs issues in hadoop-tools > - MAPREDUCE-5014: custom CopyListing -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-5081: -- Attachment: DistCp.java.diff DistCp.java.diff: diff between branch-1 tools/distcp2/DistCp.java (not tools/DistCp.java) and trunk tools/DistCp.java diff b-1/src/tools/org/apache/hadoop/tools/distcp2/DistCp.java t3/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java > DistCp.java.diff > Backport DistCpV2 and the related JIRAs to branch-1 > --- > > Key: MAPREDUCE-5081 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5081 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: distcp >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Attachments: DistCp.java.diff, m5981_20130321b.patch, > m5981_20130321.patch, m5981_20130323.patch > > > Here is a list of DistCpV2 JIRAs: > - MAPREDUCE-2765: DistCpV2 main jira > - HADOOP-8703: turn CRC checking off for 0 byte size > - HDFS-3054: distcp -skipcrccheck has no effect. > - HADOOP-8431: Running distcp without args throws IllegalArgumentException > - HADOOP-8775: non-positive value to -bandwidth > - MAPREDUCE-4654: TestDistCp is ignored > - HADOOP-9022: distcp fails to copy file if -m 0 specified > - HADOOP-9025: TestCopyListing failing > - MAPREDUCE-5075: DistCp leaks input file handles > - distcp part of HADOOP-8341: Fix findbugs issues in hadoop-tools > - MAPREDUCE-5014: custom CopyListing -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615961#comment-13615961 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-5081: --- > 1. sslConfig.xml, distcp-default.xml is missing Apache license header. Will do. > 2. There is a lot difference in DistCp.java in trunk and DistCp.java in this > patch? Are you sure that you diffed the correct file? There are two DistCp.java files. Let me post the diff. > 3. ... Why not leave the old distcp as is and add a new command for distcp2? Sure, I can add a new command for distcp2.
[jira] [Commented] (MAPREDUCE-5014) Extending DistCp through a custom CopyListing is not possible
[ https://issues.apache.org/jira/browse/MAPREDUCE-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13611705#comment-13611705 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-5014: --- I have combined the branch-1 patch here into MAPREDUCE-5081. > Extending DistCp through a custom CopyListing is not possible > - > > Key: MAPREDUCE-5014 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5014 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp >Affects Versions: 0.23.0, 0.23.1, 0.23.3, trunk, 0.23.4, 0.23.5 >Reporter: Srikanth Sundarrajan >Assignee: Srikanth Sundarrajan > Fix For: 3.0.0 > > Attachments: m5014_20130322_b-1.patch, m5014_20130322b_b-1.patch, > m5014_20130322b.patch, m5014_20130322.patch, MAPREDUCE-5014.patch, > MAPREDUCE-5014.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > * While it is possible to implement a custom CopyListing in DistCp, DistCp > driver class doesn't allow for using this custom CopyListing. > * Allow SimpleCopyListing to provide an option to exclude files (For instance > it is useful to exclude FileOutputCommitter.SUCCEEDED_FILE_NAME during copy as > premature copy can indicate that the entire data is available at the > destination)
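The SimpleCopyListing exclusion proposal above boils down to filtering marker files such as _SUCCESS out of the copy listing before any copy starts. A minimal standalone sketch of that filtering step (class and method names here are illustrative; this is not the actual Hadoop CopyListing API):

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the exclusion idea from MAPREDUCE-5014:
 *  skip marker files (e.g. FileOutputCommitter's _SUCCESS) when building
 *  the listing, because copying _SUCCESS prematurely could wrongly signal
 *  that all data has already arrived at the destination. */
public class ExcludingListing {
    private static final String SUCCEEDED_FILE_NAME = "_SUCCESS";

    /** Returns the paths to copy, dropping any path whose last
     *  component matches the excluded marker file name. */
    public static List<String> buildListing(List<String> sourcePaths) {
        List<String> result = new ArrayList<>();
        for (String path : sourcePaths) {
            int slash = path.lastIndexOf('/');
            String name = path.substring(slash + 1);
            if (!SUCCEEDED_FILE_NAME.equals(name)) {
                result.add(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> paths = new ArrayList<>();
        paths.add("/src/part-00000");
        paths.add("/src/_SUCCESS");
        System.out.println(buildListing(paths)); // only the data file remains
    }
}
```

In the real patch the hook is a pluggable CopyListing class rather than a hard-coded filter, so callers can supply their own exclusion policy.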
[jira] [Commented] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13611553#comment-13611553 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-5081: --- For m5981_20130323.patch: {noformat} [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 41 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] -1 findbugs. The patch appears to introduce 17 new Findbugs (version 1.3.9) warnings. {noformat}
[jira] [Updated] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-5081: -- Attachment: m5981_20130323.patch m5981_20130323.patch: includes MAPREDUCE-5014.
[jira] [Updated] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-5081: -- Description: Here is a list of DistCpV2 JIRAs: - MAPREDUCE-2765: DistCpV2 main jira - HADOOP-8703: turn CRC checking off for 0 byte size - HDFS-3054: distcp -skipcrccheck has no effect. - HADOOP-8431: Running distcp without args throws IllegalArgumentException - HADOOP-8775: non-positive value to -bandwidth - MAPREDUCE-4654: TestDistCp is ignored - HADOOP-9022: distcp fails to copy file if -m 0 specified - HADOOP-9025: TestCopyListing failing - MAPREDUCE-5075: DistCp leaks input file handles - distcp part of HADOOP-8341: Fix findbugs issues in hadoop-tools - MAPREDUCE-5014: custom CopyListing was: Here is a list of DistCpV2 JIRAs: - MAPREDUCE-2765: DistCpV2 main jira - HADOOP-8703: turn CRC checking off for 0 byte size - HDFS-3054: distcp -skipcrccheck has no effect. - HADOOP-8431: Running distcp without args throws IllegalArgumentException - HADOOP-8775: non-positive value to -bandwidth - MAPREDUCE-4654: TestDistCp is ignored - HADOOP-9022: distcp fails to copy file if -m 0 specified - HADOOP-9025: TestCopyListing failing - MAPREDUCE-5075: DistCp leaks input file handles - MAPREDUCE-5014: custom CopyListing (not yet committed to trunk) MAPREDUCE-5014 was committed recently; revised description.
[jira] [Commented] (MAPREDUCE-5014) Extending DistCp through a custom CopyListing is not possible
[ https://issues.apache.org/jira/browse/MAPREDUCE-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13610328#comment-13610328 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-5014: --- Hi Amareshwari, thanks for checking in the patches! I forgot to mention that the branch-1 patch depends on MAPREDUCE-5081, which backports distcp2 to branch-1. Let's wait for it.
[jira] [Updated] (MAPREDUCE-5014) Extending DistCp through a custom CopyListing is not possible
[ https://issues.apache.org/jira/browse/MAPREDUCE-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-5014: -- Attachment: m5014_20130322b.patch m5014_20130322b.patch: adds timeouts for some existing tests. m5014_20130322b_b-1.patch: for branch-1.
[jira] [Updated] (MAPREDUCE-5014) Extending DistCp through a custom CopyListing is not possible
[ https://issues.apache.org/jira/browse/MAPREDUCE-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-5014: -- Attachment: m5014_20130322b_b-1.patch
[jira] [Commented] (MAPREDUCE-5014) Extending DistCp through a custom CopyListing is not possible
[ https://issues.apache.org/jira/browse/MAPREDUCE-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609900#comment-13609900 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-5014: --- > -1 one of tests included doesn't have a timeout. The test without a timeout is an existing test; test-patch.sh should not -1 this patch. I think it is a bug in HADOOP-9387.
[jira] [Updated] (MAPREDUCE-5014) Extending DistCp through a custom CopyListing is not possible
[ https://issues.apache.org/jira/browse/MAPREDUCE-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-5014: -- Attachment: m5014_20130322_b-1.patch m5014_20130322_b-1.patch: for branch-1.
[jira] [Updated] (MAPREDUCE-5014) Extending DistCp through a custom CopyListing is not possible
[ https://issues.apache.org/jira/browse/MAPREDUCE-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-5014: -- Attachment: m5014_20130322.patch Since it is trivial to add timeouts, let me post a patch for it. m5014_20130322.patch: adds timeouts and fixes some formatting issues.
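The timeouts being added here are JUnit 4 per-test timeouts (`@Test(timeout = ...)`), which fail a test that hangs instead of stalling the whole build. The mechanism behind such a timeout can be sketched in plain Java without JUnit: run the test body on a worker thread and fail it once the deadline passes (names below are illustrative, not from the patch):

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Sketch of a per-test timeout, analogous to JUnit 4's @Test(timeout=...):
 *  run the body on a worker thread and give up if it exceeds the deadline. */
public class TimeoutRunner {
    /** Returns true if the task finished within timeoutMillis. */
    public static boolean runWithTimeout(Runnable task, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<?> future = executor.submit(task);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;                 // finished in time
        } catch (TimeoutException e) {
            future.cancel(true);         // interrupt the runaway test body
            return false;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(runWithTimeout(() -> { }, 1000));   // fast body
        System.out.println(runWithTimeout(() -> {
            try { Thread.sleep(5000); } catch (InterruptedException ignored) { }
        }, 100));                                              // slow body
    }
}
```

With the annotation form, the same effect is just `@Test(timeout = 10000)` on each test method.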
[jira] [Commented] (MAPREDUCE-5014) Extending DistCp through a custom CopyListing is not possible
[ https://issues.apache.org/jira/browse/MAPREDUCE-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609835#comment-13609835 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-5014: --- Patch looks good. Could you add timeouts for the tests in TestIntegration?
[jira] [Commented] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609238#comment-13609238 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-5081: --- All tests passed with the patch on my machine.
[jira] [Commented] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608846#comment-13608846 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-5081: --- {noformat} [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 41 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] -1 findbugs. The patch appears to introduce 17 new Findbugs (version 1.3.9) warnings. {noformat} The remaining findbugs warnings are not related to this patch.
[jira] [Updated] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-5081: -- Attachment: m5981_20130321b.patch Two of the findbugs warnings are related. m5981_20130321b.patch: backports the distcp part of HADOOP-8341.
[jira] [Commented] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608805#comment-13608805 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-5081: --- {noformat} [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 41 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] -1 findbugs. The patch appears to introduce 19 new Findbugs (version 1.3.9) warnings. {noformat}
[jira] [Updated] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-5081: -- Attachment: m5981_20130321.patch m5981_20130321.patch: backports all JIRAs except MAPREDUCE-5014.
[jira] [Updated] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-5081: -- Description: Here is a list of DistCpV2 JIRAs: - MAPREDUCE-2765: DistCpV2 main jira - HADOOP-8703: turn CRC checking off for 0 byte size - HDFS-3054: distcp -skipcrccheck has no effect. - HADOOP-8431: Running distcp without args throws IllegalArgumentException - HADOOP-8775: non-positive value to -bandwidth - MAPREDUCE-4654: TestDistCp is ignored - HADOOP-9022: distcp fails to copy file if -m 0 specified - HADOOP-9025: TestCopyListing failing - MAPREDUCE-5075: DistCp leaks input file handles - MAPREDUCE-5014: custom CopyListing (not yet committed to trunk) was: Here is a list of DistCpV2 JIRAs: - MAPREDUCE-2765: DistCpV2 main jira - HADOOP-8703: turn CRC checking off for 0 byte size - HDFS-3054: distcp -skipcrccheck has no effect. - HADOOP-8431: Running distcp without args throws IllegalArgumentException - HADOOP-8775: non-positive value to -bandwidth - MAPREDUCE-4654: TestDistCp is ignored - HADOOP-9022. distcp fails to copy file if -m 0 specified - HADOOP-9025. TestCopyListing failing - MAPREDUCE-5014: custom CopyListing (not yet committed to trunk) Sure, MAPREDUCE-5075 is a useful bug fix.
[jira] [Updated] (MAPREDUCE-5075) DistCp leaks input file handles
[ https://issues.apache.org/jira/browse/MAPREDUCE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-5075: -- Resolution: Fixed Fix Version/s: 2.0.5-beta Status: Resolved (was: Patch Available) I have committed this. Thanks, Chris! > DistCp leaks input file handles > --- > > Key: MAPREDUCE-5075 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5075 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: distcp >Affects Versions: 3.0.0 >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Fix For: 2.0.5-beta > > Attachments: MAPREDUCE-5075.1.patch > > > DistCp wraps the {{InputStream}} for each input file it reads in an instance > of {{ThrottledInputStream}}. This class does not close the wrapped > {{InputStream}}. {{RetriableFileCopyCommand}} guarantees that the > {{ThrottledInputStream}} gets closed, but without closing the underlying > wrapped stream, it still leaks a file handle.
[jira] [Updated] (MAPREDUCE-5075) DistCp leaks input file handles
[ https://issues.apache.org/jira/browse/MAPREDUCE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-5075: -- Hadoop Flags: Reviewed +1 patch looks good. > DistCp leaks input file handles > --- > > Key: MAPREDUCE-5075 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5075 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: distcp >Affects Versions: 3.0.0 >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Attachments: MAPREDUCE-5075.1.patch > > > DistCp wraps the {{InputStream}} for each input file it reads in an instance > of {{ThrottledInputStream}}. This class does not close the wrapped > {{InputStream}}. {{RetriableFileCopyCommand}} guarantees that the > {{ThrottledInputStream}} gets closed, but without closing the underlying > wrapped stream, it still leaks a file handle. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
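The leak described in MAPREDUCE-5075 comes down to a wrapper stream that never closes what it wraps. The following is an illustrative minimal sketch of that pattern and its fix, not the committed patch; the real `ThrottledInputStream` lives in `org.apache.hadoop.tools.util` and contains throttling logic omitted here:

```java
import java.io.IOException;
import java.io.InputStream;

// Simplified stand-in for DistCp's ThrottledInputStream (throttling elided).
class ThrottledInputStream extends InputStream {
  private final InputStream rawStream;

  ThrottledInputStream(InputStream rawStream) {
    this.rawStream = rawStream;
  }

  @Override
  public int read() throws IOException {
    // (bandwidth-throttling bookkeeping would go here)
    return rawStream.read();
  }

  // The fix: closing the wrapper must also close the wrapped stream.
  // Without this override, RetriableFileCopyCommand closes the
  // ThrottledInputStream but the underlying file handle still leaks.
  @Override
  public void close() throws IOException {
    rawStream.close();
  }
}
```

The same consideration is why `java.io.FilterInputStream` subclasses usually need no explicit close handling: `FilterInputStream.close()` already delegates to the wrapped stream.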
[jira] [Created] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1
Tsz Wo (Nicholas), SZE created MAPREDUCE-5081: - Summary: Backport DistCpV2 and the related JIRAs to branch-1 Key: MAPREDUCE-5081 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5081 Project: Hadoop Map/Reduce Issue Type: New Feature Components: distcp Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Here is a list of DistCpV2 JIRAs: - MAPREDUCE-2765: DistCpV2 main jira - HADOOP-8703: turn CRC checking off for 0 byte size - HDFS-3054: distcp -skipcrccheck has no effect. - HADOOP-8431: Running distcp without args throws IllegalArgumentException - HADOOP-8775: non-positive value to -bandwidth - MAPREDUCE-4654: TestDistCp is ignored - HADOOP-9022. distcp fails to copy file if -m 0 specified - HADOOP-9025. TestCopyListing failing - MAPREDUCE-5014: custom CopyListing (not yet committed to trunk) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4502) Multi-level aggregation with combining the result of maps per node/rack
[ https://issues.apache.org/jira/browse/MAPREDUCE-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584591#comment-13584591 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4502: --- In the [console|https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3351//console], it said "Build step 'Execute shell' marked build as failure". > Multi-level aggregation with combining the result of maps per node/rack > --- > > Key: MAPREDUCE-4502 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4502 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: applicationmaster, mrv2 >Affects Versions: 3.0.0 >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: design_v2.pdf, MAPREDUCE-4502.1.patch, > MAPREDUCE-4502.2.patch, MAPREDUCE-4502.3.patch, MAPREDUCE-4502.4.patch, > MAPREDUCE-4525-pof.diff, speculative_draft.pdf > > > The shuffle costs is expensive in Hadoop in spite of the existence of > combiner, because the scope of combining is limited within only one MapTask. > To solve this problem, it's a good way to aggregate the result of maps per > node/rack by launch combiner. > This JIRA is to implement the multi-level aggregation infrastructure, > including combining per container(MAPREDUCE-3902 is related), coordinating > containers by application master without breaking fault tolerance of jobs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4976) Use the new StringUtils methods added by HADOOP-9252
[ https://issues.apache.org/jira/browse/MAPREDUCE-4976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-4976: -- Description: HADOOP-9252 slightly changed the format of some StringUtils outputs. Some methods were deprecated by HADOOP-9252. The use of them should be replaced with the new methods. (was: HADOOP-9252 slightly changes the format of some StringUtils outputs. It may cause test failures. Also, some methods were deprecated by HADOOP-9252. The use of them should be replaced with the new methods.) Issue Type: Improvement (was: Bug) Summary: Use the new StringUtils methods added by HADOOP-9252 (was: Fix test failure for HADOOP-9252) A recent Jenkins build [build #3309|https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3309/] shows that HADOOP-9252 does not cause test failure in MapReduce. (Revised summary and description.) > Use the new StringUtils methods added by HADOOP-9252 > > > Key: MAPREDUCE-4976 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4976 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE >Priority: Minor > > HADOOP-9252 slightly changed the format of some StringUtils outputs. Some > methods were deprecated by HADOOP-9252. The use of them should be replaced > with the new methods. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4976) Fix test failure for HADOOP-9252
[ https://issues.apache.org/jira/browse/MAPREDUCE-4976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-4976: -- Description: HADOOP-9252 slightly changes the format of some StringUtils outputs. It may cause test failures. Also, some methods were deprecated by HADOOP-9252. The use of them should be replaced with the new methods. was: HADOOP-9252 slightly changes the format of some StringUtils outputs. It may cause test failures. Also, some methods was deprecated by HADOOP-9252. The use of them should be replaced with the new methods. > Fix test failure for HADOOP-9252 > > > Key: MAPREDUCE-4976 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4976 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE >Priority: Minor > > HADOOP-9252 slightly changes the format of some StringUtils outputs. It may > cause test failures. > Also, some methods were deprecated by HADOOP-9252. The use of them should be > replaced with the new methods. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-4976) Fix test failure for HADOOP-9252
Tsz Wo (Nicholas), SZE created MAPREDUCE-4976: - Summary: Fix test failure for HADOOP-9252 Key: MAPREDUCE-4976 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4976 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Priority: Minor HADOOP-9252 slightly changes the format of some StringUtils outputs. It may cause test failures. Also, some methods was deprecated by HADOOP-9252. The use of them should be replaced with the new methods. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4651) Benchmarking random reads with DFSIO
[ https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465601#comment-13465601 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4651: --- Oops, I forgot to reload the page. It is great that Jakob has reviewed it. > Benchmarking random reads with DFSIO > > > Key: MAPREDUCE-4651 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4651 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: benchmarks, test >Affects Versions: 1.0.0 >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko > Fix For: 0.23.4 > > Attachments: randomDFSIO.patch, randomDFSIO.patch, randomDFSIO.patch > > > TestDFSIO measures throughput of HDFS write, read, and append operations. It > will be useful to have an option to use it for benchmarking random reads. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4651) Benchmarking random reads with DFSIO
[ https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464794#comment-13464794 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4651: --- Hi Konstantin, the patch no longer applies to trunk. Could you update it? > Benchmarking random reads with DFSIO > > > Key: MAPREDUCE-4651 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4651 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: benchmarks, test >Affects Versions: 1.0.0 >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko > Fix For: 0.23.4 > > Attachments: randomDFSIO.patch, randomDFSIO.patch, randomDFSIO.patch > > > TestDFSIO measures throughput of HDFS write, read, and append operations. It > will be useful to have an option to use it for benchmarking random reads. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4603) Allow JobClient to retry job-submission when JT is in safemode
[ https://issues.apache.org/jira/browse/MAPREDUCE-4603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462854#comment-13462854 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4603: --- +1 patch looks good. > Allow JobClient to retry job-submission when JT is in safemode > -- > > Key: MAPREDUCE-4603 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4603 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Arun C Murthy >Assignee: Arun C Murthy > Attachments: MAPREDUCE-4603.patch > > > Similar to HDFS-3504, it would be useful to allow JobClient to retry > job-submission when JT is in safemode (via MAPREDUCE-4328). > This way applications like Pig/Hive don't bork midway when the NN/JT are not > operational. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
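The retry behavior proposed above can be pictured as a bounded retry loop around submission. This is a hedged sketch of the idea only; `SafeModeException`, `Submitter`, and `submitWithRetries` are hypothetical names, not the real JobClient API:

```java
// Unchecked stand-in for the "JT is in safemode" failure.
class SafeModeException extends RuntimeException {}

class RetryingSubmitter {
  interface Submitter { String submit(); }

  // Retry submission while the (hypothetical) safemode error persists,
  // giving up after maxAttempts tries.
  static String submitWithRetries(Submitter s, int maxAttempts, long sleepMs) {
    for (int attempt = 1; ; attempt++) {
      try {
        return s.submit();
      } catch (SafeModeException e) {
        if (attempt >= maxAttempts) throw e;
        try {
          Thread.sleep(sleepMs); // back off while the JT leaves safemode
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw e;
        }
      }
    }
  }
}
```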
[jira] [Commented] (MAPREDUCE-4309) Make locatlity in YARN's container assignment and task scheduling pluggable for other deployment topology
[ https://issues.apache.org/jira/browse/MAPREDUCE-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418031#comment-13418031 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4309: --- Abstract Factory sounds good. For the enum problem, let's use enum for the moment and change it to interfaces later. (Sorry that I was not able to respond earlier.) > Make locatlity in YARN's container assignment and task scheduling pluggable > for other deployment topology > - > > Key: MAPREDUCE-4309 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4309 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 1.0.0, 2.0.0-alpha >Reporter: Junping Du >Assignee: Junping Du > Attachments: > HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, > MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, > MAPREDUCE-4309-v5.patch, MAPREDUCE-4309.patch > > > There are several classes in YARN’s container assignment and task scheduling > algorithms that relate to data locality which were updated to give preference > to running a container on other locality besides node-local and rack-local > (like nodegroup-local). This propose to make these data structure/algorithms > pluggable, like: SchedulerNode, RMNodeImpl, etc. The inner class > ScheduledRequests was made a package level class to it would be easier to > create a subclass, ScheduledRequestsWithNodeGroup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4309) Make locatlity in YARN's container assignment and task scheduling pluggable for other deployment topology
[ https://issues.apache.org/jira/browse/MAPREDUCE-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409267#comment-13409267 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4309: --- - Since JobCounter is public evolving, I think we cannot add NODEGROUP_LOCAL_MAPS. Also, NODEGROUP_LOCAL_MAPS is not yet used. - Similarly, NodeType.NODEGROUP_LOCAL adds node group notation to the original code. Is there any way to prevent it? I think we may change the enum to an interface but it is a much bigger change. - BTW, YarnConfiguration.NET_TOPOLOGY_WITH_NODEGROUP is not used > Make locatlity in YARN's container assignment and task scheduling pluggable > for other deployment topology > - > > Key: MAPREDUCE-4309 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4309 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 1.0.0, 2.0.0-alpha >Reporter: Junping Du >Assignee: Junping Du > Attachments: > HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, > MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, > MAPREDUCE-4309.patch > > > There are several classes in YARN’s container assignment and task scheduling > algorithms that relate to data locality which were updated to give preference > to running a container on other locality besides node-local and rack-local > (like nodegroup-local). This propose to make these data structure/algorithms > pluggable, like: SchedulerNode, RMNodeImpl, etc. The inner class > ScheduledRequests was made a package level class to it would be easier to > create a subclass, ScheduledRequestsWithNodeGroup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
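The enum-versus-interface trade-off raised above can be sketched as follows. All type names here are illustrative, not the real YARN `NodeType` API: an enum cannot gain a new locality level (such as NODEGROUP_LOCAL) without changing the public type, whereas an interface lets a topology plugin contribute its own levels.

```java
// Hypothetical extensible locality type (not the real NodeType enum).
interface Locality {
  String name();
}

// Core levels stay a closed enum; Enum's built-in name() satisfies
// the interface automatically.
enum StandardLocality implements Locality {
  NODE_LOCAL, RACK_LOCAL, OFF_SWITCH
}

// A 4-layer-topology plugin can add its own level without touching
// the core enum or any public API it appears in.
enum NodeGroupLocality implements Locality {
  NODEGROUP_LOCAL
}
```

Code that accepts `Locality` rather than the concrete enum then works for both the standard and the plugin-supplied levels.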
[jira] [Commented] (MAPREDUCE-4309) Make locatlity in YARN's container assignment and task scheduling pluggable for other deployment topology
[ https://issues.apache.org/jira/browse/MAPREDUCE-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407705#comment-13407705 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4309: --- Quick comments: - It does not apply anymore. {noformat} $patch -p0 -i ~/Downloads/HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch patching file hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java Hunk #1 FAILED at 18. Hunk #2 succeeded at 37 (offset 1 line). Hunk #3 FAILED at 45. Hunk #4 succeeded at 65 with fuzz 2 (offset 9 lines). Hunk #5 succeeded at 80 (offset 9 lines). Hunk #6 succeeded at 119 (offset 9 lines). Hunk #7 succeeded at 161 (offset 9 lines). Hunk #8 succeeded at 548 (offset 10 lines). Hunk #9 FAILED at 633. 3 out of 9 hunks FAILED -- saving rejects to file hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java.rej ... {noformat} - Could we keep ScheduledRequests as an inner class of RMContainerAllocator, i.e. change it to public/protected/package-private static? It will be easier to see the changes. > Make locatlity in YARN's container assignment and task scheduling pluggable > for other deployment topology > - > > Key: MAPREDUCE-4309 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4309 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 1.0.0, 2.0.0-alpha >Reporter: Junping Du >Assignee: Junping Du > Attachments: > HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch > > > There are several classes in YARN’s container assignment and task scheduling > algorithms that relate to data locality which were updated to give preference > to running a container on other locality besides node-local and rack-local > (like nodegroup-local). 
This propose to make these data structure/algorithms > pluggable, like: SchedulerNode, RMNodeImpl, etc. The inner class > ScheduledRequests was made a package level class to it would be easier to > create a subclass, ScheduledRequestsWithNodeGroup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4310) 4-layer topology (with NodeGroup layer) implementation of Container Assignment and Task Scheduling (for YARN)
[ https://issues.apache.org/jira/browse/MAPREDUCE-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406814#comment-13406814 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4310: --- I think we should add a subclass of RackResolver to support NodeGroup. It is similar to what we did for NetworkTopology. > 4-layer topology (with NodeGroup layer) implementation of Container > Assignment and Task Scheduling (for YARN) > - > > Key: MAPREDUCE-4310 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4310 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 1.0.0, 2.0.0-alpha >Reporter: Junping Du >Assignee: Junping Du > Attachments: > HADOOP-8475-ContainerAssignmentTaskScheduling-withNodeGroup.patch > > > There are several classes in YARN’s container assignment and task scheduling > algorithms that related to data locality which were updated to give > preference to running a container on the same nodegroup. This section > summarized the changes in the patch that provides a new implementation to > support a four-layer hierarchy. > When the ApplicationMaster makes a resource allocation request to the > scheduler of ResourceManager, it will add the node group to the list of > attributes in the ResourceRequest. The parameters of the resource request > will change from to > . > After receiving the ResoureRequest the RM scheduler will assign containers > for requests in the sequence of data-local, nodegroup-local, rack-local and > off-switch.Then, ApplicationMaster schedules tasks on allocated containers in > sequence of data- local, nodegroup-local, rack-local and off-switch. > In terms of code changes made to YARN task scheduling, we updated the class > ContainerRequestEvent so that applications can requests for containers can > include anodegroup. In RM schedulers, FifoScheduler and CapacityScheduler > were updated. For the FifoScheduler, the changes were in the method > assignContainers. 
For the Capacity Scheduler the method > assignContainersOnNode in the class of LeafQueue was updated. In both changes > a new method, assignNodeGroupLocalContainers() was added in between the > assignment data-local and rack-local. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
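The four-level preference order described above (data-local, then nodegroup-local, then rack-local, then off-switch) can be sketched as a simple ordered fallback. This is an illustrative model only, not the FifoScheduler/CapacityScheduler code; the enum and method names are hypothetical:

```java
import java.util.EnumSet;

// Toy model of the 4-layer assignment sequence from MAPREDUCE-4310.
class FourLayerAssignment {
  // Declaration order encodes decreasing preference.
  enum Level { DATA_LOCAL, NODEGROUP_LOCAL, RACK_LOCAL, OFF_SWITCH }

  // Return the most-preferred level that the cluster can currently satisfy.
  static Level assign(EnumSet<Level> satisfiable) {
    for (Level level : Level.values()) {
      if (satisfiable.contains(level)) {
        return level;
      }
    }
    throw new IllegalStateException("no locality level satisfiable");
  }
}
```

In this model the patch's `assignNodeGroupLocalContainers()` step corresponds to the NODEGROUP_LOCAL slot inserted between the data-local and rack-local attempts.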
[jira] [Commented] (MAPREDUCE-4323) NM leaks sockets
[ https://issues.apache.org/jira/browse/MAPREDUCE-4323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291175#comment-13291175 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4323: --- This looks like a problem of the newly added socket cache. Once it is fixed (say, it is removed for the sake of discussion), are there other problems? > NM leaks sockets > > > Key: MAPREDUCE-4323 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4323 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: nodemanager >Affects Versions: 0.23.0, 0.24.0, 2.0.0-alpha >Reporter: Daryn Sharp >Priority: Critical > > The NM is exhausting its fds because it's not closing fs instances when the > app is finished. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4266) remove Ant remnants from MR
[ https://issues.apache.org/jira/browse/MAPREDUCE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278222#comment-13278222 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4266: --- As you mentioned, RAID is WIP. Could you communicate this with MAPREDUCE-3868? > remove Ant remnants from MR > --- > > Key: MAPREDUCE-4266 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4266 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: build >Affects Versions: 2.0.1 >Reporter: Alejandro Abdelnur >Assignee: Alejandro Abdelnur > Fix For: 2.0.1 > > Attachments: MAPREDUCE-4266.patch > > > Remove: > hadoop-mapreduce-project/src/* > hadoop-mapreduce-project/ivy/* > hadoop-mapreduce-project/build.xml > hadoop-mapreduce-project/ivy.xml -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4266) remove Ant remnants from MR
[ https://issues.apache.org/jira/browse/MAPREDUCE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278144#comment-13278144 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4266: --- Are these files required for compiling RAID or other contrib projects? > remove Ant remnants from MR > --- > > Key: MAPREDUCE-4266 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4266 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: build >Affects Versions: 2.0.1 >Reporter: Alejandro Abdelnur >Assignee: Alejandro Abdelnur > Fix For: 2.0.1 > > Attachments: MAPREDUCE-4266.patch > > > Remove: > hadoop-mapreduce-project/src/* > hadoop-mapreduce-project/ivy/* > hadoop-mapreduce-project/build.xml > hadoop-mapreduce-project/ivy.xml -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4231) Update RAID to not to use FSInodeInfo
[ https://issues.apache.org/jira/browse/MAPREDUCE-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-4231: -- Resolution: Fixed Fix Version/s: 2.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks John for reviewing and testing it! I have committed this. > Update RAID to not to use FSInodeInfo > - > > Key: MAPREDUCE-4231 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4231 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/raid >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Fix For: 2.0.0 > > Attachments: m4231_20120507.patch > > > FSInodeInfo was removed by HDFS-3363. We should update RAID. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4231) Update RAID to not to use FSInodeInfo
[ https://issues.apache.org/jira/browse/MAPREDUCE-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-4231: -- Attachment: m4231_20120507.patch m4231_20120507.patch: use BlockCollection instead. > Update RAID to not to use FSInodeInfo > - > > Key: MAPREDUCE-4231 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4231 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/raid >Reporter: Tsz Wo (Nicholas), SZE > Attachments: m4231_20120507.patch > > > FSInodeInfo was removed by HDFS-3363. We should update RAID. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4231) Update RAID to not to use FSInodeInfo
[ https://issues.apache.org/jira/browse/MAPREDUCE-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-4231: -- Assignee: Tsz Wo (Nicholas), SZE Status: Patch Available (was: Open) > Update RAID to not to use FSInodeInfo > - > > Key: MAPREDUCE-4231 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4231 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/raid >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Attachments: m4231_20120507.patch > > > FSInodeInfo was removed by HDFS-3363. We should update RAID. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Moved] (MAPREDUCE-4231) Update RAID to not to use FSInodeInfo
[ https://issues.apache.org/jira/browse/MAPREDUCE-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE moved HDFS-3380 to MAPREDUCE-4231: - Component/s: (was: contrib/raid) contrib/raid Key: MAPREDUCE-4231 (was: HDFS-3380) Project: Hadoop Map/Reduce (was: Hadoop HDFS) > Update RAID to not to use FSInodeInfo > - > > Key: MAPREDUCE-4231 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4231 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/raid >Reporter: Tsz Wo (Nicholas), SZE > Attachments: m4231_20120507.patch > > > FSInodeInfo was removed by HDFS-3363. We should update RAID. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4172) Clean up java warnings in the hadoop-mapreduce-project sub projects
[ https://issues.apache.org/jira/browse/MAPREDUCE-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265598#comment-13265598 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4172: --- In general, I appreciate that you are fixing the warnings. However, please don't suppress them if you can't fix them. Please also add more description on what is actually done in individual JIRAs instead of "Clean up Xxx". Thanks. > Clean up java warnings in the hadoop-mapreduce-project sub projects > --- > > Key: MAPREDUCE-4172 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4172 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: build >Affects Versions: trunk >Reporter: Harsh J >Assignee: Harsh J > > There are lots of warnings in the hadoop-mapreduce-project presently. We can > clear almost all of this away: > * Unused imports > * Unused variables > ** For loops that can be replaced with while instead to save an unused > variable > * Unused methods > * Deprecation warnings where an alternative can be used (Especially > SequenceFile reader/writer usage and MiniDFSCluster usage) > * Deprecation warnings where an alternative isn't clear (Especially > MiniMRCluster usage and DistributedCache API usage where a Job object may not > be available) > * Unchecked conversions > * Raw type usage > * (etc.) > I'm going to open one sub-task per sub-project we have, with patches attached > to them. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4183) Clean up yarn-common
[ https://issues.apache.org/jira/browse/MAPREDUCE-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265595#comment-13265595 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4183: --- Please do not add @SuppressWarnings(..). Removing them or fixing the warnings is great. > Clean up yarn-common > > > Key: MAPREDUCE-4183 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4183 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: build >Affects Versions: trunk >Reporter: Harsh J >Assignee: Harsh J >Priority: Minor > Attachments: 0011-YARN-Common-Cleanup.patch, > 0011-YARN-Common-Cleanup.patch > > > Clean up a bunch of existing javac warnings in hadoop-yarn-common module. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4182) Clean up yarn-applications-distributedshell
[ https://issues.apache.org/jira/browse/MAPREDUCE-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265592#comment-13265592 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4182: --- Typo: "used" should be "unused". > Clean up yarn-applications-distributedshell > --- > > Key: MAPREDUCE-4182 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4182 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: build >Affects Versions: trunk >Reporter: Harsh J >Assignee: Harsh J >Priority: Minor > Attachments: > 0010-YARN-Applications-DistributedShell-Example-Cleanup.patch, > 0010-YARN-Applications-DistributedShell-Example-Cleanup.patch > > > Clean up a bunch of existing javac warnings in > hadoop-yarn-applications-distributedshell module. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4181) Clean up yarn-api
[ https://issues.apache.org/jira/browse/MAPREDUCE-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265591#comment-13265591 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4181: --- Could you revise the summary and description for reflecting the actual changes? "Clean up yarn-api" sounds like changing the Yarn APIs. How about "Remove the unused maybeInitBuilder() method from various classes in hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-api/"? > Clean up yarn-api > - > > Key: MAPREDUCE-4181 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4181 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: build >Affects Versions: trunk >Reporter: Harsh J >Assignee: Harsh J >Priority: Minor > Attachments: 0009-YARN-API-Cleanup.patch > > > Clean up a bunch of existing javac warnings in hadoop-yarn-api module. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4182) Clean up yarn-applications-distributedshell
[ https://issues.apache.org/jira/browse/MAPREDUCE-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265589#comment-13265589 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4182: --- Since only an unused import is removed, I think the summary and the description should be revised to something like "Remove an unused import from TestDistributedShell". > Clean up yarn-applications-distributedshell > --- > > Key: MAPREDUCE-4182 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4182 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: build >Affects Versions: trunk >Reporter: Harsh J >Assignee: Harsh J >Priority: Minor > Attachments: > 0010-YARN-Applications-DistributedShell-Example-Cleanup.patch, > 0010-YARN-Applications-DistributedShell-Example-Cleanup.patch > > > Clean up a bunch of existing javac warnings in > hadoop-yarn-applications-distributedshell module.
[jira] [Commented] (MAPREDUCE-4174) Clean up hadoop-mapreduce-client-common
[ https://issues.apache.org/jira/browse/MAPREDUCE-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265587#comment-13265587 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4174: --- {code} +@SuppressWarnings("deprecation") @Private @Unstable public class MRApps extends Apps { {code} Please do not add @SuppressWarnings in the class header. It basically turns off the warning feature for the entire class. We should fix the warnings but not suppress them. > Clean up hadoop-mapreduce-client-common > --- > > Key: MAPREDUCE-4174 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4174 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: build >Affects Versions: trunk >Reporter: Harsh J >Assignee: Harsh J >Priority: Minor > Attachments: 0002-MR-Client-Common-Cleanup.patch > > > Clean up a bunch of existing javac warnings in hadoop-mapreduce-client-common > module.
[jira] [Commented] (MAPREDUCE-4176) Clean up hadoop-mapreduce-client-hs
[ https://issues.apache.org/jira/browse/MAPREDUCE-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265588#comment-13265588 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4176: --- Please do not add @SuppressWarnings("rawtypes"). > Clean up hadoop-mapreduce-client-hs > --- > > Key: MAPREDUCE-4176 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4176 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: build >Affects Versions: trunk >Reporter: Harsh J >Assignee: Harsh J >Priority: Minor > Attachments: 0004-MapReduce-Client-HistoryServer-Cleanup.patch > > > Clean up a bunch of existing javac warnings in hadoop-mapreduce-client-hs > module.
[jira] [Commented] (MAPREDUCE-4172) Clean up java warnings in the hadoop-mapreduce-project sub projects
[ https://issues.apache.org/jira/browse/MAPREDUCE-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265583#comment-13265583 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4172: --- Hi Harsh, please do not add @SuppressWarnings(..), especially @SuppressWarnings("rawtypes"). We should fix the warnings instead. > Clean up java warnings in the hadoop-mapreduce-project sub projects > --- > > Key: MAPREDUCE-4172 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4172 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: build >Affects Versions: trunk >Reporter: Harsh J >Assignee: Harsh J > > There are lots of warnings in the hadoop-mapreduce-project presently. We can > clear almost all of this away: > * Unused imports > * Unused variables > ** For loops that can be replaced with while instead to save an unused > variable > * Unused methods > * Deprecation warnings where an alternative can be used (Especially > SequenceFile reader/writer usage and MiniDFSCluster usage) > * Deprecation warnings where an alternative isn't clear (Especially > MiniMRCluster usage and DistributedCache API usage where a Job object may not > be available) > * Unchecked conversions > * Raw type usage > * (etc.) > I'm going to open one sub-task per sub-project we have, with patches attached > to them.
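[Editor's note] The kind of fix being asked for above, as opposed to suppression, can be sketched as follows. The class and method names are illustrative, not taken from any of the attached patches: a raw-type javac warning is resolved by parameterizing the declaration so the compiler keeps checking element types, rather than silencing it with @SuppressWarnings("rawtypes").

```java
import java.util.ArrayList;
import java.util.List;

public class RawTypeFix {
    // Before (triggers a raw-type/unchecked javac warning):
    //   List names = new ArrayList();
    //   names.add("alice");

    // After: parameterize the type instead of suppressing the warning,
    // so the compiler can still verify what goes into the list.
    public static List<String> makeNames() {
        List<String> names = new ArrayList<>();
        names.add("alice");
        names.add("bob");
        return names;
    }

    public static void main(String[] args) {
        assert makeNames().size() == 2;
    }
}
```

With the parameterized version, an accidental `names.add(42)` becomes a compile error instead of a latent ClassCastException, which is why fixing beats suppressing here.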
[jira] [Updated] (MAPREDUCE-4074) Client continuously retries to RM When RM goes down before launching Application Master
[ https://issues.apache.org/jira/browse/MAPREDUCE-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-4074: -- Assignee: xieguiming > Client continuously retries to RM When RM goes down before launching > Application Master > --- > > Key: MAPREDUCE-4074 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4074 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.23.1 >Reporter: Devaraj K >Assignee: xieguiming > Fix For: 0.23.3 > > Attachments: MAPREDUCE-4074-1.patch, MAPREDUCE-4074-2.patch, > MAPREDUCE-4074-3.patch, MAPREDUCE-4074.patch > > > The client continuously retries connecting to the RM and logs the messages > below when the RM goes down before launching the App Master. > I feel an exception should be thrown, or the loop broken, after a finite > number of retries. > {code:xml} > 28/03/12 07:15:03 INFO ipc.Client: Retrying connect to server: > linux-f330.site/10.18.40.182:8032. Already tried 0 time(s). > 28/03/12 07:15:04 INFO ipc.Client: Retrying connect to server: > linux-f330.site/10.18.40.182:8032. Already tried 1 time(s). > 28/03/12 07:15:05 INFO ipc.Client: Retrying connect to server: > linux-f330.site/10.18.40.182:8032. Already tried 2 time(s). > 28/03/12 07:15:06 INFO ipc.Client: Retrying connect to server: > linux-f330.site/10.18.40.182:8032. Already tried 3 time(s). > 28/03/12 07:15:07 INFO ipc.Client: Retrying connect to server: > linux-f330.site/10.18.40.182:8032. Already tried 4 time(s). > 28/03/12 07:15:08 INFO ipc.Client: Retrying connect to server: > linux-f330.site/10.18.40.182:8032. Already tried 5 time(s). > 28/03/12 07:15:09 INFO ipc.Client: Retrying connect to server: > linux-f330.site/10.18.40.182:8032. Already tried 6 time(s). > 28/03/12 07:15:10 INFO ipc.Client: Retrying connect to server: > linux-f330.site/10.18.40.182:8032. Already tried 7 time(s). > 28/03/12 07:15:11 INFO ipc.Client: Retrying connect to server: > linux-f330.site/10.18.40.182:8032. Already tried 8 time(s). 
> 28/03/12 07:15:12 INFO ipc.Client: Retrying connect to server: > linux-f330.site/10.18.40.182:8032. Already tried 9 time(s). > 28/03/12 07:15:13 INFO ipc.Client: Retrying connect to server: > linux-f330.site/10.18.40.182:8032. Already tried 0 time(s). > 28/03/12 07:15:14 INFO ipc.Client: Retrying connect to server: > linux-f330.site/10.18.40.182:8032. Already tried 1 time(s). > 28/03/12 07:15:15 INFO ipc.Client: Retrying connect to server: > linux-f330.site/10.18.40.182:8032. Already tried 2 time(s). > 28/03/12 07:15:16 INFO ipc.Client: Retrying connect to server: > linux-f330.site/10.18.40.182:8032. Already tried 3 time(s). > 28/03/12 07:15:17 INFO ipc.Client: Retrying connect to server: > linux-f330.site/10.18.40.182:8032. Already tried 4 time(s). > {code}
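[Editor's note] The bounded-retry behavior the report asks for could look roughly like this sketch. The class name, method name, retry limit, and the simulated "connect" are all hypothetical, not taken from the attached patches or from Hadoop's ipc.Client:

```java
public class BoundedRetry {
    /**
     * Attempt an operation up to maxRetries times; instead of retrying
     * forever, give up with an exception once the limit is reached.
     * failuresBeforeSuccess simulates how many attempts fail before one
     * succeeds; the return value is the attempt index that succeeded.
     */
    public static int connectWithRetries(int failuresBeforeSuccess, int maxRetries) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            if (attempt >= failuresBeforeSuccess) {
                return attempt; // simulated successful connection
            }
            // a real client would log "Retrying connect to server ..." here
        }
        // This is the behavior the reporter asks for: break out with an
        // error after a finite number of retries instead of looping forever.
        throw new RuntimeException("Could not connect after " + maxRetries + " attempts");
    }

    public static void main(String[] args) {
        assert connectWithRetries(3, 10) == 3;   // succeeds on the 4th attempt
        boolean threw = false;
        try {
            connectWithRetries(20, 10);          // never succeeds within the limit
        } catch (RuntimeException e) {
            threw = true;
        }
        assert threw;
    }
}
```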
[jira] [Updated] (MAPREDUCE-4066) To get "yarn.app.mapreduce.am.staging-dir" value, should set the default value
[ https://issues.apache.org/jira/browse/MAPREDUCE-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-4066: -- Assignee: xieguiming > To get "yarn.app.mapreduce.am.staging-dir" value, should set the default value > -- > > Key: MAPREDUCE-4066 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4066 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: job submission, mrv2 >Affects Versions: 0.23.1 > Environment: client is windows eclipse, server is suse >Reporter: xieguiming >Assignee: xieguiming >Priority: Minor > Fix For: 2.0.0 > > Attachments: MAPREDUCE-4066.patch, MAPREDUCE-4066.patch > > > When submitting the job from the Windows Eclipse client, the > yarn.app.mapreduce.am.staging-dir value is null. > {code:title=MRApps.java|borderStyle=solid} > public static Path getStagingAreaDir(Configuration conf, String user) { > return new Path( > conf.get(MRJobConfig.MR_AM_STAGING_DIR) + > Path.SEPARATOR + user + Path.SEPARATOR + STAGING_CONSTANT); > } > {code} > It should be modified to: > {code:title=MRApps.java|borderStyle=solid} > public static Path getStagingAreaDir(Configuration conf, String user) { > return new Path( > conf.get(MRJobConfig.MR_AM_STAGING_DIR,"/tmp/hadoop-yarn/staging") + > Path.SEPARATOR + user + Path.SEPARATOR + STAGING_CONSTANT); > } > {code}
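[Editor's note] The proposed change above is the standard get-with-default pattern: the two-argument getter returns the fallback instead of null when the key is unset. A minimal self-contained sketch, using java.util.Properties to stand in for Hadoop's Configuration (the key and default value come from the report; the class and method names are hypothetical):

```java
import java.util.Properties;

public class StagingDirDefault {
    static final String STAGING_DIR_KEY = "yarn.app.mapreduce.am.staging-dir";
    static final String STAGING_DIR_DEFAULT = "/tmp/hadoop-yarn/staging";

    // Mirrors Configuration.get(key, defaultValue): never returns null,
    // so callers can safely build a Path from the result.
    public static String getStagingDir(Properties conf) {
        return conf.getProperty(STAGING_DIR_KEY, STAGING_DIR_DEFAULT);
    }

    public static void main(String[] args) {
        // Unset key falls back to the default instead of null.
        assert getStagingDir(new Properties()).equals(STAGING_DIR_DEFAULT);
    }
}
```

An explicitly configured value still wins over the default, which is why supplying the fallback at the read site is a safe, backward-compatible fix.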
[jira] [Moved] (MAPREDUCE-4192) the TaskMemoryManager thread is not interrupted when the TaskTracker is ordered to reinit by JobTracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE moved HADOOP-8300 to MAPREDUCE-4192: --- Affects Version/s: (was: 0.20.2) 0.20.2 Key: MAPREDUCE-4192 (was: HADOOP-8300) Project: Hadoop Map/Reduce (was: Hadoop Common) > the TaskMemoryManager thread is not interrupted when the TaskTracker is > ordered to reinit by JobTracker > - > > Key: MAPREDUCE-4192 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4192 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.20.2 >Reporter: Hua xu > > When the TaskTracker is ordered to reinit by the JobTracker, it interrupts > some threads and then reinits them, but the TaskTracker does not interrupt the > TaskMemoryManager thread before creating a new TaskMemoryManager thread. I > used jstack to verify this (I reinited the TaskTracker 3 times by having the > JobTracker send TaskTrackerAction.ActionType.REINIT_TRACKER).
[jira] [Updated] (MAPREDUCE-3076) TestSleepJob fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-3076: -- Resolution: Fixed Fix Version/s: 0.20.206.0 Status: Resolved (was: Patch Available) I have committed this. Thanks, Arun! > TestSleepJob fails > --- > > Key: MAPREDUCE-3076 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-3076 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Affects Versions: 0.20.205.0 >Reporter: Arun C Murthy >Assignee: Arun C Murthy >Priority: Blocker > Fix For: 0.20.205.0, 0.20.206.0 > > Attachments: MAPREDUCE-3076.patch > > > TestSleepJob fails, it was intended to be used in other tests for > MAPREDUCE-2981. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-3076) TestSleepJob fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-3076: -- Hadoop Flags: [Reviewed] +1 patch looks good. After the patch, TestSleepJob is ignored. {noformat} [junit] Running org.apache.hadoop.mapreduce.TestSleepJob [junit] Tests run: 0, Failures: 0, Errors: 0, Time elapsed: 0.005 sec {noformat} > TestSleepJob fails > --- > > Key: MAPREDUCE-3076 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-3076 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Affects Versions: 0.20.205.0 >Reporter: Arun C Murthy >Assignee: Arun C Murthy >Priority: Blocker > Fix For: 0.20.205.0 > > Attachments: MAPREDUCE-3076.patch > > > TestSleepJob fails, it was intended to be used in other tests for > MAPREDUCE-2981.
[jira] [Commented] (MAPREDUCE-2981) Backport trunk fairscheduler to 0.20-security branch
[ https://issues.apache.org/jira/browse/MAPREDUCE-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113014#comment-13113014 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-2981: --- TestSleepJob failed; please see [build #16|https://builds.apache.org/view/G-L/view/Hadoop/job/Hadoop-0.20.205-Build/16/testReport/org.apache.hadoop.mapreduce/TestSleepJob/initializationError/]. It is not a unit test, so it cannot be initialized by the JUnit framework. Could you take a look? > Backport trunk fairscheduler to 0.20-security branch > > > Key: MAPREDUCE-2981 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2981 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/fair-share >Affects Versions: 0.20.205.0 >Reporter: Matei Zaharia >Assignee: Matei Zaharia > Fix For: 0.20.205.0 > > Attachments: fairsched-backport-v2.patch, > fairsched-backport-v3.patch, fairsched-backport.patch > > > A lot of improvements have been made to the fair scheduler in 0.21, 0.22 and > trunk, but have not been ported back to the new 0.20.20X releases that are > currently considered the stable branch of Hadoop.
[jira] [Updated] (MAPREDUCE-2711) TestBlockPlacementPolicyRaid cannot be compiled
[ https://issues.apache.org/jira/browse/MAPREDUCE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-2711: -- Resolution: Fixed Fix Version/s: 0.24.0 0.23.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Thanks Arun for the review. I have committed this. > TestBlockPlacementPolicyRaid cannot be compiled > --- > > Key: MAPREDUCE-2711 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2711 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/raid >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Fix For: 0.23.0, 0.24.0 > > Attachments: m2711_20110719_TestBlockPlacementPolicyRaid.java, > m2711_20110727.patch, m2711_20110818.patch, m2711_20110908.patch > > > {{TestBlockPlacementPolicyRaid}} accesses internal {{FSNamesystem}} directly. > It cannot be compiled after HDFS-2147.
[jira] [Updated] (MAPREDUCE-2711) TestBlockPlacementPolicyRaid cannot be compiled
[ https://issues.apache.org/jira/browse/MAPREDUCE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-2711: -- Description: {{TestBlockPlacementPolicyRaid}} accesses internal {{FSNamesystem}} directly. It cannot be compiled after HDFS-2147. Environment: (was: {{TestBlockPlacementPolicyRaid}} accesses internal {{FSNamesystem}} directly. It cannot be compiled after HDFS-2147.) > TestBlockPlacementPolicyRaid cannot be compiled > --- > > Key: MAPREDUCE-2711 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2711 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/raid >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Attachments: m2711_20110719_TestBlockPlacementPolicyRaid.java, > m2711_20110727.patch, m2711_20110818.patch, m2711_20110908.patch > > > {{TestBlockPlacementPolicyRaid}} accesses internal {{FSNamesystem}} directly. > It cannot be compiled after HDFS-2147.
[jira] [Updated] (MAPREDUCE-2711) TestBlockPlacementPolicyRaid cannot be compiled
[ https://issues.apache.org/jira/browse/MAPREDUCE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-2711: -- Status: Patch Available (was: In Progress) > TestBlockPlacementPolicyRaid cannot be compiled > --- > > Key: MAPREDUCE-2711 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2711 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/raid > Environment: {{TestBlockPlacementPolicyRaid}} accesses internal > {{FSNamesystem}} directly. It cannot be compiled after HDFS-2147. >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Attachments: m2711_20110719_TestBlockPlacementPolicyRaid.java, > m2711_20110727.patch, m2711_20110818.patch, m2711_20110908.patch
[jira] [Updated] (MAPREDUCE-2711) TestBlockPlacementPolicyRaid cannot be compiled
[ https://issues.apache.org/jira/browse/MAPREDUCE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-2711: -- Attachment: m2711_20110908.patch m2711_20110908.patch: updated with trunk > TestBlockPlacementPolicyRaid cannot be compiled > --- > > Key: MAPREDUCE-2711 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2711 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/raid > Environment: {{TestBlockPlacementPolicyRaid}} accesses internal > {{FSNamesystem}} directly. It cannot be compiled after HDFS-2147. >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Attachments: m2711_20110719_TestBlockPlacementPolicyRaid.java, > m2711_20110727.patch, m2711_20110818.patch, m2711_20110908.patch
[jira] [Work started] (MAPREDUCE-2711) TestBlockPlacementPolicyRaid cannot be compiled
[ https://issues.apache.org/jira/browse/MAPREDUCE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on MAPREDUCE-2711 started by Tsz Wo (Nicholas), SZE. > TestBlockPlacementPolicyRaid cannot be compiled > --- > > Key: MAPREDUCE-2711 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2711 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/raid > Environment: {{TestBlockPlacementPolicyRaid}} accesses internal > {{FSNamesystem}} directly. It cannot be compiled after HDFS-2147. >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Attachments: m2711_20110719_TestBlockPlacementPolicyRaid.java, > m2711_20110727.patch, m2711_20110818.patch
[jira] [Resolved] (MAPREDUCE-2805) Update RAID for HDFS-2241
[ https://issues.apache.org/jira/browse/MAPREDUCE-2805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE resolved MAPREDUCE-2805. --- Resolution: Fixed Fix Version/s: 0.23.0 Hadoop Flags: [Reviewed] Thanks Suresh for the review. Resolving this. > Update RAID for HDFS-2241 > - > > Key: MAPREDUCE-2805 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2805 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/raid >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE >Priority: Minor > Fix For: 0.23.0 > > Attachments: m2805_20110811.patch > > > {noformat} > src/contrib/raid/src/java/org/apache/hadoop/hdfs/server/datanode/RaidBlockSender.java:44: > interface expected here > [javac] public class RaidBlockSender implements java.io.Closeable, > FSConstants { > [javac]^ > {noformat}
[jira] [Updated] (MAPREDUCE-2711) TestBlockPlacementPolicyRaid cannot be compiled
[ https://issues.apache.org/jira/browse/MAPREDUCE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-2711: -- Attachment: m2711_20110818.patch m2711_20110818.patch: last patch before MAPREDUCE-279 merge. > TestBlockPlacementPolicyRaid cannot be compiled > --- > > Key: MAPREDUCE-2711 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2711 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/raid > Environment: {{TestBlockPlacementPolicyRaid}} accesses internal > {{FSNamesystem}} directly. It cannot be compiled after HDFS-2147. >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Attachments: m2711_20110719_TestBlockPlacementPolicyRaid.java, > m2711_20110727.patch, m2711_20110818.patch
[jira] [Commented] (MAPREDUCE-2805) Update RAID for HDFS-2241
[ https://issues.apache.org/jira/browse/MAPREDUCE-2805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082976#comment-13082976 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-2805: --- Checked it manually. I have committed it temporarily. Will wait for a review before resolving this. > Update RAID for HDFS-2241 > - > > Key: MAPREDUCE-2805 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2805 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/raid >Reporter: Tsz Wo (Nicholas), SZE > Attachments: m2805_20110811.patch > > > {noformat} > src/contrib/raid/src/java/org/apache/hadoop/hdfs/server/datanode/RaidBlockSender.java:44: > interface expected here > [javac] public class RaidBlockSender implements java.io.Closeable, > FSConstants { > [javac]^ > {noformat}