[jira] [Commented] (MAPREDUCE-6597) Distcp should move the path to trash when deleting missing paths from the source

2016-01-04 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082546#comment-15082546
 ] 

Tsz Wo Nicholas Sze commented on MAPREDUCE-6597:


> ... We should add the option skipTrash to control the behavior. If skipTrash 
> is not specified, we will move the paths to the trash first rather than 
> deleting them directly.

This is an incompatible change.  We probably should do it the other way -- add 
a useTrash option.
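The compatibility point can be sketched in plain Java (names like useTrash and deleteMissing are illustrative, not from any actual patch): with an opt-in useTrash flag, the default path keeps deleting exactly as before, so existing jobs see no behavior change.

```java
// Sketch of a backwards-compatible -useTrash option for distcp's delete step.
// All names here are hypothetical, not the actual distcp implementation.
import java.util.ArrayList;
import java.util.List;

public class DeleteMissingSketch {
    final boolean useTrash;
    final List<String> trashed = new ArrayList<>();
    final List<String> deleted = new ArrayList<>();

    DeleteMissingSketch(boolean useTrash) { this.useTrash = useTrash; }

    // Called for each target path that is missing from the source.
    void deleteMissing(String path) {
        if (useTrash) {
            // real code would call something like Trash.moveToAppropriateTrash(fs, path, conf)
            trashed.add(path);
        } else {
            // real code would call fs.delete(path, true) -- the existing behavior
            deleted.add(path);
        }
    }

    public static void main(String[] args) {
        DeleteMissingSketch legacy = new DeleteMissingSketch(false);
        legacy.deleteMissing("/dst/stale");
        System.out.println("default: deleted=" + legacy.deleted);

        DeleteMissingSketch safe = new DeleteMissingSketch(true);
        safe.deleteMissing("/dst/stale");
        System.out.println("opt-in:  trashed=" + safe.trashed);
    }
}
```

Flipping the default (as the original skipTrash proposal does) would silently change what existing scripts do, which is why an opt-in flag is the compatible direction.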

> Distcp should move the path to trash when deleting missing paths from the source
> -
>
> Key: MAPREDUCE-6597
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6597
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp
>Reporter: jeanlyn
>Assignee: jeanlyn
>Priority: Minor
> Attachments: MAPREDUCE-6597.001.patch, MAPREDUCE-6597.002.patch
>
>
> For now, when we use *distcp* with the delete option, paths missing from the 
> source are deleted from the target. We should add the option *skipTrash* to 
> control the behavior. If *skipTrash* is not specified, we will move the paths 
> to the trash first rather than deleting them directly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6564) distcp creates missing parent directories which is inconsistent with fs -cp

2015-12-07 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15045533#comment-15045533
 ] 

Tsz Wo Nicholas Sze commented on MAPREDUCE-6564:


Sure, let's discuss how to fix it.  BTW, there is a second problem with distcp:
- The missing directory it creates somehow inherits the permission of its 
parent directory instead of applying the umask.
{code}
$hadoop fs -ls /dst/
drwx------   - szetszwo hdfs          0 2015-12-04 16:24 /dst/non-existing
{code}
(The permission would be drwxr-xr-x if the directory were created using the umask.)
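A small sketch of the POSIX-style umask arithmetic behind that parenthetical (the mode and umask values are assumed for illustration): applying a typical 022 umask to a default 777 directory mode yields rwxr-xr-x, whereas inheriting the parent's mode skips this step entirely.

```java
// Sketch: permission a directory would get if the umask were applied
// (mode & ~umask), versus inheriting the parent's mode unchanged.
public class UmaskDemo {
    // POSIX-style umask application.
    static int applyUmask(int mode, int umask) {
        return mode & ~umask;
    }

    // Render the low 9 permission bits as an ls-style string.
    static String rwx(int mode) {
        StringBuilder sb = new StringBuilder();
        String bits = "rwxrwxrwx";
        for (int i = 0; i < 9; i++) {
            sb.append((mode & (1 << (8 - i))) != 0 ? bits.charAt(i) : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed defaults: requested mode 777, typical umask 022.
        System.out.println(rwx(applyUmask(0777, 022))); // rwxr-xr-x, as the comment expects
        System.out.println(rwx(0700));                  // rwx------, what inheriting a
                                                        // restrictive parent would give
    }
}
```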

> distcp creates missing parent directories which is inconsistent with fs -cp
> ---
>
> Key: MAPREDUCE-6564
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6564
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp
>Reporter: Tsz Wo Nicholas Sze
>
> fs -cp will fail if the destination parent directory does not exist.
> {code}
> $hadoop fs -cp /a.sh /dst/non-existing/a.sh
> cp: `/dst/non-existing/a.sh': No such file or directory
> {code}
> However, distcp will not fail.  It creates the missing parent directory.
> {code}
> $hadoop distcp /a.sh /dst/non-existing/a.sh
> ...
> $hadoop fs -ls /dst/non-existing
> Found 1 items
> -rw-r--r--   3 szetszwo hdfs        531 2015-12-04 16:24 /dst/non-existing/a.sh
> {code}
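The inconsistency can be reproduced on a local filesystem with java.nio (a rough analogy, not HDFS itself): a plain copy fails when the parent is missing, like fs -cp, while creating parents first silently succeeds, like distcp.

```java
// Sketch of the two behaviors on a local filesystem, using temp paths.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class ParentDirDemo {
    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("a", ".sh");
        Path dst = Files.createTempDirectory("dst")
                        .resolve("non-existing").resolve("a.sh");

        try {
            Files.copy(src, dst);  // parent missing: throws, like fs -cp
        } catch (NoSuchFileException e) {
            System.out.println("cp-style copy failed: parent does not exist");
        }

        Files.createDirectories(dst.getParent());  // distcp-style: create parents first
        Files.copy(src, dst);
        System.out.println("distcp-style copy succeeded: " + Files.exists(dst));
    }
}
```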



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6564) distcp creates missing parent directories which is inconsistent with fs -cp

2015-12-04 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated MAPREDUCE-6564:
---
Description: 
fs -cp will fail if the destination parent directory does not exist.
{code}
$hadoop fs -cp /a.sh /dst/non-existing/a.sh
cp: `/dst/non-existing/a.sh': No such file or directory
{code}
However, distcp will not fail.  It creates the missing parent directory.
{code}
$hadoop distcp /a.sh /dst/non-existing/a.sh
...
$hadoop fs -ls /dst/non-existing
Found 1 items
-rw-r--r--   3 szetszwo hdfs        531 2015-12-04 16:24 /dst/non-existing/a.sh
{code}


  was:
fs -cp will fail if the destination parent directory does not exist.
{code}
$hadoop fs -cp /a.sh /dst/non-existing/a.sh
cp: `/dst/non-existing/a.sh': No such file or directory
{code}
However, distcp will not fail.  It creates it
{code}
$hadoop distcp /a.sh /dst/non-existing/a.sh
...
$hadoop fs -ls /dst/non-existing
Found 1 items
-rw-r--r--   3 szetszwo hdfs        531 2015-12-04 16:24 /dst/non-existing/a.sh
{code}



> distcp creates missing parent directories which is inconsistent with fs -cp
> ---
>
> Key: MAPREDUCE-6564
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6564
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp
>Reporter: Tsz Wo Nicholas Sze
>
> fs -cp will fail if the destination parent directory does not exist.
> {code}
> $hadoop fs -cp /a.sh /dst/non-existing/a.sh
> cp: `/dst/non-existing/a.sh': No such file or directory
> {code}
> However, distcp will not fail.  It creates the missing parent directory.
> {code}
> $hadoop distcp /a.sh /dst/non-existing/a.sh
> ...
> $hadoop fs -ls /dst/non-existing
> Found 1 items
> -rw-r--r--   3 szetszwo hdfs        531 2015-12-04 16:24 /dst/non-existing/a.sh
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MAPREDUCE-6564) distcp creates missing parent directories which is inconsistent with fs -cp

2015-12-04 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created MAPREDUCE-6564:
--

 Summary: distcp creates missing parent directories which is 
inconsistent with fs -cp
 Key: MAPREDUCE-6564
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6564
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distcp
Reporter: Tsz Wo Nicholas Sze


fs -cp will fail if the destination parent directory does not exist.
{code}
$hadoop fs -cp /a.sh /dst/non-existing/a.sh
cp: `/dst/non-existing/a.sh': No such file or directory
{code}
However, distcp will not fail.  It creates it
{code}
$hadoop distcp /a.sh /dst/non-existing/a.sh
...
$hadoop fs -ls /dst/non-existing
Found 1 items
-rw-r--r--   3 szetszwo hdfs        531 2015-12-04 16:24 /dst/non-existing/a.sh
{code}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MAPREDUCE-5010) use multithreading to speed up mergeParts and try MapPartitionsCompleteEvent to schedule fetch in reduce

2014-10-20 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze reassigned MAPREDUCE-5010:
--

Assignee: (was: Tsz Wo Nicholas Sze)

> use multithreading to speed up mergeParts  and try MapPartitionsCompleteEvent 
> to schedule fetch in reduce 
> --
>
> Key: MAPREDUCE-5010
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5010
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv1
>Affects Versions: 1.0.1
>Reporter: Li Junjun
> Attachments: MAPREDUCE-5010.jpg
>
>
> Use multithreading to speed up the merge, and try a MapPartitionsCompleteEvent 
> to schedule fetches in reduce.
> This is for multi-core CPUs; the performance gain will depend on your hardware 
> and configuration.
> In MapTask,
> {code}
> for (int parts = 0; parts < partitions; parts++) {
>   // do the merge, appending to the final output file (file.out)
> }
> {code}
> only uses one thread!
> So I think we could use more threads (conf: mapred.map.mergerthreads) to do 
> the merge if you have many cores or CPUs.
> Currently, a reduce task fetches a map's output only after the whole map task 
> completes. That means when map x completes, all the reduces fetch its output 
> concurrently, even though we use
> {code}
> // Randomize the map output locations to prevent 
> // all reduce-tasks swamping the same tasktracker
> List<String> hostList = new ArrayList<String>();
> hostList.addAll(mapLocations.keySet());
> Collections.shuffle(hostList, this.random);
> {code}
> in the reduce task.
> For example, 100 reduces may wait on the last 2 maps because the cluster's map 
> task capacity is 98 but the job has 100 map tasks.
> So I think: during the threaded merge, if a map has 8 partitions and 3 threads 
> do the merging, then whenever one thread completes a partition we can inform 
> the reduces to fetch that partition file immediately; or we can wait until, 
> say, 3 partitions complete before sending the event (conf: 
> mapred.map.parts.inform) to lessen the JobTracker's stress, rather than 
> waiting for the whole map task to complete. Doing this would prevent all 
> reduce-tasks from swamping the same tasktracker and speed up the reduce phase.
> Is this acceptable? Any other good ideas?
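The notify-as-each-partition-finishes idea can be sketched with a thread pool and a CompletionService (a toy model; the conf names and MapPartitionsCompleteEvent are from the proposal, not from trunk): partitions are merged by a small pool, and each one becomes visible to consumers as soon as it completes, not when the whole set is done.

```java
// Sketch of the proposal: merge partitions with N threads and "notify"
// as each partition finishes, instead of waiting for the whole map task.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelMergeSketch {
    // Returns partition ids in completion order -- each add() stands in for
    // sending a MapPartitionsCompleteEvent so reduces could fetch immediately.
    public static List<Integer> mergeAndNotify(int partitions, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CompletionService<Integer> done = new ExecutorCompletionService<>(pool);
        for (int p = 0; p < partitions; p++) {
            final int part = p;
            done.submit(() -> part);  // stand-in for merging partition `part`
        }
        List<Integer> notified = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            notified.add(done.take().get());  // notify as soon as any merge finishes
        }
        pool.shutdown();
        return notified;
    }

    public static void main(String[] args) throws Exception {
        // The proposal's example: 8 partitions merged by 3 threads.
        System.out.println(mergeAndNotify(8, 3).size());
    }
}
```

Batching the notifications (the proposed mapred.map.parts.inform) would simply mean sending one event per k completions taken from the CompletionService rather than one per partition.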



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6123) TestCombineFileInputFormat incorrectly starts 2 MiniDFSCluster instances.

2014-10-09 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated MAPREDUCE-6123:
---
Hadoop Flags: Reviewed

+1 patch looks good.

> TestCombineFileInputFormat incorrectly starts 2 MiniDFSCluster instances.
> -
>
> Key: MAPREDUCE-6123
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6123
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Trivial
> Attachments: MAPREDUCE-6123.1.patch
>
>
> {{TestCombineFileInputFormat#testGetSplitsWithDirectory}} starts 2 
> {{MiniDFSCluster}} instances, one right after the other, using the exact same 
> configuration.  There is no need for 2 clusters in this test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6122) TestLineRecordReader may fail due to test data files checked out of git with incorrect line endings.

2014-10-09 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated MAPREDUCE-6122:
---
Hadoop Flags: Reviewed

+1 patch looks good.

> TestLineRecordReader may fail due to test data files checked out of git with 
> incorrect line endings.
> 
>
> Key: MAPREDUCE-6122
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6122
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Trivial
> Attachments: MAPREDUCE-6122.1.patch
>
>
> {{TestLineRecordReader}} uses several test input files at 
> src/test/resources/*.txt.  Some of the tests expect a specific length for the 
> files, such as dealing with a record that spans multiple splits.  If they get 
> checked out of git with CRLF line endings by mistake, then the test 
> assertions will fail.
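A tiny illustration of why those assertions break (record contents assumed for the example): each CRLF line ending adds one byte per record, shifting every file length and split-boundary expectation.

```java
// Sketch: byte length of the same logical records with LF vs CRLF endings.
public class LineEndingDemo {
    static int length(String[] records, String eol) {
        int n = 0;
        for (String r : records) n += r.length() + eol.length();
        return n;
    }

    public static void main(String[] args) {
        String[] records = {"record-1", "record-2", "record-3"};
        int lf = length(records, "\n");     // what the tests expect
        int crlf = length(records, "\r\n"); // what a CRLF checkout produces
        System.out.println(crlf - lf);      // one extra byte per record
    }
}
```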



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5899) Support incremental data copy in DistCp

2014-05-21 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated MAPREDUCE-5899:
---


+1 the new patch looks good.

> Support incremental data copy in DistCp
> ---
>
> Key: MAPREDUCE-5899
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5899
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HADOOP-10608.000.patch, HADOOP-10608.001.patch, 
> MAPREDUCE-5899.002.patch
>
>
> Currently when doing distcp with -update option, for two files with the same 
> file names but with different file length or checksum, we overwrite the whole 
> file. It will be good if we can detect the case where (sourceFile = 
> targetFile + appended_data), and only transfer the appended data segment to 
> the target. This will be very useful if we're doing incremental distcp.
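The detection idea can be sketched as follows (a toy model over raw bytes; real distcp would compare file lengths and block-level checksums rather than loading whole files): if the source is at least as long as the target and its prefix of the target's length matches, only the appended tail needs transferring.

```java
// Sketch of append-only detection: sourceFile == targetFile + appended_data.
// The helper name and byte-array comparison are illustrative only.
import java.util.Arrays;

public class AppendDetectSketch {
    // Offset to resume copying from, or -1 when a full overwrite is needed.
    static long appendOffset(byte[] source, byte[] target) {
        if (source.length < target.length) return -1;  // source shrank: overwrite
        byte[] prefix = Arrays.copyOf(source, target.length);
        return Arrays.equals(prefix, target) ? target.length : -1;
    }

    public static void main(String[] args) {
        byte[] target = "hello".getBytes();
        System.out.println(appendOffset("hello world".getBytes(), target)); // copy tail from offset 5
        System.out.println(appendOffset("xello world".getBytes(), target)); // prefix differs: overwrite
    }
}
```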



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5899) Support incremental data copy in DistCp

2014-05-21 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated MAPREDUCE-5899:
---

Component/s: distcp

Traditionally, distcp is a MapReduce component.  Moving this to MapReduce.

> Support incremental data copy in DistCp
> ---
>
> Key: MAPREDUCE-5899
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5899
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HADOOP-10608.000.patch, HADOOP-10608.001.patch
>
>
> Currently when doing distcp with -update option, for two files with the same 
> file names but with different file length or checksum, we overwrite the whole 
> file. It will be good if we can detect the case where (sourceFile = 
> targetFile + appended_data), and only transfer the appended data segment to 
> the target. This will be very useful if we're doing incremental distcp.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Moved] (MAPREDUCE-5899) Support incremental data copy in DistCp

2014-05-21 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze moved HADOOP-10608 to MAPREDUCE-5899:
-

Key: MAPREDUCE-5899  (was: HADOOP-10608)
Project: Hadoop Map/Reduce  (was: Hadoop Common)

> Support incremental data copy in DistCp
> ---
>
> Key: MAPREDUCE-5899
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5899
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HADOOP-10608.000.patch, HADOOP-10608.001.patch
>
>
> Currently when doing distcp with -update option, for two files with the same 
> file names but with different file length or checksum, we overwrite the whole 
> file. It will be good if we can detect the case where (sourceFile = 
> targetFile + appended_data), and only transfer the appended data segment to 
> the target. This will be very useful if we're doing incremental distcp.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5809) Enhance distcp to support preserving HDFS ACLs.

2014-05-16 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated MAPREDUCE-5809:
---

Hadoop Flags: Reviewed

+1 patch looks good.

> Enhance distcp to support preserving HDFS ACLs.
> ---
>
> Key: MAPREDUCE-5809
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5809
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp
>Affects Versions: 2.4.0
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: MAPREDUCE-5809.1.patch, MAPREDUCE-5809.2.patch, 
> MAPREDUCE-5809.3.patch, MAPREDUCE-5809.4.patch, MAPREDUCE-5809.5.patch
>
>
> This issue tracks enhancing distcp to add a new command-line argument for 
> preserving HDFS ACLs from the source at the copy destination.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1

2014-05-13 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996487#comment-13996487
 ] 

Tsz Wo Nicholas Sze commented on MAPREDUCE-5081:


By hdfs2, do you mean hdfs in branch-2?  The DistCpV2 here is for branch-1.  
You may try it in your setup.

> Backport DistCpV2 and the related JIRAs to branch-1
> ---
>
> Key: MAPREDUCE-5081
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5081
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: distcp
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Fix For: 1.2.0
>
> Attachments: DistCp.java.diff, m5081_20130328.patch, 
> m5081_20130328b.patch, m5981_20130321.patch, m5981_20130321b.patch, 
> m5981_20130323.patch
>
>
> Here is a list of DistCpV2 JIRAs:
> - MAPREDUCE-2765: DistCpV2 main jira
> - HADOOP-8703: turn CRC checking off for 0 byte size 
> - HDFS-3054: distcp -skipcrccheck has no effect.
> - HADOOP-8431: Running distcp without args throws IllegalArgumentException
> - HADOOP-8775: non-positive value to -bandwidth
> - MAPREDUCE-4654: TestDistCp is ignored
> - HADOOP-9022: distcp fails to copy file if -m 0 specified
> - HADOOP-9025: TestCopyListing failing
> - MAPREDUCE-5075: DistCp leaks input file handles
> - distcp part of HADOOP-8341: Fix findbugs issues in hadoop-tools
> - MAPREDUCE-5014: custom CopyListing



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5809) Enhance distcp to support preserving HDFS ACLs.

2014-05-11 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13992522#comment-13992522
 ] 

Tsz Wo Nicholas Sze commented on MAPREDUCE-5809:


Thanks for the update.  Adding CopyListingFileStatus looks good.  One minor 
comment:
- SimpleCopyListing.getFileStatus(..) is no longer needed.  It was used to 
convert FileStatus subclass objects to FileStatus.  CopyListingFileStatus does 
not have subclasses.  We may make CopyListingFileStatus final as well.


> Enhance distcp to support preserving HDFS ACLs.
> ---
>
> Key: MAPREDUCE-5809
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5809
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp
>Affects Versions: 2.4.0
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: MAPREDUCE-5809.1.patch, MAPREDUCE-5809.2.patch, 
> MAPREDUCE-5809.3.patch, MAPREDUCE-5809.4.patch
>
>
> This issue tracks enhancing distcp to add a new command-line argument for 
> preserving HDFS ACLs from the source at the copy destination.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5809) Enhance distcp to support preserving HDFS ACLs.

2014-05-06 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13991301#comment-13991301
 ] 

Tsz Wo Nicholas Sze commented on MAPREDUCE-5809:


If we cannot change FileStatus for backwards-compatibility, how about adding 
new FileSystem methods such as listStatusWithACL(..) which return 
FileStatusWithACL?

> Enhance distcp to support preserving HDFS ACLs.
> ---
>
> Key: MAPREDUCE-5809
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5809
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp
>Affects Versions: 2.4.0
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: MAPREDUCE-5809.1.patch, MAPREDUCE-5809.2.patch, 
> MAPREDUCE-5809.3.patch
>
>
> This issue tracks enhancing distcp to add a new command-line argument for 
> preserving HDFS ACLs from the source at the copy destination.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5402) DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE

2014-05-06 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated MAPREDUCE-5402:
---

   Resolution: Fixed
Fix Version/s: 2.5.0
   Status: Resolved  (was: Patch Available)

I have committed this.  Thanks, Tsuyoshi!

> DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE
> --
>
> Key: MAPREDUCE-5402
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5402
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp, mrv2
>Reporter: David Rosenstrauch
>Assignee: Tsuyoshi OZAWA
> Fix For: 2.5.0
>
> Attachments: MAPREDUCE-5402.1.patch, MAPREDUCE-5402.2.patch, 
> MAPREDUCE-5402.3.patch, MAPREDUCE-5402.4-2.patch, MAPREDUCE-5402.4.patch, 
> MAPREDUCE-5402.5.patch
>
>
> In MAPREDUCE-2765, which provided the design spec for DistCpV2, the author 
> describes the implementation of DynamicInputFormat, with one of the main 
> motivations cited being to reduce the chance of long-tails where a few 
> leftover mappers run much longer than the rest.
> However, I today ran into a situation where I experienced exactly such a long 
> tail using DistCpV2 and DynamicInputFormat.  And when I tried to alleviate 
> the problem by overriding the number of mappers and the split ratio used by 
> the DynamicInputFormat, I was prevented from doing so by the hard-coded limit 
> set in the code by the MAX_CHUNKS_TOLERABLE constant.  (Currently set to 400.)
> This constant is actually set quite low for production use.  (See a 
> description of my use case below.)  And although MAPREDUCE-2765 states that 
> this is an "overridable maximum", when reading through the code there does 
> not actually appear to be any mechanism available to override it.
> This should be changed.  It should be possible to expand the maximum # of 
> chunks beyond this arbitrary limit.
> For example, here is the situation I ran into today:
> I ran a distcpv2 job on a cluster with 8 machines containing 128 map slots.  
> The job consisted of copying ~2800 files from HDFS to Amazon S3.  I overrode 
> the number of mappers for the job from the default of 20 to 128, so as to 
> more properly parallelize the copy across the cluster.  The number of chunk 
> files created was calculated as 241, and mapred.num.entries.per.chunk was 
> calculated as 12.
> As the job ran on, it reached a point where there were only 4 remaining map 
> tasks, which had each been running for over 2 hours.  The reason for this was 
> that each of the 12 files that those mappers were copying were quite large 
> (several hundred megabytes in size) and took ~20 minutes each.  However, 
> during this time, all the other 124 mappers sat idle.
> In theory I should be able to alleviate this problem with DynamicInputFormat. 
>  If I were able to, say, quadruple the number of chunk files created, that 
> would have made each chunk contain only 3 files, and these large files would 
> have gotten distributed better around the cluster and copied in parallel.
> However, when I tried to do that - by overriding mapred.listing.split.ratio 
> to, say, 10 - DynamicInputFormat responded with an exception ("Too many 
> chunks created with splitRatio:10, numMaps:128. Reduce numMaps or decrease 
> split-ratio to proceed.") - presumably because I exceeded the 
> MAX_CHUNKS_TOLERABLE value of 400.
> Is there any particular logic behind this MAX_CHUNKS_TOLERABLE limit?  I 
> can't personally see any.
> If this limit has no particular logic behind it, then it should be 
> overridable - or even better:  removed altogether.  After all, I'm not sure I 
> see any need for it.  Even if numMaps * splitRatio resulted in an 
> extraordinarily large number, if the code were modified so that the number of 
> chunks got calculated as Math.min( numMaps * splitRatio, numFiles), then 
> there would be no need for MAX_CHUNKS_TOLERABLE.  In this worst-case scenario 
> where the product of numMaps and splitRatio is large, capping the number of 
> chunks at the number of files (numberOfChunks = numberOfFiles) would result 
> in 1 file per chunk - the maximum parallelization possible.  That may not be 
> the best-tuned solution for some users, but I would think that it should be 
> left up to the user to deal with the potential consequence of not having 
> tuned their job properly.  Certainly that would be better than having an 
> arbitrary hard-coded limit that *prevents* proper parallelization when 
> dealing with large files and/or large numbers of mappers.
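The suggested cap is simple arithmetic; a sketch using the reporter's numbers (hypothetical helper, not distcp code) shows how min(numMaps * splitRatio, numFiles) removes the need for MAX_CHUNKS_TOLERABLE:

```java
// Sketch of the capping proposed in the description: the chunk count can
// never usefully exceed the number of files, so no hard limit is needed.
public class ChunkCountSketch {
    static int numChunks(int numMaps, int splitRatio, int numFiles) {
        return Math.min(numMaps * splitRatio, numFiles);
    }

    public static void main(String[] args) {
        // The reporter's case: 128 maps, split ratio 10, ~2800 files.
        System.out.println(numChunks(128, 10, 2800));   // ~2 files per chunk
        // Worst case: the cap degrades gracefully to one file per chunk.
        System.out.println(numChunks(1000, 100, 2800)); // capped at numFiles
    }
}
```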



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5402) DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE

2014-05-05 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated MAPREDUCE-5402:
---

Hadoop Flags: Reviewed

+1 the new patch looks good.

> DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE
> --
>
> Key: MAPREDUCE-5402
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5402
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp, mrv2
>Reporter: David Rosenstrauch
>Assignee: Tsuyoshi OZAWA
> Attachments: MAPREDUCE-5402.1.patch, MAPREDUCE-5402.2.patch, 
> MAPREDUCE-5402.3.patch, MAPREDUCE-5402.4-2.patch, MAPREDUCE-5402.4.patch, 
> MAPREDUCE-5402.5.patch
>
>
> In MAPREDUCE-2765, which provided the design spec for DistCpV2, the author 
> describes the implementation of DynamicInputFormat, with one of the main 
> motivations cited being to reduce the chance of long-tails where a few 
> leftover mappers run much longer than the rest.
> However, I today ran into a situation where I experienced exactly such a long 
> tail using DistCpV2 and DynamicInputFormat.  And when I tried to alleviate 
> the problem by overriding the number of mappers and the split ratio used by 
> the DynamicInputFormat, I was prevented from doing so by the hard-coded limit 
> set in the code by the MAX_CHUNKS_TOLERABLE constant.  (Currently set to 400.)
> This constant is actually set quite low for production use.  (See a 
> description of my use case below.)  And although MAPREDUCE-2765 states that 
> this is an "overridable maximum", when reading through the code there does 
> not actually appear to be any mechanism available to override it.
> This should be changed.  It should be possible to expand the maximum # of 
> chunks beyond this arbitrary limit.
> For example, here is the situation I ran into today:
> I ran a distcpv2 job on a cluster with 8 machines containing 128 map slots.  
> The job consisted of copying ~2800 files from HDFS to Amazon S3.  I overrode 
> the number of mappers for the job from the default of 20 to 128, so as to 
> more properly parallelize the copy across the cluster.  The number of chunk 
> files created was calculated as 241, and mapred.num.entries.per.chunk was 
> calculated as 12.
> As the job ran on, it reached a point where there were only 4 remaining map 
> tasks, which had each been running for over 2 hours.  The reason for this was 
> that each of the 12 files that those mappers were copying were quite large 
> (several hundred megabytes in size) and took ~20 minutes each.  However, 
> during this time, all the other 124 mappers sat idle.
> In theory I should be able to alleviate this problem with DynamicInputFormat. 
>  If I were able to, say, quadruple the number of chunk files created, that 
> would have made each chunk contain only 3 files, and these large files would 
> have gotten distributed better around the cluster and copied in parallel.
> However, when I tried to do that - by overriding mapred.listing.split.ratio 
> to, say, 10 - DynamicInputFormat responded with an exception ("Too many 
> chunks created with splitRatio:10, numMaps:128. Reduce numMaps or decrease 
> split-ratio to proceed.") - presumably because I exceeded the 
> MAX_CHUNKS_TOLERABLE value of 400.
> Is there any particular logic behind this MAX_CHUNKS_TOLERABLE limit?  I 
> can't personally see any.
> If this limit has no particular logic behind it, then it should be 
> overridable - or even better:  removed altogether.  After all, I'm not sure I 
> see any need for it.  Even if numMaps * splitRatio resulted in an 
> extraordinarily large number, if the code were modified so that the number of 
> chunks got calculated as Math.min( numMaps * splitRatio, numFiles), then 
> there would be no need for MAX_CHUNKS_TOLERABLE.  In this worst-case scenario 
> where the product of numMaps and splitRatio is large, capping the number of 
> chunks at the number of files (numberOfChunks = numberOfFiles) would result 
> in 1 file per chunk - the maximum parallelization possible.  That may not be 
> the best-tuned solution for some users, but I would think that it should be 
> left up to the user to deal with the potential consequence of not having 
> tuned their job properly.  Certainly that would be better than having an 
> arbitrary hard-coded limit that *prevents* proper parallelization when 
> dealing with large files and/or large numbers of mappers.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5809) Enhance distcp to support preserving HDFS ACLs.

2014-05-05 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990222#comment-13990222
 ] 

Tsz Wo Nicholas Sze commented on MAPREDUCE-5809:


> If we do that, then we'll lose the parallelism benefit we get from doing the 
> RPC calls inside the MR tasks. ...

You are right that we'll lose the parallelism.  However, we have to build the 
source listing anyway.  If FileSystem.listStatus(..) also returned ACLs, we 
would definitely put the ACLs in the listing SequenceFile.  (Question: why does 
listStatus(..) not return ACLs, and does it make sense to add them in the 
future?)  As it is, we need an additional getAclStatus(..) call.

If two clusters are close in distance, calling getAclStatus(..) in parallel is 
probably faster.  However, if the clusters are far apart (a common case), 
calling getAclStatus(..) from the destination cluster may incur a long round 
trip time.  It also takes more bandwidth, which is usually limited.  Running 
the distcp command in the source cluster is probably better.

> I chose RuntimeException for consistency with the existing exceptions like 
> CopyListing#DuplicateFileException and CopyListing#InvalidInputException. ...

I see. Let's keep extending RuntimeException for the moment.  We could change 
all of them later.

> Enhance distcp to support preserving HDFS ACLs.
> ---
>
> Key: MAPREDUCE-5809
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5809
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp
>Affects Versions: 2.4.0
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: MAPREDUCE-5809.1.patch, MAPREDUCE-5809.2.patch, 
> MAPREDUCE-5809.3.patch
>
>
> This issue tracks enhancing distcp to add a new command-line argument for 
> preserving HDFS ACLs from the source at the copy destination.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5402) DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE

2014-05-04 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989242#comment-13989242
 ] 

Tsz Wo Nicholas Sze commented on MAPREDUCE-5402:


{code}
+  public static final String CONF_LABEL_MAX_CHUNKS_TOLERABLE = 
"distcp.max.chunks.tolerable";
+  public static final String CONF_LABEL_MAX_CHUNKS_IDEAL = 
"distcp.max.chunks.ideal";
+  public static final String CONF_LABEL_MIN_RECORDS_PER_CHUNK = 
"distcp.min.records_per_chunk";
+  public static final String CONF_LABEL_SPLIT_RATIO = "distcp.split.ratio";
{code}
Since these confs are used only when "-strategy dynamic" is specified, let's 
use the prefix "distcp.dynamic." for them.  Other than that, the patch looks 
good.

> DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE
> --
>
> Key: MAPREDUCE-5402
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5402
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp, mrv2
>Reporter: David Rosenstrauch
>Assignee: Tsuyoshi OZAWA
> Attachments: MAPREDUCE-5402.1.patch, MAPREDUCE-5402.2.patch, 
> MAPREDUCE-5402.3.patch, MAPREDUCE-5402.4-2.patch, MAPREDUCE-5402.4.patch
>
>
> In MAPREDUCE-2765, which provided the design spec for DistCpV2, the author 
> describes the implementation of DynamicInputFormat, with one of the main 
> motivations cited being to reduce the chance of long-tails where a few 
> leftover mappers run much longer than the rest.
> However, I today ran into a situation where I experienced exactly such a long 
> tail using DistCpV2 and DynamicInputFormat.  And when I tried to alleviate 
> the problem by overriding the number of mappers and the split ratio used by 
> the DynamicInputFormat, I was prevented from doing so by the hard-coded limit 
> set in the code by the MAX_CHUNKS_TOLERABLE constant.  (Currently set to 400.)
> This constant is actually set quite low for production use.  (See a 
> description of my use case below.)  And although MAPREDUCE-2765 states that 
> this is an "overridable maximum", when reading through the code there does 
> not actually appear to be any mechanism available to override it.
> This should be changed.  It should be possible to expand the maximum # of 
> chunks beyond this arbitrary limit.
> For example, here is the situation I ran into today:
> I ran a distcpv2 job on a cluster with 8 machines containing 128 map slots.  
> The job consisted of copying ~2800 files from HDFS to Amazon S3.  I overrode 
> the number of mappers for the job from the default of 20 to 128, so as to 
> more properly parallelize the copy across the cluster.  The number of chunk 
> files created was calculated as 241, and mapred.num.entries.per.chunk was 
> calculated as 12.
> As the job ran on, it reached a point where there were only 4 remaining map 
> tasks, which had each been running for over 2 hours.  The reason for this was 
> that each of the 12 files that those mappers were copying were quite large 
> (several hundred megabytes in size) and took ~20 minutes each.  However, 
> during this time, all the other 124 mappers sat idle.
> In theory I should be able to alleviate this problem with DynamicInputFormat. 
>  If I were able to, say, quadruple the number of chunk files created, that 
> would have made each chunk contain only 3 files, and these large files would 
> have gotten distributed better around the cluster and copied in parallel.
> However, when I tried to do that - by overriding mapred.listing.split.ratio 
> to, say, 10 - DynamicInputFormat responded with an exception ("Too many 
> chunks created with splitRatio:10, numMaps:128. Reduce numMaps or decrease 
> split-ratio to proceed.") - presumably because I exceeded the 
> MAX_CHUNKS_TOLERABLE value of 400.
> Is there any particular logic behind this MAX_CHUNKS_TOLERABLE limit?  I 
> can't personally see any.
> If this limit has no particular logic behind it, then it should be 
> overridable - or even better:  removed altogether.  After all, I'm not sure I 
> see any need for it.  Even if numMaps * splitRatio resulted in an 
> extraordinarily large number, if the code were modified so that the number of 
> chunks got calculated as Math.min( numMaps * splitRatio, numFiles), then 
> there would be no need for MAX_CHUNKS_TOLERABLE.  In this worst-case scenario 
> where the product of numMaps and splitRatio is large, capping the number of 
> chunks at the number of files (numberOfChunks = numberOfFiles) would result 
> in 1 file per chunk - the maximum parallelization possible.  That may not be 
> the best-tuned solution for some users, but I would think that it should be 
> left up to the user to deal with the potential consequence of not having 
> tuned their job properly.  Certainly that would be better than having an 
> arbitrary hard-coded limit that *prevents* proper parallelization when 
> dealing with large files and/or large numbers of mappers.

[jira] [Commented] (MAPREDUCE-5809) Enhance distcp to support preserving HDFS ACLs.

2014-05-04 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989235#comment-13989235
 ] 

Tsz Wo Nicholas Sze commented on MAPREDUCE-5809:


- CopyCommitter should not get and use the source FileSystem since that will be 
much slower.  We should change the listing SequenceFile value to something like 
FileStatusWithACL (a new class).  Then, CopyCommitter could read the ACL from it.

- Should AclsNotSupportedException extend IOException instead of 
RuntimeException?

- Let's move AclsNotSupportedException and 
DistCpUtils.checkFileSystemAclSupport(..) to Common.  They are also useful for 
other cases.
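A hypothetical shape for the proposed FileStatusWithACL listing value (the class does not exist yet; Strings stand in for the real FileStatus/AclEntry types, purely for illustration):

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of the proposed listing value: file metadata
 *  bundled with its ACL entries, so CopyCommitter can read ACLs from the
 *  listing instead of re-contacting the source FileSystem.
 *  Strings stand in for the real Hadoop types. */
public class FileStatusWithACL {
    private final String path;             // stand-in for FileStatus
    private final List<String> aclEntries; // stand-in for List<AclEntry>

    public FileStatusWithACL(String path, List<String> aclEntries) {
        this.path = path;
        this.aclEntries = Collections.unmodifiableList(aclEntries);
    }
    public String getPath() { return path; }
    public List<String> getAclEntries() { return aclEntries; }
}
```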

> Enhance distcp to support preserving HDFS ACLs.
> ---
>
> Key: MAPREDUCE-5809
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5809
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp
>Affects Versions: 2.4.0
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: MAPREDUCE-5809.1.patch, MAPREDUCE-5809.2.patch, 
> MAPREDUCE-5809.3.patch
>
>
> This issue tracks enhancing distcp to add a new command-line argument for 
> preserving HDFS ACLs from the source at the copy destination.





[jira] [Commented] (MAPREDUCE-5402) DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE

2014-04-29 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984168#comment-13984168
 ] 

Tsz Wo Nicholas Sze commented on MAPREDUCE-5402:


- In createSplits(..), should we get min records per chunk from conf?
- Similarly, in the new getSplitRatio(..) method, should we get split ratio 
from conf?
- Let's validate the conf values in getMaxChunksTolerable, getMaxChunksIdeal 
and getMinRecordsPerChunk.
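The validate-on-read suggestion could be sketched as follows (a plain Map stands in for Hadoop's Configuration so the snippet is self-contained; the property name follows the "distcp.dynamic." prefix proposed in review, and the accessor shape is an assumption, not the patch's actual code):

```java
import java.util.Map;

/** Sketch of reading and validating a chunk-related setting.
 *  A Map stands in for Hadoop's Configuration object. */
public class ConfValidation {
    static int getMaxChunksTolerable(Map<String, String> conf) {
        String key = "distcp.dynamic.max.chunks.tolerable";
        int value = conf.containsKey(key)
                ? Integer.parseInt(conf.get(key))
                : 400; // the current hard-coded default
        // Reject non-positive overrides instead of silently misbehaving.
        if (value <= 0) {
            throw new IllegalArgumentException(key + " = " + value + " <= 0");
        }
        return value;
    }
}
```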

> DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE
> --
>
> Key: MAPREDUCE-5402
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5402
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp, mrv2
>Reporter: David Rosenstrauch
>Assignee: Tsuyoshi OZAWA
> Attachments: MAPREDUCE-5402.1.patch, MAPREDUCE-5402.2.patch, 
> MAPREDUCE-5402.3.patch
>
>
> In MAPREDUCE-2765, which provided the design spec for DistCpV2, the author 
> describes the implementation of DynamicInputFormat, with one of the main 
> motivations cited being to reduce the chance of long-tails where a few 
> leftover mappers run much longer than the rest.
> However, today I ran into a situation where I experienced exactly such a long 
> tail using DistCpV2 and DynamicInputFormat.  And when I tried to alleviate 
> the problem by overriding the number of mappers and the split ratio used by 
> the DynamicInputFormat, I was prevented from doing so by the hard-coded limit 
> set in the code by the MAX_CHUNKS_TOLERABLE constant.  (Currently set to 400.)
> This constant is actually set quite low for production use.  (See a 
> description of my use case below.)  And although MAPREDUCE-2765 states that 
> this is an "overridable maximum", when reading through the code there does 
> not actually appear to be any mechanism available to override it.
> This should be changed.  It should be possible to expand the maximum # of 
> chunks beyond this arbitrary limit.
> For example, here is the situation I ran into today:
> I ran a distcpv2 job on a cluster with 8 machines containing 128 map slots.  
> The job consisted of copying ~2800 files from HDFS to Amazon S3.  I overrode 
> the number of mappers for the job from the default of 20 to 128, so as to 
> more properly parallelize the copy across the cluster.  The number of chunk 
> files created was calculated as 241, and mapred.num.entries.per.chunk was 
> calculated as 12.
> As the job ran on, it reached a point where there were only 4 remaining map 
> tasks, which had each been running for over 2 hours.  The reason for this was 
> that each of the 12 files that those mappers were copying were quite large 
> (several hundred megabytes in size) and took ~20 minutes each.  However, 
> during this time, all the other 124 mappers sat idle.
> In theory I should be able to alleviate this problem with DynamicInputFormat. 
>  If I were able to, say, quadruple the number of chunk files created, that 
> would have made each chunk contain only 3 files, and these large files would 
> have gotten distributed better around the cluster and copied in parallel.
> However, when I tried to do that - by overriding mapred.listing.split.ratio 
> to, say, 10 - DynamicInputFormat responded with an exception ("Too many 
> chunks created with splitRatio:10, numMaps:128. Reduce numMaps or decrease 
> split-ratio to proceed.") - presumably because I exceeded the 
> MAX_CHUNKS_TOLERABLE value of 400.
> Is there any particular logic behind this MAX_CHUNKS_TOLERABLE limit?  I 
> can't personally see any.
> If this limit has no particular logic behind it, then it should be 
> overridable - or even better:  removed altogether.  After all, I'm not sure I 
> see any need for it.  Even if numMaps * splitRatio resulted in an 
> extraordinarily large number, if the code were modified so that the number of 
> chunks got calculated as Math.min( numMaps * splitRatio, numFiles), then 
> there would be no need for MAX_CHUNKS_TOLERABLE.  In this worst-case scenario 
> where the product of numMaps and splitRatio is large, capping the number of 
> chunks at the number of files (numberOfChunks = numberOfFiles) would result 
> in 1 file per chunk - the maximum parallelization possible.  That may not be 
> the best-tuned solution for some users, but I would think that it should be 
> left up to the user to deal with the potential consequence of not having 
> tuned their job properly.  Certainly that would be better than having an 
> arbitrary hard-coded limit that *prevents* proper parallelization when 
> dealing with large files and/or large numbers of mappers.
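The capping proposal above can be sketched as a one-line calculation (names and method shape are hypothetical, not DistCp's actual code):

```java
/** Illustration of the proposed chunk-count cap: never create more
 *  chunks than there are files, making MAX_CHUNKS_TOLERABLE unnecessary. */
public class ChunkMath {
    static int numberOfChunks(int numMaps, int splitRatio, int numFiles) {
        // Cap numMaps * splitRatio at numFiles: one file per chunk is
        // already the maximum possible parallelization.
        return Math.min(numMaps * splitRatio, numFiles);
    }

    public static void main(String[] args) {
        // Numbers from the report: 128 maps, split ratio 10, ~2800 files.
        System.out.println(numberOfChunks(128, 10, 2800)); // prints 1280
    }
}
```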





[jira] [Commented] (MAPREDUCE-5402) DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE

2014-04-28 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983862#comment-13983862
 ] 

Tsz Wo Nicholas Sze commented on MAPREDUCE-5402:


Sure, I should be able to review this later this week.

> DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE
> --
>
> Key: MAPREDUCE-5402
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5402
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp, mrv2
>Reporter: David Rosenstrauch
>Assignee: Tsuyoshi OZAWA
> Attachments: MAPREDUCE-5402.1.patch, MAPREDUCE-5402.2.patch, 
> MAPREDUCE-5402.3.patch
>
>
> In MAPREDUCE-2765, which provided the design spec for DistCpV2, the author 
> describes the implementation of DynamicInputFormat, with one of the main 
> motivations cited being to reduce the chance of long-tails where a few 
> leftover mappers run much longer than the rest.
> However, today I ran into a situation where I experienced exactly such a long 
> tail using DistCpV2 and DynamicInputFormat.  And when I tried to alleviate 
> the problem by overriding the number of mappers and the split ratio used by 
> the DynamicInputFormat, I was prevented from doing so by the hard-coded limit 
> set in the code by the MAX_CHUNKS_TOLERABLE constant.  (Currently set to 400.)
> This constant is actually set quite low for production use.  (See a 
> description of my use case below.)  And although MAPREDUCE-2765 states that 
> this is an "overridable maximum", when reading through the code there does 
> not actually appear to be any mechanism available to override it.
> This should be changed.  It should be possible to expand the maximum # of 
> chunks beyond this arbitrary limit.
> For example, here is the situation I ran into today:
> I ran a distcpv2 job on a cluster with 8 machines containing 128 map slots.  
> The job consisted of copying ~2800 files from HDFS to Amazon S3.  I overrode 
> the number of mappers for the job from the default of 20 to 128, so as to 
> more properly parallelize the copy across the cluster.  The number of chunk 
> files created was calculated as 241, and mapred.num.entries.per.chunk was 
> calculated as 12.
> As the job ran on, it reached a point where there were only 4 remaining map 
> tasks, which had each been running for over 2 hours.  The reason for this was 
> that each of the 12 files that those mappers were copying were quite large 
> (several hundred megabytes in size) and took ~20 minutes each.  However, 
> during this time, all the other 124 mappers sat idle.
> In theory I should be able to alleviate this problem with DynamicInputFormat. 
>  If I were able to, say, quadruple the number of chunk files created, that 
> would have made each chunk contain only 3 files, and these large files would 
> have gotten distributed better around the cluster and copied in parallel.
> However, when I tried to do that - by overriding mapred.listing.split.ratio 
> to, say, 10 - DynamicInputFormat responded with an exception ("Too many 
> chunks created with splitRatio:10, numMaps:128. Reduce numMaps or decrease 
> split-ratio to proceed.") - presumably because I exceeded the 
> MAX_CHUNKS_TOLERABLE value of 400.
> Is there any particular logic behind this MAX_CHUNKS_TOLERABLE limit?  I 
> can't personally see any.
> If this limit has no particular logic behind it, then it should be 
> overridable - or even better:  removed altogether.  After all, I'm not sure I 
> see any need for it.  Even if numMaps * splitRatio resulted in an 
> extraordinarily large number, if the code were modified so that the number of 
> chunks got calculated as Math.min( numMaps * splitRatio, numFiles), then 
> there would be no need for MAX_CHUNKS_TOLERABLE.  In this worst-case scenario 
> where the product of numMaps and splitRatio is large, capping the number of 
> chunks at the number of files (numberOfChunks = numberOfFiles) would result 
> in 1 file per chunk - the maximum parallelization possible.  That may not be 
> the best-tuned solution for some users, but I would think that it should be 
> left up to the user to deal with the potential consequence of not having 
> tuned their job properly.  Certainly that would be better than having an 
> arbitrary hard-coded limit that *prevents* proper parallelization when 
> dealing with large files and/or large numbers of mappers.





[jira] [Commented] (MAPREDUCE-5830) HostUtil.getTaskLogUrl is not backwards binary compatible with 2.3

2014-04-12 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967571#comment-13967571
 ] 

Tsz Wo Nicholas Sze commented on MAPREDUCE-5830:


Should we fix it in Hive?

> HostUtil.getTaskLogUrl is not backwards binary compatible with 2.3
> --
>
> Key: MAPREDUCE-5830
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5830
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Jason Lowe
>Priority: Blocker
>
> HostUtil.getTaskLogUrl used to have a signature like this in Hadoop 2.3.0 and 
> earlier:
> public static String getTaskLogUrl(String taskTrackerHostName, String 
> httpPort, String taskAttemptID)
> but now has a signature like this:
> public static String getTaskLogUrl(String scheme, String taskTrackerHostName, 
> String httpPort, String taskAttemptID)
> This breaks source and binary backwards-compatibility.  MapReduce and Hive 
> both have references to this, so their jars compiled against 2.3 or earlier 
> do not work on 2.4.
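One way to restore compatibility would be re-adding the 2.3-era three-argument signature as a deprecated overload that delegates to the new one. This is only a sketch: the URL format and the choice of "http://" as the default scheme are illustrative assumptions, not HostUtil's actual implementation.

```java
/** Sketch of restoring the old HostUtil.getTaskLogUrl signature as a
 *  delegating overload; URL layout here is illustrative only. */
public class HostUtilCompat {
    public static String getTaskLogUrl(String scheme, String host,
                                       String httpPort, String taskAttemptID) {
        return scheme + host + ":" + httpPort
                + "/tasklog?attemptid=" + taskAttemptID;
    }

    /** 2.3-era signature, kept for source/binary compatibility;
     *  assumes "http://" when no scheme is given. */
    @Deprecated
    public static String getTaskLogUrl(String host, String httpPort,
                                       String taskAttemptID) {
        return getTaskLogUrl("http://", host, httpPort, taskAttemptID);
    }
}
```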





[jira] [Resolved] (MAPREDUCE-4976) Use the new StringUtils methods added by HADOOP-9252

2014-03-25 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved MAPREDUCE-4976.


Resolution: Not A Problem

This is actually not a problem.

> Use the new StringUtils methods added by HADOOP-9252
> 
>
> Key: MAPREDUCE-4976
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4976
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
>
> HADOOP-9252 slightly changed the format of some StringUtils outputs.  Some 
> methods were deprecated by HADOOP-9252.  The use of them should be replaced 
> with the new methods.





[jira] [Updated] (MAPREDUCE-5794) SliveMapper always uses default FileSystem.

2014-03-11 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated MAPREDUCE-5794:
---

Attachment: m5794_20140311.patch

m5794_20140311.patch: gets path from base dir and removes 
@SuppressWarnings("deprecation").

> SliveMapper always uses default FileSystem.
> ---
>
> Key: MAPREDUCE-5794
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5794
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: m5794_20140311.patch
>
>
> Similar to MAPREDUCE-5780, SliveMapper should use the test path to get 
> FileSystem.





[jira] [Updated] (MAPREDUCE-5794) SliveMapper always uses default FileSystem.

2014-03-11 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated MAPREDUCE-5794:
---

Status: Patch Available  (was: Open)

> SliveMapper always uses default FileSystem.
> ---
>
> Key: MAPREDUCE-5794
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5794
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: m5794_20140311.patch
>
>
> Similar to MAPREDUCE-5780, SliveMapper should use the test path to get 
> FileSystem.





[jira] [Created] (MAPREDUCE-5794) SliveMapper always uses default FileSystem.

2014-03-11 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created MAPREDUCE-5794:
--

 Summary: SliveMapper always uses default FileSystem.
 Key: MAPREDUCE-5794
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5794
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor


Similar to MAPREDUCE-5780, SliveMapper should use the test path to get 
FileSystem.





[jira] [Updated] (MAPREDUCE-5780) SliveTest always uses default FileSystem

2014-03-06 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated MAPREDUCE-5780:
---

   Resolution: Fixed
Fix Version/s: 2.4.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks Arpit for reviewing the patch.

I have committed this.

> SliveTest always uses default FileSystem
> 
>
> Key: MAPREDUCE-5780
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5780
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Fix For: 2.4.0
>
> Attachments: m5780_20140305.patch
>
>
> It should use the specified path to get FileSystem.  Otherwise, it won't work 
> if the FileSystem is different.





[jira] [Updated] (MAPREDUCE-5780) SliveTest always uses default FileSystem

2014-03-05 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated MAPREDUCE-5780:
---

Attachment: m5780_20140305.patch

m5780_20140305.patch: uses Path.getFileSystem(..).

> SliveTest always uses default FileSystem
> 
>
> Key: MAPREDUCE-5780
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5780
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: m5780_20140305.patch
>
>
> It should use the specified path to get FileSystem.  Otherwise, it won't work 
> if the FileSystem is different.





[jira] [Updated] (MAPREDUCE-5780) SliveTest always uses default FileSystem

2014-03-05 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated MAPREDUCE-5780:
---

Status: Patch Available  (was: Open)

> SliveTest always uses default FileSystem
> 
>
> Key: MAPREDUCE-5780
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5780
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: m5780_20140305.patch
>
>
> It should use the specified path to get FileSystem.  Otherwise, it won't work 
> if the FileSystem is different.





[jira] [Created] (MAPREDUCE-5780) SliveTest always uses default FileSystem

2014-03-05 Thread Tsz Wo (Nicholas), SZE (JIRA)
Tsz Wo (Nicholas), SZE created MAPREDUCE-5780:
-

 Summary: SliveTest always uses default FileSystem
 Key: MAPREDUCE-5780
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5780
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
Priority: Minor


It should use the specified path to get FileSystem.  Otherwise, it won't work 
if the FileSystem is different.





[jira] [Commented] (MAPREDUCE-5715) ProcfsBasedProcessTree#constructProcessInfo() can still throw NumberFormatException

2014-01-09 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867535#comment-13867535
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-5715:
---

It looks like the exception was thrown while parsing the utime below.
{code}
// Set (name) (ppid) (pgrpId) (session) (utime) (stime) (vsize) (rss)
pinfo.updateProcessInfo(m.group(2), m.group(3),
Integer.parseInt(m.group(4)), Integer.parseInt(m.group(5)),
Long.parseLong(m.group(7)), new BigInteger(m.group(8)),
Long.parseLong(m.group(10)), Long.parseLong(m.group(11)));
{code}
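The reported value 184467440737095551615 exceeds Long.MAX_VALUE, so Long.parseLong throws NumberFormatException. Parsing utime as a BigInteger, as the snippet above already does for one field after MAPREDUCE-3583, would avoid this. A minimal sketch (not the actual patch; class and method names are illustrative):

```java
import java.math.BigInteger;

/** Sketch: parse a /proc/<pid>/stat utime field as BigInteger so that
 *  values beyond Long.MAX_VALUE (e.g. 184467440737095551615, as seen in
 *  this report) do not throw NumberFormatException. */
public class UtimeParse {
    static BigInteger parseUtime(String field) {
        // BigInteger accepts arbitrarily large decimal strings,
        // unlike Long.parseLong.
        return new BigInteger(field);
    }

    public static void main(String[] args) {
        System.out.println(parseUtime("184467440737095551615"));
    }
}
```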

> ProcfsBasedProcessTree#constructProcessInfo() can still throw 
> NumberFormatException
> ---
>
> Key: MAPREDUCE-5715
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5715
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: trunk, 2.2.0
> Environment: Ubuntu 13.04 (OS Kernel 3.9.0), Armv71Exynos5440
>Reporter: German Florez-Larrahondo
>Priority: Minor
> Attachments: constructprocessfailing.jpg
>
>
> For long running jobs I have hit an issue that seems to be to be similar to 
> the bug reported in https://issues.apache.org/jira/browse/MAPREDUCE-3583
> Unfortunately I do not have the OS logs for this issue, but the utime for the 
> application was read by Hadoop as "184467440737095551615" which does not fit 
> into a Long. In MAPREDUCE-3583 a change was made to 
> ProcfsBasedProcessTree.java 
>  in order to support larger values for stime. Perhaps we need to support 
> larger values for utime (although this could increase the complexity of the 
> math that is being performed on those numbers)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (MAPREDUCE-5604) TestMRAMWithNonNormalizedCapabilities fails on Windows due to exceeding max path length

2013-11-01 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-5604:
--

 Component/s: (was: client)
Hadoop Flags: Reviewed

+1 patch looks good.

Since this only changes a test, the build failure is obviously unrelated.

> TestMRAMWithNonNormalizedCapabilities fails on Windows due to exceeding max 
> path length
> ---
>
> Key: MAPREDUCE-5604
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5604
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Minor
> Attachments: MAPREDUCE-5604.1.patch
>
>
> The test uses the full class name as a component of the 
> {{yarn.nodemanager.local-dirs}} setting for a {{MiniMRYarnCluster}}.  This 
> causes container launch to fail when trying to access files at a path longer 
> than the maximum of 260 characters.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5131) Provide better handling of job status related apis during JT restart

2013-04-05 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623800#comment-13623800
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-5131:
---

+1 the new patch looks good.

> Provide better handling of job status related apis during JT restart
> 
>
> Key: MAPREDUCE-5131
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5131
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Attachments: MAPREDUCE-5131.patch, MAPREDUCE-5131.patch
>
>
> I've seen pig/hive applications bork during JT restart since they get NPEs - 
> this is due to fact that jobs are not really inited, but are submitted.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5131) Provide better handling of job status related apis during JT restart

2013-04-05 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623783#comment-13623783
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-5131:
---

The defaultPolicy is to retry on JobTrackerNotYetInitializedException.  For 
SafeModeException, defaultPolicy is equivalent to TRY_ONCE_THEN_FAIL since 
SafeModeException will be thrown as a RemoteException.  So 
remoteExceptionToPolicyMap.put(SafeModeException.class, defaultPolicy) is the 
same as remoteExceptionToPolicyMap.put(SafeModeException.class, 
TRY_ONCE_THEN_FAIL).

I think it is simpler to change RetryUtils.getDefaultRetryPolicy to support 
multiple exceptions.
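The idea of one default policy shared by several retryable exception types can be illustrated generically (this is a self-contained sketch of the concept, not Hadoop's actual RetryUtils/RetryPolicies API):

```java
import java.util.Map;

/** Generic sketch: map exception classes to a retry decision, with a
 *  fail-once default for anything not listed. Illustrates supporting
 *  multiple retryable exceptions, not Hadoop's real retry framework. */
public class RetryPolicySketch {
    static boolean shouldRetry(Map<Class<? extends Exception>, Boolean> policy,
                               Exception e) {
        // Unlisted exception types fall back to TRY_ONCE_THEN_FAIL (false).
        return policy.getOrDefault(e.getClass(), false);
    }
}
```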



> Provide better handling of job status related apis during JT restart
> 
>
> Key: MAPREDUCE-5131
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5131
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Attachments: MAPREDUCE-5131.patch
>
>
> I've seen pig/hive applications bork during JT restart since they get NPEs - 
> this is due to fact that jobs are not really inited, but are submitted.



[jira] [Resolved] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1

2013-03-27 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE resolved MAPREDUCE-5081.
---

   Resolution: Fixed
Fix Version/s: 1.2.0
 Hadoop Flags: Reviewed

Thanks Suresh for reviewing the patch.

I have committed this.

> Backport DistCpV2 and the related JIRAs to branch-1
> ---
>
> Key: MAPREDUCE-5081
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5081
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: distcp
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 1.2.0
>
> Attachments: DistCp.java.diff, m5081_20130328b.patch, 
> m5081_20130328.patch, m5981_20130321b.patch, m5981_20130321.patch, 
> m5981_20130323.patch
>
>
> Here is a list of DistCpV2 JIRAs:
> - MAPREDUCE-2765: DistCpV2 main jira
> - HADOOP-8703: turn CRC checking off for 0 byte size 
> - HDFS-3054: distcp -skipcrccheck has no effect.
> - HADOOP-8431: Running distcp without args throws IllegalArgumentException
> - HADOOP-8775: non-positive value to -bandwidth
> - MAPREDUCE-4654: TestDistCp is ignored
> - HADOOP-9022: distcp fails to copy file if -m 0 specified
> - HADOOP-9025: TestCopyListing failing
> - MAPREDUCE-5075: DistCp leaks input file handles
> - distcp part of HADOOP-8341: Fix findbugs issues in hadoop-tools
> - MAPREDUCE-5014: custom CopyListing



[jira] [Updated] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1

2013-03-27 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-5081:
--

Attachment: m5081_20130328b.patch

Oops, I accidentally added a new vaidya entry to site.xml.
{code}
 
+
+
 
{code}

m5081_20130328b.patch: removes the new vaidya entry.



> Backport DistCpV2 and the related JIRAs to branch-1
> ---
>
> Key: MAPREDUCE-5081
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5081
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: distcp
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: DistCp.java.diff, m5081_20130328b.patch, 
> m5081_20130328.patch, m5981_20130321b.patch, m5981_20130321.patch, 
> m5981_20130323.patch
>
>
> Here is a list of DistCpV2 JIRAs:
> - MAPREDUCE-2765: DistCpV2 main jira
> - HADOOP-8703: turn CRC checking off for 0 byte size 
> - HDFS-3054: distcp -skipcrccheck has no effect.
> - HADOOP-8431: Running distcp without args throws IllegalArgumentException
> - HADOOP-8775: non-positive value to -bandwidth
> - MAPREDUCE-4654: TestDistCp is ignored
> - HADOOP-9022: distcp fails to copy file if -m 0 specified
> - HADOOP-9025: TestCopyListing failing
> - MAPREDUCE-5075: DistCp leaks input file handles
> - distcp part of HADOOP-8341: Fix findbugs issues in hadoop-tools
> - MAPREDUCE-5014: custom CopyListing



[jira] [Updated] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1

2013-03-27 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-5081:
--

Attachment: m5081_20130328.patch

m5081_20130328.patch:
- keeps distcp (version 1) unchanged and adds a new distcp2 command;
- also updates the doc.

> Backport DistCpV2 and the related JIRAs to branch-1
> ---
>
> Key: MAPREDUCE-5081
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5081
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: distcp
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: DistCp.java.diff, m5081_20130328.patch, 
> m5981_20130321b.patch, m5981_20130321.patch, m5981_20130323.patch
>
>
> Here is a list of DistCpV2 JIRAs:
> - MAPREDUCE-2765: DistCpV2 main jira
> - HADOOP-8703: turn CRC checking off for 0 byte size 
> - HDFS-3054: distcp -skipcrccheck has no effect.
> - HADOOP-8431: Running distcp without args throws IllegalArgumentException
> - HADOOP-8775: non-positive value to -bandwidth
> - MAPREDUCE-4654: TestDistCp is ignored
> - HADOOP-9022: distcp fails to copy file if -m 0 specified
> - HADOOP-9025: TestCopyListing failing
> - MAPREDUCE-5075: DistCp leaks input file handles
> - distcp part of HADOOP-8341: Fix findbugs issues in hadoop-tools
> - MAPREDUCE-5014: custom CopyListing



[jira] [Commented] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1

2013-03-27 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615978#comment-13615978
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-5081:
---

{quote}
> 1. sslConfig.xml, distcp-default.xml is missing Apache license header.

Will do.
{quote}
In branch-1, other xml files such as core-default.xml and hdfs-default.xml also 
do not have a license header.  So let's not fix the new xml files here; we can 
fix them together with the other xml files later.


> Backport DistCpV2 and the related JIRAs to branch-1
> ---
>
> Key: MAPREDUCE-5081
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5081
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: distcp
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: DistCp.java.diff, m5981_20130321b.patch, 
> m5981_20130321.patch, m5981_20130323.patch
>
>
> Here is a list of DistCpV2 JIRAs:
> - MAPREDUCE-2765: DistCpV2 main jira
> - HADOOP-8703: turn CRC checking off for 0 byte size 
> - HDFS-3054: distcp -skipcrccheck has no effect.
> - HADOOP-8431: Running distcp without args throws IllegalArgumentException
> - HADOOP-8775: non-positive value to -bandwidth
> - MAPREDUCE-4654: TestDistCp is ignored
> - HADOOP-9022: distcp fails to copy file if -m 0 specified
> - HADOOP-9025: TestCopyListing failing
> - MAPREDUCE-5075: DistCp leaks input file handles
> - distcp part of HADOOP-8341: Fix findbugs issues in hadoop-tools
> - MAPREDUCE-5014: custom CopyListing



[jira] [Updated] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1

2013-03-27 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-5081:
--

Attachment: DistCp.java.diff

DistCp.java.diff: diff between the branch-1 tools/distcp2/DistCp.java (not 
tools/DistCp.java) and the trunk tools/DistCp.java:
{noformat}
diff b-1/src/tools/org/apache/hadoop/tools/distcp2/DistCp.java t3/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java > DistCp.java.diff
{noformat}


> Backport DistCpV2 and the related JIRAs to branch-1
> ---
>
> Key: MAPREDUCE-5081
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5081
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: distcp
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: DistCp.java.diff, m5981_20130321b.patch, 
> m5981_20130321.patch, m5981_20130323.patch
>
>
> Here is a list of DistCpV2 JIRAs:
> - MAPREDUCE-2765: DistCpV2 main jira
> - HADOOP-8703: turn CRC checking off for 0 byte size 
> - HDFS-3054: distcp -skipcrccheck has no effect.
> - HADOOP-8431: Running distcp without args throws IllegalArgumentException
> - HADOOP-8775: non-positive value to -bandwidth
> - MAPREDUCE-4654: TestDistCp is ignored
> - HADOOP-9022: distcp fails to copy file if -m 0 specified
> - HADOOP-9025: TestCopyListing failing
> - MAPREDUCE-5075: DistCp leaks input file handles
> - distcp part of HADOOP-8341: Fix findbugs issues in hadoop-tools
> - MAPREDUCE-5014: custom CopyListing



[jira] [Commented] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1

2013-03-27 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615961#comment-13615961
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-5081:
---

> 1. sslConfig.xml, distcp-default.xml is missing Apache license header.

Will do.

> 2. There is a lot difference in DistCp.java in trunk and DistCp.java in this 
> patch?

Are you sure that you diffed the correct file, since there are two DistCp.java 
files?  Let me post the diff.

> 3. ... Why not leave the old distcp as is and add a new command for distcp2?

Sure, I can add a new command for distcp2.

> Backport DistCpV2 and the related JIRAs to branch-1
> ---
>
> Key: MAPREDUCE-5081
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5081
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: distcp
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: m5981_20130321b.patch, m5981_20130321.patch, 
> m5981_20130323.patch
>
>
> Here is a list of DistCpV2 JIRAs:
> - MAPREDUCE-2765: DistCpV2 main jira
> - HADOOP-8703: turn CRC checking off for 0 byte size 
> - HDFS-3054: distcp -skipcrccheck has no effect.
> - HADOOP-8431: Running distcp without args throws IllegalArgumentException
> - HADOOP-8775: non-positive value to -bandwidth
> - MAPREDUCE-4654: TestDistCp is ignored
> - HADOOP-9022: distcp fails to copy file if -m 0 specified
> - HADOOP-9025: TestCopyListing failing
> - MAPREDUCE-5075: DistCp leaks input file handles
> - distcp part of HADOOP-8341: Fix findbugs issues in hadoop-tools
> - MAPREDUCE-5014: custom CopyListing



[jira] [Commented] (MAPREDUCE-5014) Extending DistCp through a custom CopyListing is not possible

2013-03-23 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13611705#comment-13611705
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-5014:
---

I have combined the branch-1 patch here in MAPREDUCE-5081.

> Extending DistCp through a custom CopyListing is not possible
> -
>
> Key: MAPREDUCE-5014
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5014
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp
>Affects Versions: 0.23.0, 0.23.1, 0.23.3, trunk, 0.23.4, 0.23.5
>Reporter: Srikanth Sundarrajan
>Assignee: Srikanth Sundarrajan
> Fix For: 3.0.0
>
> Attachments: m5014_20130322_b-1.patch, m5014_20130322b_b-1.patch, 
> m5014_20130322b.patch, m5014_20130322.patch, MAPREDUCE-5014.patch, 
> MAPREDUCE-5014.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> * While it is possible to implement a custom CopyListing in DistCp, DistCp 
> driver class doesn't allow for using this custom CopyListing.
> * Allow SimpleCopyListing to provide an option to exclude files (For instance 
> it is useful to exclude FileOutputCommiter.SUCCEEDED_FILE_NAME during copy as 
> premature copy can indicate that the entire data is available at the 
> destination)



[jira] [Commented] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1

2013-03-22 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13611553#comment-13611553
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-5081:
---

For m5981_20130323.patch:
{noformat}
 [exec] -1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 41 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] -1 findbugs.  The patch appears to introduce 17 new Findbugs 
(version 1.3.9) warnings.
{noformat}

> Backport DistCpV2 and the related JIRAs to branch-1
> ---
>
> Key: MAPREDUCE-5081
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5081
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: distcp
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: m5981_20130321b.patch, m5981_20130321.patch, 
> m5981_20130323.patch
>
>
> Here is a list of DistCpV2 JIRAs:
> - MAPREDUCE-2765: DistCpV2 main jira
> - HADOOP-8703: turn CRC checking off for 0 byte size 
> - HDFS-3054: distcp -skipcrccheck has no effect.
> - HADOOP-8431: Running distcp without args throws IllegalArgumentException
> - HADOOP-8775: non-positive value to -bandwidth
> - MAPREDUCE-4654: TestDistCp is ignored
> - HADOOP-9022: distcp fails to copy file if -m 0 specified
> - HADOOP-9025: TestCopyListing failing
> - MAPREDUCE-5075: DistCp leaks input file handles
> - distcp part of HADOOP-8341: Fix findbugs issues in hadoop-tools
> - MAPREDUCE-5014: custom CopyListing



[jira] [Updated] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1

2013-03-22 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-5081:
--

Attachment: m5981_20130323.patch

m5981_20130323.patch: includes MAPREDUCE-5014.

> Backport DistCpV2 and the related JIRAs to branch-1
> ---
>
> Key: MAPREDUCE-5081
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5081
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: distcp
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: m5981_20130321b.patch, m5981_20130321.patch, 
> m5981_20130323.patch
>
>
> Here is a list of DistCpV2 JIRAs:
> - MAPREDUCE-2765: DistCpV2 main jira
> - HADOOP-8703: turn CRC checking off for 0 byte size 
> - HDFS-3054: distcp -skipcrccheck has no effect.
> - HADOOP-8431: Running distcp without args throws IllegalArgumentException
> - HADOOP-8775: non-positive value to -bandwidth
> - MAPREDUCE-4654: TestDistCp is ignored
> - HADOOP-9022: distcp fails to copy file if -m 0 specified
> - HADOOP-9025: TestCopyListing failing
> - MAPREDUCE-5075: DistCp leaks input file handles
> - distcp part of HADOOP-8341: Fix findbugs issues in hadoop-tools
> - MAPREDUCE-5014: custom CopyListing



[jira] [Updated] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1

2013-03-22 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-5081:
--

Description: 
Here is a list of DistCpV2 JIRAs:
- MAPREDUCE-2765: DistCpV2 main jira
- HADOOP-8703: turn CRC checking off for 0 byte size 
- HDFS-3054: distcp -skipcrccheck has no effect.
- HADOOP-8431: Running distcp without args throws IllegalArgumentException
- HADOOP-8775: non-positive value to -bandwidth
- MAPREDUCE-4654: TestDistCp is ignored
- HADOOP-9022: distcp fails to copy file if -m 0 specified
- HADOOP-9025: TestCopyListing failing
- MAPREDUCE-5075: DistCp leaks input file handles
- distcp part of HADOOP-8341: Fix findbugs issues in hadoop-tools
- MAPREDUCE-5014: custom CopyListing


  was:
Here is a list of DistCpV2 JIRAs:
- MAPREDUCE-2765: DistCpV2 main jira
- HADOOP-8703: turn CRC checking off for 0 byte size 
- HDFS-3054: distcp -skipcrccheck has no effect.
- HADOOP-8431: Running distcp without args throws IllegalArgumentException
- HADOOP-8775: non-positive value to -bandwidth
- MAPREDUCE-4654: TestDistCp is ignored
- HADOOP-9022: distcp fails to copy file if -m 0 specified
- HADOOP-9025: TestCopyListing failing
- MAPREDUCE-5075: DistCp leaks input file handles

- MAPREDUCE-5014: custom CopyListing (not yet committed to trunk)



MAPREDUCE-5014 was committed recently; revised description.

> Backport DistCpV2 and the related JIRAs to branch-1
> ---
>
> Key: MAPREDUCE-5081
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5081
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: distcp
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: m5981_20130321b.patch, m5981_20130321.patch
>
>
> Here is a list of DistCpV2 JIRAs:
> - MAPREDUCE-2765: DistCpV2 main jira
> - HADOOP-8703: turn CRC checking off for 0 byte size 
> - HDFS-3054: distcp -skipcrccheck has no effect.
> - HADOOP-8431: Running distcp without args throws IllegalArgumentException
> - HADOOP-8775: non-positive value to -bandwidth
> - MAPREDUCE-4654: TestDistCp is ignored
> - HADOOP-9022: distcp fails to copy file if -m 0 specified
> - HADOOP-9025: TestCopyListing failing
> - MAPREDUCE-5075: DistCp leaks input file handles
> - distcp part of HADOOP-8341: Fix findbugs issues in hadoop-tools
> - MAPREDUCE-5014: custom CopyListing



[jira] [Commented] (MAPREDUCE-5014) Extending DistCp through a custom CopyListing is not possible

2013-03-22 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13610328#comment-13610328
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-5014:
---

Hi Amareshwari, thanks for checking in the patches!  I forgot to mention that 
the branch-1 patch depends on MAPREDUCE-5081, which backports distcp2 to 
branch-1.   Let's wait for it.

> Extending DistCp through a custom CopyListing is not possible
> -
>
> Key: MAPREDUCE-5014
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5014
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp
>Affects Versions: 0.23.0, 0.23.1, 0.23.3, trunk, 0.23.4, 0.23.5
>Reporter: Srikanth Sundarrajan
>Assignee: Srikanth Sundarrajan
> Fix For: 3.0.0
>
> Attachments: m5014_20130322_b-1.patch, m5014_20130322b_b-1.patch, 
> m5014_20130322b.patch, m5014_20130322.patch, MAPREDUCE-5014.patch, 
> MAPREDUCE-5014.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> * While it is possible to implement a custom CopyListing in DistCp, DistCp 
> driver class doesn't allow for using this custom CopyListing.
> * Allow SimpleCopyListing to provide an option to exclude files (For instance 
> it is useful to exclude FileOutputCommiter.SUCCEEDED_FILE_NAME during copy as 
> premature copy can indicate that the entire data is available at the 
> destination)



[jira] [Updated] (MAPREDUCE-5014) Extending DistCp through a custom CopyListing is not possible

2013-03-21 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-5014:
--

Attachment: m5014_20130322b.patch

m5014_20130322b.patch: adds timeouts for some existing tests.
m5014_20130322b_b-1.patch: the same for branch-1.

> Extending DistCp through a custom CopyListing is not possible
> -
>
> Key: MAPREDUCE-5014
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5014
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp
>Affects Versions: 0.23.0, 0.23.1, 0.23.3, trunk, 0.23.4, 0.23.5
>Reporter: Srikanth Sundarrajan
>Assignee: Srikanth Sundarrajan
> Attachments: m5014_20130322_b-1.patch, m5014_20130322b_b-1.patch, 
> m5014_20130322b.patch, m5014_20130322.patch, MAPREDUCE-5014.patch, 
> MAPREDUCE-5014.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> * While it is possible to implement a custom CopyListing in DistCp, DistCp 
> driver class doesn't allow for using this custom CopyListing.
> * Allow SimpleCopyListing to provide an option to exclude files (For instance 
> it is useful to exclude FileOutputCommiter.SUCCEEDED_FILE_NAME during copy as 
> premature copy can indicate that the entire data is available at the 
> destination)



[jira] [Updated] (MAPREDUCE-5014) Extending DistCp through a custom CopyListing is not possible

2013-03-21 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-5014:
--

Attachment: m5014_20130322b_b-1.patch

> Extending DistCp through a custom CopyListing is not possible
> -
>
> Key: MAPREDUCE-5014
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5014
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp
>Affects Versions: 0.23.0, 0.23.1, 0.23.3, trunk, 0.23.4, 0.23.5
>Reporter: Srikanth Sundarrajan
>Assignee: Srikanth Sundarrajan
> Attachments: m5014_20130322_b-1.patch, m5014_20130322b_b-1.patch, 
> m5014_20130322.patch, MAPREDUCE-5014.patch, MAPREDUCE-5014.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> * While it is possible to implement a custom CopyListing in DistCp, DistCp 
> driver class doesn't allow for using this custom CopyListing.
> * Allow SimpleCopyListing to provide an option to exclude files (For instance 
> it is useful to exclude FileOutputCommiter.SUCCEEDED_FILE_NAME during copy as 
> premature copy can indicate that the entire data is available at the 
> destination)



[jira] [Commented] (MAPREDUCE-5014) Extending DistCp through a custom CopyListing is not possible

2013-03-21 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609900#comment-13609900
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-5014:
---

> -1 one of tests included doesn't have a timeout.

The test without a timeout is an existing test, so test-patch.sh should not -1 
this patch.  I think it is a bug in HADOOP-9387.

> Extending DistCp through a custom CopyListing is not possible
> -
>
> Key: MAPREDUCE-5014
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5014
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp
>Affects Versions: 0.23.0, 0.23.1, 0.23.3, trunk, 0.23.4, 0.23.5
>Reporter: Srikanth Sundarrajan
>Assignee: Srikanth Sundarrajan
> Attachments: m5014_20130322_b-1.patch, m5014_20130322.patch, 
> MAPREDUCE-5014.patch, MAPREDUCE-5014.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> * While it is possible to implement a custom CopyListing in DistCp, DistCp 
> driver class doesn't allow for using this custom CopyListing.
> * Allow SimpleCopyListing to provide an option to exclude files (For instance 
> it is useful to exclude FileOutputCommiter.SUCCEEDED_FILE_NAME during copy as 
> premature copy can indicate that the entire data is available at the 
> destination)



[jira] [Updated] (MAPREDUCE-5014) Extending DistCp through a custom CopyListing is not possible

2013-03-21 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-5014:
--

Attachment: m5014_20130322_b-1.patch

m5014_20130322_b-1.patch: for branch-1.

> Extending DistCp through a custom CopyListing is not possible
> -
>
> Key: MAPREDUCE-5014
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5014
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp
>Affects Versions: 0.23.0, 0.23.1, 0.23.3, trunk, 0.23.4, 0.23.5
>Reporter: Srikanth Sundarrajan
>Assignee: Srikanth Sundarrajan
> Attachments: m5014_20130322_b-1.patch, m5014_20130322.patch, 
> MAPREDUCE-5014.patch, MAPREDUCE-5014.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> * While it is possible to implement a custom CopyListing in DistCp, DistCp 
> driver class doesn't allow for using this custom CopyListing.
> * Allow SimpleCopyListing to provide an option to exclude files (For instance 
> it is useful to exclude FileOutputCommiter.SUCCEEDED_FILE_NAME during copy as 
> premature copy can indicate that the entire data is available at the 
> destination)



[jira] [Updated] (MAPREDUCE-5014) Extending DistCp through a custom CopyListing is not possible

2013-03-21 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-5014:
--

Attachment: m5014_20130322.patch

Since it is trivial to add timeouts, let me post a patch for it.

m5014_20130322.patch: adds timeout and fixes some formatting issues.
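For context, JUnit 4 timeouts are declared as {{@Test(timeout = millis)}}; under the hood JUnit runs the test body on a separate thread and fails the test if the limit is exceeded.  A stdlib-only sketch of that mechanism (class and method names here are illustrative, not taken from the actual patch):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutSketch {
  // Runs the given task in a separate thread and reports whether it
  // finished within the time limit -- roughly what JUnit 4 does for a
  // test method annotated with @Test(timeout = millis).
  static boolean finishesWithin(Runnable task, long millis) throws Exception {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    Future<?> f = pool.submit(task);
    try {
      f.get(millis, TimeUnit.MILLISECONDS);
      return true;                 // task completed in time
    } catch (TimeoutException e) {
      f.cancel(true);              // interrupt the overrunning task
      return false;
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    // A fast task passes; a slow task is flagged as timed out.
    System.out.println(finishesWithin(() -> {}, 1000));   // true
    System.out.println(finishesWithin(() -> {
      try { Thread.sleep(500); } catch (InterruptedException ignored) {}
    }, 50));                                              // false
  }
}
```

The annotation form is much shorter, of course; the sketch only shows why a missing timeout can hang the whole test run instead of failing one test.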

> Extending DistCp through a custom CopyListing is not possible
> -
>
> Key: MAPREDUCE-5014
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5014
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp
>Affects Versions: 0.23.0, 0.23.1, 0.23.3, trunk, 0.23.4, 0.23.5
>Reporter: Srikanth Sundarrajan
>Assignee: Srikanth Sundarrajan
> Attachments: m5014_20130322.patch, MAPREDUCE-5014.patch, 
> MAPREDUCE-5014.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> * While it is possible to implement a custom CopyListing in DistCp, DistCp 
> driver class doesn't allow for using this custom CopyListing.
> * Allow SimpleCopyListing to provide an option to exclude files (For instance 
> it is useful to exclude FileOutputCommiter.SUCCEEDED_FILE_NAME during copy as 
> premature copy can indicate that the entire data is available at the 
> destination)



[jira] [Commented] (MAPREDUCE-5014) Extending DistCp through a custom CopyListing is not possible

2013-03-21 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609835#comment-13609835
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-5014:
---

Patch looks good.  Could you add timeout for the tests in TestIntegration?

> Extending DistCp through a custom CopyListing is not possible
> -
>
> Key: MAPREDUCE-5014
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5014
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp
>Affects Versions: 0.23.0, 0.23.1, 0.23.3, trunk, 0.23.4, 0.23.5
>Reporter: Srikanth Sundarrajan
>Assignee: Srikanth Sundarrajan
> Attachments: MAPREDUCE-5014.patch, MAPREDUCE-5014.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> * While it is possible to implement a custom CopyListing in DistCp, DistCp 
> driver class doesn't allow for using this custom CopyListing.
> * Allow SimpleCopyListing to provide an option to exclude files (For instance 
> it is useful to exclude FileOutputCommiter.SUCCEEDED_FILE_NAME during copy as 
> premature copy can indicate that the entire data is available at the 
> destination)



[jira] [Commented] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1

2013-03-21 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609238#comment-13609238
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-5081:
---

All tests passed with the patch on my machine.

> Backport DistCpV2 and the related JIRAs to branch-1
> ---
>
> Key: MAPREDUCE-5081
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5081
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: distcp
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: m5981_20130321b.patch, m5981_20130321.patch
>
>
> Here is a list of DistCpV2 JIRAs:
> - MAPREDUCE-2765: DistCpV2 main jira
> - HADOOP-8703: turn CRC checking off for 0 byte size 
> - HDFS-3054: distcp -skipcrccheck has no effect.
> - HADOOP-8431: Running distcp without args throws IllegalArgumentException
> - HADOOP-8775: non-positive value to -bandwidth
> - MAPREDUCE-4654: TestDistCp is ignored
> - HADOOP-9022: distcp fails to copy file if -m 0 specified
> - HADOOP-9025: TestCopyListing failing
> - MAPREDUCE-5075: DistCp leaks input file handles
> - MAPREDUCE-5014: custom CopyListing (not yet committed to trunk)



[jira] [Commented] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1

2013-03-21 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608846#comment-13608846
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-5081:
---

{noformat}
 [exec] -1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 41 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] -1 findbugs.  The patch appears to introduce 17 new Findbugs 
(version 1.3.9) warnings.
{noformat}
The remaining findbugs warnings are not related to this patch.

> Backport DistCpV2 and the related JIRAs to branch-1
> ---
>
> Key: MAPREDUCE-5081
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5081
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: distcp
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: m5981_20130321b.patch, m5981_20130321.patch
>
>
> Here is a list of DistCpV2 JIRAs:
> - MAPREDUCE-2765: DistCpV2 main jira
> - HADOOP-8703: turn CRC checking off for 0 byte size 
> - HDFS-3054: distcp -skipcrccheck has no effect.
> - HADOOP-8431: Running distcp without args throws IllegalArgumentException
> - HADOOP-8775: non-positive value to -bandwidth
> - MAPREDUCE-4654: TestDistCp is ignored
> - HADOOP-9022: distcp fails to copy file if -m 0 specified
> - HADOOP-9025: TestCopyListing failing
> - MAPREDUCE-5075: DistCp leaks input file handles
> - MAPREDUCE-5014: custom CopyListing (not yet committed to trunk)



[jira] [Updated] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1

2013-03-21 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-5081:
--

Attachment: m5981_20130321b.patch

Two of the findbugs warnings are related to this patch.

m5981_20130321b.patch: backports distcp part of HADOOP-8341.

> Backport DistCpV2 and the related JIRAs to branch-1
> ---
>
> Key: MAPREDUCE-5081
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5081
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: distcp
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: m5981_20130321b.patch, m5981_20130321.patch
>
>
> Here is a list of DistCpV2 JIRAs:
> - MAPREDUCE-2765: DistCpV2 main jira
> - HADOOP-8703: turn CRC checking off for 0 byte size 
> - HDFS-3054: distcp -skipcrccheck has no effect.
> - HADOOP-8431: Running distcp without args throws IllegalArgumentException
> - HADOOP-8775: non-positive value to -bandwidth
> - MAPREDUCE-4654: TestDistCp is ignored
> - HADOOP-9022: distcp fails to copy file if -m 0 specified
> - HADOOP-9025: TestCopyListing failing
> - MAPREDUCE-5075: DistCp leaks input file handles
> - MAPREDUCE-5014: custom CopyListing (not yet committed to trunk)



[jira] [Commented] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1

2013-03-21 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608805#comment-13608805
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-5081:
---

{noformat}
 [exec] -1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 41 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] -1 findbugs.  The patch appears to introduce 19 new Findbugs 
(version 1.3.9) warnings.
{noformat}


> Backport DistCpV2 and the related JIRAs to branch-1
> ---
>
> Key: MAPREDUCE-5081
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5081
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: distcp
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: m5981_20130321.patch
>
>
> Here is a list of DistCpV2 JIRAs:
> - MAPREDUCE-2765: DistCpV2 main jira
> - HADOOP-8703: turn CRC checking off for 0 byte size 
> - HDFS-3054: distcp -skipcrccheck has no effect.
> - HADOOP-8431: Running distcp without args throws IllegalArgumentException
> - HADOOP-8775: non-positive value to -bandwidth
> - MAPREDUCE-4654: TestDistCp is ignored
> - HADOOP-9022: distcp fails to copy file if -m 0 specified
> - HADOOP-9025: TestCopyListing failing
> - MAPREDUCE-5075: DistCp leaks input file handles
> - MAPREDUCE-5014: custom CopyListing (not yet committed to trunk)



[jira] [Updated] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1

2013-03-21 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-5081:
--

Attachment: m5981_20130321.patch

m5981_20130321.patch: backports all JIRAs except MAPREDUCE-5014.

> Backport DistCpV2 and the related JIRAs to branch-1
> ---
>
> Key: MAPREDUCE-5081
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5081
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: distcp
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: m5981_20130321.patch
>
>
> Here is a list of DistCpV2 JIRAs:
> - MAPREDUCE-2765: DistCpV2 main jira
> - HADOOP-8703: turn CRC checking off for 0 byte size 
> - HDFS-3054: distcp -skipcrccheck has no effect.
> - HADOOP-8431: Running distcp without args throws IllegalArgumentException
> - HADOOP-8775: non-positive value to -bandwidth
> - MAPREDUCE-4654: TestDistCp is ignored
> - HADOOP-9022: distcp fails to copy file if -m 0 specified
> - HADOOP-9025: TestCopyListing failing
> - MAPREDUCE-5075: DistCp leaks input file handles
> - MAPREDUCE-5014: custom CopyListing (not yet committed to trunk)



[jira] [Updated] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1

2013-03-20 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-5081:
--

Description: 
Here is a list of DistCpV2 JIRAs:
- MAPREDUCE-2765: DistCpV2 main jira
- HADOOP-8703: turn CRC checking off for 0 byte size 
- HDFS-3054: distcp -skipcrccheck has no effect.
- HADOOP-8431: Running distcp without args throws IllegalArgumentException
- HADOOP-8775: non-positive value to -bandwidth
- MAPREDUCE-4654: TestDistCp is ignored
- HADOOP-9022: distcp fails to copy file if -m 0 specified
- HADOOP-9025: TestCopyListing failing
- MAPREDUCE-5075: DistCp leaks input file handles

- MAPREDUCE-5014: custom CopyListing (not yet committed to trunk)


  was:
Here is a list of DistCpV2 JIRAs:
- MAPREDUCE-2765: DistCpV2 main jira
- HADOOP-8703: turn CRC checking off for 0 byte size 
- HDFS-3054: distcp -skipcrccheck has no effect.
- HADOOP-8431: Running distcp without args throws IllegalArgumentException
- HADOOP-8775: non-positive value to -bandwidth
- MAPREDUCE-4654: TestDistCp is ignored
- HADOOP-9022. distcp fails to copy file if -m 0 specified
- HADOOP-9025. TestCopyListing failing

- MAPREDUCE-5014: custom CopyListing (not yet committed to trunk)



Sure, MAPREDUCE-5075 is a useful bug fix.

> Backport DistCpV2 and the related JIRAs to branch-1
> ---
>
> Key: MAPREDUCE-5081
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5081
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: distcp
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>
> Here is a list of DistCpV2 JIRAs:
> - MAPREDUCE-2765: DistCpV2 main jira
> - HADOOP-8703: turn CRC checking off for 0 byte size 
> - HDFS-3054: distcp -skipcrccheck has no effect.
> - HADOOP-8431: Running distcp without args throws IllegalArgumentException
> - HADOOP-8775: non-positive value to -bandwidth
> - MAPREDUCE-4654: TestDistCp is ignored
> - HADOOP-9022: distcp fails to copy file if -m 0 specified
> - HADOOP-9025: TestCopyListing failing
> - MAPREDUCE-5075: DistCp leaks input file handles
> - MAPREDUCE-5014: custom CopyListing (not yet committed to trunk)



[jira] [Updated] (MAPREDUCE-5075) DistCp leaks input file handles

2013-03-20 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-5075:
--

   Resolution: Fixed
Fix Version/s: 2.0.5-beta
   Status: Resolved  (was: Patch Available)

I have committed this.  Thanks, Chris!

> DistCp leaks input file handles
> ---
>
> Key: MAPREDUCE-5075
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5075
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 3.0.0
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Fix For: 2.0.5-beta
>
> Attachments: MAPREDUCE-5075.1.patch
>
>
> DistCp wraps the {{InputStream}} for each input file it reads in an instance 
> of {{ThrottledInputStream}}.  This class does not close the wrapped 
> {{InputStream}}.  {{RetriableFileCopyCommand}} guarantees that the 
> {{ThrottledInputStream}} gets closed, but without closing the underlying 
> wrapped stream, it still leaks a file handle.
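The fix pattern for a leak like this is for the wrapper's close() to delegate to the wrapped stream. A minimal sketch under assumed names (this is illustrative, not the actual DistCp ThrottledInputStream code):

```java
import java.io.IOException;
import java.io.InputStream;

/** Sketch of a throttling wrapper whose close() also closes the wrapped
 *  stream; class and field names are illustrative, not the DistCp originals. */
class ThrottledStreamSketch extends InputStream {
    private final InputStream rawStream;

    ThrottledStreamSketch(InputStream rawStream) {
        this.rawStream = rawStream;
    }

    @Override
    public int read() throws IOException {
        // (bandwidth-throttling bookkeeping would go here)
        return rawStream.read();
    }

    @Override
    public void close() throws IOException {
        // The leak described above: if close() does not delegate,
        // the underlying file handle is never released.
        rawStream.close();
    }
}
```

Extending java.io.FilterInputStream would give the same delegation for free; the explicit override above just makes the fix visible.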



[jira] [Updated] (MAPREDUCE-5075) DistCp leaks input file handles

2013-03-20 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-5075:
--

Hadoop Flags: Reviewed

+1 patch looks good.

> DistCp leaks input file handles
> ---
>
> Key: MAPREDUCE-5075
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5075
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 3.0.0
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: MAPREDUCE-5075.1.patch
>
>
> DistCp wraps the {{InputStream}} for each input file it reads in an instance 
> of {{ThrottledInputStream}}.  This class does not close the wrapped 
> {{InputStream}}.  {{RetriableFileCopyCommand}} guarantees that the 
> {{ThrottledInputStream}} gets closed, but without closing the underlying 
> wrapped stream, it still leaks a file handle.



[jira] [Created] (MAPREDUCE-5081) Backport DistCpV2 and the related JIRAs to branch-1

2013-03-19 Thread Tsz Wo (Nicholas), SZE (JIRA)
Tsz Wo (Nicholas), SZE created MAPREDUCE-5081:
-

 Summary: Backport DistCpV2 and the related JIRAs to branch-1
 Key: MAPREDUCE-5081
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5081
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: distcp
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE


Here is a list of DistCpV2 JIRAs:
- MAPREDUCE-2765: DistCpV2 main jira
- HADOOP-8703: turn CRC checking off for 0 byte size 
- HDFS-3054: distcp -skipcrccheck has no effect.
- HADOOP-8431: Running distcp without args throws IllegalArgumentException
- HADOOP-8775: non-positive value to -bandwidth
- MAPREDUCE-4654: TestDistCp is ignored
- HADOOP-9022. distcp fails to copy file if -m 0 specified
- HADOOP-9025. TestCopyListing failing

- MAPREDUCE-5014: custom CopyListing (not yet committed to trunk)




[jira] [Commented] (MAPREDUCE-4502) Multi-level aggregation with combining the result of maps per node/rack

2013-02-22 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584591#comment-13584591
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4502:
---

In the 
[console|https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3351//console],
 it said "Build step 'Execute shell' marked build as failure".


> Multi-level aggregation with combining the result of maps per node/rack
> ---
>
> Key: MAPREDUCE-4502
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4502
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster, mrv2
>Affects Versions: 3.0.0
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: design_v2.pdf, MAPREDUCE-4502.1.patch, 
> MAPREDUCE-4502.2.patch, MAPREDUCE-4502.3.patch, MAPREDUCE-4502.4.patch, 
> MAPREDUCE-4525-pof.diff, speculative_draft.pdf
>
>
> Shuffle is expensive in Hadoop in spite of the existence of the combiner, 
> because the scope of combining is limited to a single MapTask. 
> To solve this problem, it is useful to aggregate the results of maps per 
> node/rack by launching a combiner.
> This JIRA is to implement the multi-level aggregation infrastructure, 
> including combining per container (MAPREDUCE-3902 is related) and coordinating 
> containers by the application master without breaking the fault tolerance of jobs.
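The idea in the description, combining once per map and then again over all co-located map outputs, can be sketched as follows (a word-count-style illustration with hypothetical names, not the proposed patch code):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Sketch of multi-level combining: per-map combine, then a node-level
 *  combine over the outputs of all maps on that node. Names are illustrative. */
class MultiLevelCombineSketch {

    /** Word-count-style combine: sum the values for each key. */
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> records) {
        return records.stream().collect(Collectors.toMap(
                Map.Entry::getKey, Map.Entry::getValue, Integer::sum));
    }

    /** Node-level aggregation: combine the already-combined outputs of
     *  co-located maps, shrinking the data shuffled to reducers. */
    static Map<String, Integer> combinePerNode(List<Map<String, Integer>> mapOutputsOnNode) {
        List<Map.Entry<String, Integer>> flat = mapOutputsOnNode.stream()
                .flatMap(m -> m.entrySet().stream())
                .collect(Collectors.toList());
        return combine(flat);
    }
}
```

The payoff is that keys duplicated across maps on the same node are merged before the shuffle, which is exactly the cost the description is targeting.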



[jira] [Updated] (MAPREDUCE-4976) Use the new StringUtils methods added by HADOOP-9252

2013-02-06 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-4976:
--

Description: HADOOP-9252 slightly changed the format of some StringUtils 
outputs.  Some methods were deprecated by HADOOP-9252.  The use of them should 
be replaced with the new methods.  (was: HADOOP-9252 slightly changes the 
format of some StringUtils outputs.  It may cause test failures.

Also, some methods were deprecated by HADOOP-9252.  The use of them should be 
replaced with the new methods.)
 Issue Type: Improvement  (was: Bug)
Summary: Use the new StringUtils methods added by HADOOP-9252  (was: 
Fix test failure for HADOOP-9252)

A recent Jenkins build [build 
#3309|https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3309/] shows that 
HADOOP-9252 does not cause test failures in MapReduce.

(Revised summary and description.)

> Use the new StringUtils methods added by HADOOP-9252
> 
>
> Key: MAPREDUCE-4976
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4976
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Minor
>
> HADOOP-9252 slightly changed the format of some StringUtils outputs.  Some 
> methods were deprecated by HADOOP-9252.  The use of them should be replaced 
> with the new methods.



[jira] [Updated] (MAPREDUCE-4976) Fix test failure for HADOOP-9252

2013-02-04 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-4976:
--

Description: 
HADOOP-9252 slightly changes the format of some StringUtils outputs.  It may 
cause test failures.

Also, some methods were deprecated by HADOOP-9252.  The use of them should be 
replaced with the new methods.

  was:
HADOOP-9252 slightly changes the format of some StringUtils outputs.  It may 
cause test failures.

Also, some methods was deprecated by HADOOP-9252.  The use of them should be 
replaced with the new methods.


> Fix test failure for HADOOP-9252
> 
>
> Key: MAPREDUCE-4976
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4976
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Minor
>
> HADOOP-9252 slightly changes the format of some StringUtils outputs.  It may 
> cause test failures.
> Also, some methods were deprecated by HADOOP-9252.  The use of them should be 
> replaced with the new methods.



[jira] [Created] (MAPREDUCE-4976) Fix test failure for HADOOP-9252

2013-02-04 Thread Tsz Wo (Nicholas), SZE (JIRA)
Tsz Wo (Nicholas), SZE created MAPREDUCE-4976:
-

 Summary: Fix test failure for HADOOP-9252
 Key: MAPREDUCE-4976
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4976
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
Priority: Minor


HADOOP-9252 slightly changes the format of some StringUtils outputs.  It may 
cause test failures.

Also, some methods was deprecated by HADOOP-9252.  The use of them should be 
replaced with the new methods.



[jira] [Commented] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

2012-09-28 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465601#comment-13465601
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4651:
---

Oops, I forgot to reload the page.  It is great that Jakob has reviewed it.

> Benchmarking random reads with DFSIO
> 
>
> Key: MAPREDUCE-4651
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4651
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: benchmarks, test
>Affects Versions: 1.0.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.23.4
>
> Attachments: randomDFSIO.patch, randomDFSIO.patch, randomDFSIO.patch
>
>
> TestDFSIO measures throughput of HDFS write, read, and append operations. It 
> will be useful to have an option to use it for benchmarking random reads.



[jira] [Commented] (MAPREDUCE-4651) Benchmarking random reads with DFSIO

2012-09-27 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464794#comment-13464794
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4651:
---

Hi Konstantin, the patch no longer applies to trunk.  Could you update it?

> Benchmarking random reads with DFSIO
> 
>
> Key: MAPREDUCE-4651
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4651
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: benchmarks, test
>Affects Versions: 1.0.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.23.4
>
> Attachments: randomDFSIO.patch, randomDFSIO.patch, randomDFSIO.patch
>
>
> TestDFSIO measures throughput of HDFS write, read, and append operations. It 
> will be useful to have an option to use it for benchmarking random reads.



[jira] [Commented] (MAPREDUCE-4603) Allow JobClient to retry job-submission when JT is in safemode

2012-09-25 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462854#comment-13462854
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4603:
---

+1 patch looks good.

> Allow JobClient to retry job-submission when JT is in safemode
> --
>
> Key: MAPREDUCE-4603
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4603
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Attachments: MAPREDUCE-4603.patch
>
>
> Similar to HDFS-3504, it would be useful to allow JobClient to retry 
> job-submission when JT is in safemode (via MAPREDUCE-4328).
> This way applications like Pig/Hive don't bork midway when the NN/JT are not 
> operational.



[jira] [Commented] (MAPREDUCE-4309) Make locality in YARN's container assignment and task scheduling pluggable for other deployment topology

2012-07-18 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418031#comment-13418031
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4309:
---

Abstract Factory sounds good.

For the enum problem, let's use enum for the moment and change it to interfaces 
later.

(Sorry that I was not able to respond earlier.)

> Make locality in YARN's container assignment and task scheduling pluggable 
> for other deployment topology
> -
>
> Key: MAPREDUCE-4309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4309
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.0.0, 2.0.0-alpha
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: 
> HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, 
> MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, 
> MAPREDUCE-4309-v5.patch, MAPREDUCE-4309.patch
>
>
> There are several classes in YARN’s container assignment and task scheduling 
> algorithms that relate to data locality which were updated to give preference 
> to running a container on other localities besides node-local and rack-local 
> (like nodegroup-local). This proposes to make these data structures and 
> algorithms pluggable, e.g. SchedulerNode, RMNodeImpl, etc. The inner class 
> ScheduledRequests was made a package-level class so it would be easier to 
> create a subclass, ScheduledRequestsWithNodeGroup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4309) Make locality in YARN's container assignment and task scheduling pluggable for other deployment topology

2012-07-09 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409267#comment-13409267
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4309:
---

- Since JobCounter is public evolving, I think we cannot add 
NODEGROUP_LOCAL_MAPS.  Also, NODEGROUP_LOCAL_MAPS is not yet used.

- Similarly, NodeType.NODEGROUP_LOCAL adds node group notation to the original 
code.  Is there any way to prevent it?  I think we may change the enum to an 
interface, but that is a much bigger change.

- BTW, YarnConfiguration.NET_TOPOLOGY_WITH_NODEGROUP is not used 


> Make locality in YARN's container assignment and task scheduling pluggable 
> for other deployment topology
> -
>
> Key: MAPREDUCE-4309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4309
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.0.0, 2.0.0-alpha
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: 
> HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, 
> MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, 
> MAPREDUCE-4309.patch
>
>
> There are several classes in YARN’s container assignment and task scheduling 
> algorithms that relate to data locality which were updated to give preference 
> to running a container on other localities besides node-local and rack-local 
> (like nodegroup-local). This proposes to make these data structures and 
> algorithms pluggable, e.g. SchedulerNode, RMNodeImpl, etc. The inner class 
> ScheduledRequests was made a package-level class so it would be easier to 
> create a subclass, ScheduledRequestsWithNodeGroup.





[jira] [Commented] (MAPREDUCE-4309) Make locality in YARN's container assignment and task scheduling pluggable for other deployment topology

2012-07-05 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407705#comment-13407705
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4309:
---

Quick comments:
- It does not apply anymore.
{noformat}
$patch -p0 -i 
~/Downloads/HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch 
patching file 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
Hunk #1 FAILED at 18.
Hunk #2 succeeded at 37 (offset 1 line).
Hunk #3 FAILED at 45.
Hunk #4 succeeded at 65 with fuzz 2 (offset 9 lines).
Hunk #5 succeeded at 80 (offset 9 lines).
Hunk #6 succeeded at 119 (offset 9 lines).
Hunk #7 succeeded at 161 (offset 9 lines).
Hunk #8 succeeded at 548 (offset 10 lines).
Hunk #9 FAILED at 633.
3 out of 9 hunks FAILED -- saving rejects to file 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java.rej
...
{noformat}
- Could we keep ScheduledRequests as an inner class of RMContainerAllocator, 
i.e. change it to public/protected/package-private static?  It will be easier 
to see the changes.
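The refactoring asked for here, keeping the class nested in the allocator but making it static so it can be subclassed, can be sketched as (the pickLocality method is hypothetical, not the real RMContainerAllocator API):

```java
/** Sketch: keeping ScheduledRequests nested but static and package-private
 *  makes it subclassable without moving it out of the allocator class.
 *  The pickLocality method is illustrative only. */
class AllocatorSketch {

    static class ScheduledRequests {
        String pickLocality() {
            return "rack-local";
        }
    }

    /** A pluggable subclass can add the extra nodegroup layer. */
    static class ScheduledRequestsWithNodeGroup extends ScheduledRequests {
        @Override
        String pickLocality() {
            return "nodegroup-local";
        }
    }
}
```

Because the nested class is static, the subclass needs no enclosing-instance reference, and the diff against the original allocator stays small.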

> Make locality in YARN's container assignment and task scheduling pluggable 
> for other deployment topology
> -
>
> Key: MAPREDUCE-4309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4309
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.0.0, 2.0.0-alpha
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: 
> HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch
>
>
> There are several classes in YARN’s container assignment and task scheduling 
> algorithms that relate to data locality which were updated to give preference 
> to running a container on other localities besides node-local and rack-local 
> (like nodegroup-local). This proposes to make these data structures and 
> algorithms pluggable, e.g. SchedulerNode, RMNodeImpl, etc. The inner class 
> ScheduledRequests was made a package-level class so it would be easier to 
> create a subclass, ScheduledRequestsWithNodeGroup.





[jira] [Commented] (MAPREDUCE-4310) 4-layer topology (with NodeGroup layer) implementation of Container Assignment and Task Scheduling (for YARN)

2012-07-04 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406814#comment-13406814
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4310:
---

I think we should add a subclass of RackResolver to support NodeGroup.  It is 
similar to what we did for NetworkTopology.

> 4-layer topology (with NodeGroup layer) implementation of Container 
> Assignment and Task Scheduling (for YARN)
> -
>
> Key: MAPREDUCE-4310
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4310
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.0.0, 2.0.0-alpha
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: 
> HADOOP-8475-ContainerAssignmentTaskScheduling-withNodeGroup.patch
>
>
> There are several classes in YARN’s container assignment and task scheduling 
> algorithms related to data locality which were updated to give 
> preference to running a container on the same nodegroup. This section 
> summarizes the changes in the patch, which provides a new implementation to 
> support a four-layer hierarchy.
> When the ApplicationMaster makes a resource allocation request to the 
> scheduler of the ResourceManager, it will add the node group to the list of 
> attributes in the ResourceRequest, extending the resource request parameters 
> with a nodegroup field.
> After receiving the ResourceRequest, the RM scheduler will assign containers 
> for requests in the sequence of data-local, nodegroup-local, rack-local and 
> off-switch. Then the ApplicationMaster schedules tasks on allocated containers in 
> the same sequence: data-local, nodegroup-local, rack-local and off-switch.
> In terms of code changes made to YARN task scheduling, we updated the class 
> ContainerRequestEvent so that applications' requests for containers can 
> include a nodegroup. In the RM schedulers, FifoScheduler and CapacityScheduler 
> were updated. For the FifoScheduler, the changes were in the method 
> assignContainers. For the CapacityScheduler, the method 
> assignContainersOnNode in the class LeafQueue was updated. In both changes 
> a new method, assignNodeGroupLocalContainers(), was added between the 
> data-local and rack-local assignments.
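The four-layer preference order described above can be sketched as (enum and method names are illustrative, not the actual YARN scheduler API):

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of the 4-layer assignment order; not the actual scheduler code. */
class LocalityOrderSketch {

    enum Locality { DATA_LOCAL, NODEGROUP_LOCAL, RACK_LOCAL, OFF_SWITCH }

    /** Order in which the scheduler tries to place a container: the new
     *  NODEGROUP_LOCAL layer sits between data-local and rack-local. */
    static List<Locality> assignmentOrder() {
        return Arrays.asList(
                Locality.DATA_LOCAL,
                Locality.NODEGROUP_LOCAL,
                Locality.RACK_LOCAL,
                Locality.OFF_SWITCH);
    }
}
```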





[jira] [Commented] (MAPREDUCE-4323) NM leaks sockets

2012-06-07 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291175#comment-13291175
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4323:
---

This looks like a problem of the newly added socket cache.  Once it is fixed 
(say, it is removed for the sake of discussion), are there other problems?

> NM leaks sockets
> 
>
> Key: MAPREDUCE-4323
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4323
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 0.23.0, 0.24.0, 2.0.0-alpha
>Reporter: Daryn Sharp
>Priority: Critical
>
> The NM is exhausting its fds because it's not closing fs instances when the 
> app is finished.





[jira] [Commented] (MAPREDUCE-4266) remove Ant remnants from MR

2012-05-17 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278222#comment-13278222
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4266:
---

As you mentioned, RAID is WIP.  Could you communicate this with MAPREDUCE-3868?

> remove Ant remnants from MR
> ---
>
> Key: MAPREDUCE-4266
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4266
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: build
>Affects Versions: 2.0.1
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Fix For: 2.0.1
>
> Attachments: MAPREDUCE-4266.patch
>
>
> Remove:
> hadoop-mapreduce-project/src/*
> hadoop-mapreduce-project/ivy/*
> hadoop-mapreduce-project/build.xml
> hadoop-mapreduce-project/ivy.xml





[jira] [Commented] (MAPREDUCE-4266) remove Ant remnants from MR

2012-05-17 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278144#comment-13278144
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4266:
---

Are these files required for compiling RAID or other contrib projects?

> remove Ant remnants from MR
> ---
>
> Key: MAPREDUCE-4266
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4266
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: build
>Affects Versions: 2.0.1
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Fix For: 2.0.1
>
> Attachments: MAPREDUCE-4266.patch
>
>
> Remove:
> hadoop-mapreduce-project/src/*
> hadoop-mapreduce-project/ivy/*
> hadoop-mapreduce-project/build.xml
> hadoop-mapreduce-project/ivy.xml





[jira] [Updated] (MAPREDUCE-4231) Update RAID not to use FSInodeInfo

2012-05-08 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-4231:
--

   Resolution: Fixed
Fix Version/s: 2.0.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks John for reviewing and testing it!

I have committed this.

> Update RAID not to use FSInodeInfo
> -
>
> Key: MAPREDUCE-4231
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4231
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/raid
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 2.0.0
>
> Attachments: m4231_20120507.patch
>
>
> FSInodeInfo was removed by HDFS-3363.  We should update RAID.





[jira] [Updated] (MAPREDUCE-4231) Update RAID to not to use FSInodeInfo

2012-05-07 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-4231:
--

Attachment: m4231_20120507.patch

m4231_20120507.patch: use BlockCollection instead.

> Update RAID to not to use FSInodeInfo
> -
>
> Key: MAPREDUCE-4231
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4231
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/raid
>Reporter: Tsz Wo (Nicholas), SZE
> Attachments: m4231_20120507.patch
>
>
> FSInodeInfo was removed by HDFS-3363.  We should update RAID.





[jira] [Updated] (MAPREDUCE-4231) Update RAID to not to use FSInodeInfo

2012-05-07 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-4231:
--

Assignee: Tsz Wo (Nicholas), SZE
  Status: Patch Available  (was: Open)

> Update RAID to not to use FSInodeInfo
> -
>
> Key: MAPREDUCE-4231
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4231
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/raid
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: m4231_20120507.patch
>
>
> FSInodeInfo was removed by HDFS-3363.  We should update RAID.





[jira] [Moved] (MAPREDUCE-4231) Update RAID to not to use FSInodeInfo

2012-05-07 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE moved HDFS-3380 to MAPREDUCE-4231:
-

Component/s: (was: contrib/raid)
 contrib/raid
Key: MAPREDUCE-4231  (was: HDFS-3380)
Project: Hadoop Map/Reduce  (was: Hadoop HDFS)

> Update RAID to not to use FSInodeInfo
> -
>
> Key: MAPREDUCE-4231
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4231
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/raid
>Reporter: Tsz Wo (Nicholas), SZE
> Attachments: m4231_20120507.patch
>
>
> FSInodeInfo was removed by HDFS-3363.  We should update RAID.





[jira] [Commented] (MAPREDUCE-4172) Clean up java warnings in the hadoop-mapreduce-project sub projects

2012-04-30 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265598#comment-13265598
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4172:
---

In general, I appreciate that you are fixing the warnings.  However, please 
don't suppress the warnings you can't fix.  Please also add more description of 
what is actually done in the individual JIRAs instead of "Clean up Xxx". Thanks.

> Clean up java warnings in the hadoop-mapreduce-project sub projects
> ---
>
> Key: MAPREDUCE-4172
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4172
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: build
>Affects Versions: trunk
>Reporter: Harsh J
>Assignee: Harsh J
>
> There are lots of warnings in the hadoop-mapreduce-project presently. We can 
> clear almost all of this away:
> * Unused imports
> * Unused variables
> ** For loops that can be replaced with while instead to save an unused 
> variable
> * Unused methods
> * Deprecation warnings where an alternative can be used (Especially 
> SequenceFile reader/writer usage and MiniDFSCluster usage)
> * Deprecation warnings where an alternative isn't clear (Especially 
> MiniMRCluster usage and DistributedCache API usage where a Job object may not 
> be available)
> * Unchecked conversions
> * Raw type usage
> * (etc.)
> I'm going to open one sub-task per sub-project we have, with patches attached 
> to them.





[jira] [Commented] (MAPREDUCE-4183) Clean up yarn-common

2012-04-30 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265595#comment-13265595
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4183:
---

Please do not add @SuppressWarnings(..).  Removing them or fixing the warnings 
would be great.

> Clean up yarn-common
> 
>
> Key: MAPREDUCE-4183
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4183
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: build
>Affects Versions: trunk
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
> Attachments: 0011-YARN-Common-Cleanup.patch, 
> 0011-YARN-Common-Cleanup.patch
>
>
> Clean up a bunch of existing javac warnings in hadoop-yarn-common module.





[jira] [Commented] (MAPREDUCE-4182) Clean up yarn-applications-distributedshell

2012-04-30 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265592#comment-13265592
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4182:
---

Typo: "used" should be "unused".

> Clean up yarn-applications-distributedshell
> ---
>
> Key: MAPREDUCE-4182
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4182
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: build
>Affects Versions: trunk
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
> Attachments: 
> 0010-YARN-Applications-DistributedShell-Example-Cleanup.patch, 
> 0010-YARN-Applications-DistributedShell-Example-Cleanup.patch
>
>
> Clean up a bunch of existing javac warnings in 
> hadoop-yarn-applications-distributedshell module.





[jira] [Commented] (MAPREDUCE-4181) Clean up yarn-api

2012-04-30 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265591#comment-13265591
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4181:
---

Could you revise the summary and description to reflect the actual changes? 
 "Clean up yarn-api" sounds like changing the Yarn APIs.  How about "Remove the 
unused maybeInitBuilder() method from various classes in 
hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-api/"?

> Clean up yarn-api
> -
>
> Key: MAPREDUCE-4181
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4181
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: build
>Affects Versions: trunk
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
> Attachments: 0009-YARN-API-Cleanup.patch
>
>
> Clean up a bunch of existing javac warnings in hadoop-yarn-api module.





[jira] [Commented] (MAPREDUCE-4182) Clean up yarn-applications-distributedshell

2012-04-30 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265589#comment-13265589
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4182:
---

Since only an used import is removed, I think the summary and the description 
should be revised to something like "Remove an used import from 
TestDistributedShell".

> Clean up yarn-applications-distributedshell
> ---
>
> Key: MAPREDUCE-4182
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4182
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: build
>Affects Versions: trunk
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
> Attachments: 
> 0010-YARN-Applications-DistributedShell-Example-Cleanup.patch, 
> 0010-YARN-Applications-DistributedShell-Example-Cleanup.patch
>
>
> Clean up a bunch of existing javac warnings in 
> hadoop-yarn-applications-distributedshell module.





[jira] [Commented] (MAPREDUCE-4174) Clean up hadoop-mapreduce-client-common

2012-04-30 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265587#comment-13265587
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4174:
---

{code}
+@SuppressWarnings("deprecation")
 @Private
 @Unstable
 public class MRApps extends Apps {
{code}
Please do not add @SuppressWarnings in the class header.  It basically turns 
off the warning feature for the entire class.  We should fix the warnings but 
not suppress them.
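The scope concern can be illustrated with a small sketch (hypothetical code, not the MRApps class under review): a class-level annotation exempts every member, present and future, while a member-level one confines the suppression to the single call that needs it.

```java
// Hypothetical sketch, not the MRApps code: contrast class-level and
// member-level @SuppressWarnings("deprecation").
public class SuppressionScope {
  @Deprecated
  static int oldSum(int a, int b) { return a + b; }

  // Too broad: every member of the class, including ones added later,
  // is exempt from deprecation checking.
  @SuppressWarnings("deprecation")
  static class Broad {
    int call() { return oldSum(1, 2); }
  }

  // Narrower: only the one method that must use the old API is exempt;
  // the rest of the class is still checked by javac.
  static class Narrow {
    @SuppressWarnings("deprecation")
    int call() { return oldSum(1, 2); }
  }

  public static void main(String[] args) {
    System.out.println(new Broad().call() + new Narrow().call()); // prints 6
  }
}
```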


> Clean up hadoop-mapreduce-client-common
> ---
>
> Key: MAPREDUCE-4174
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4174
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: build
>Affects Versions: trunk
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
> Attachments: 0002-MR-Client-Common-Cleanup.patch
>
>
> Clean up a bunch of existing javac warnings in hadoop-mapreduce-client-common 
> module.





[jira] [Commented] (MAPREDUCE-4176) Clean up hadoop-mapreduce-client-hs

2012-04-30 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265588#comment-13265588
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4176:
---

Please do not add @SuppressWarnings("rawtypes").

> Clean up hadoop-mapreduce-client-hs
> ---
>
> Key: MAPREDUCE-4176
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4176
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: build
>Affects Versions: trunk
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
> Attachments: 0004-MapReduce-Client-HistoryServer-Cleanup.patch
>
>
> Clean up a bunch of existing javac warnings in hadoop-mapreduce-client-hs 
> module.





[jira] [Commented] (MAPREDUCE-4172) Clean up java warnings in the hadoop-mapreduce-project sub projects

2012-04-30 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265583#comment-13265583
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-4172:
---

Hi Harsh, please do not add @SuppressWarnings(..), especially 
@SuppressWarnings("rawtypes").  We should fix the warnings instead.
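As a sketch of fixing a rawtypes warning rather than suppressing it (illustrative code, not from the attached patches): parameterizing the type lets the compiler check element types and removes the need for casts.

```java
// Illustrative only: fix a rawtypes warning by parameterizing the type
// instead of adding @SuppressWarnings("rawtypes").
import java.util.ArrayList;
import java.util.List;

public class RawTypeFix {
  // Before: "List words = new ArrayList();" draws a rawtypes warning
  // and forces a cast at every read.
  // After: the parameterized form is checked by the compiler.
  static List<String> words = new ArrayList<>();

  public static void main(String[] args) {
    words.add("yarn");
    words.add("mapreduce");
    String first = words.get(0);                    // no cast needed
    System.out.println(first + ":" + words.size()); // prints yarn:2
  }
}
```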

> Clean up java warnings in the hadoop-mapreduce-project sub projects
> ---
>
> Key: MAPREDUCE-4172
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4172
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: build
>Affects Versions: trunk
>Reporter: Harsh J
>Assignee: Harsh J
>
> There are lots of warnings in the hadoop-mapreduce-project presently. We can 
> clear almost all of this away:
> * Unused imports
> * Unused variables
> ** For loops that can be replaced with while instead to save an unused 
> variable
> * Unused methods
> * Deprecation warnings where an alternative can be used (Especially 
> SequenceFile reader/writer usage and MiniDFSCluster usage)
> * Deprecation warnings where an alternative isn't clear (Especially 
> MiniMRCluster usage and DistributedCache API usage where a Job object may not 
> be available)
> * Unchecked conversions
> * Raw type usage
> * (etc.)
> I'm going to open one sub-task per sub-project we have, with patches attached 
> to them.





[jira] [Updated] (MAPREDUCE-4074) Client continuously retries to RM When RM goes down before launching Application Master

2012-04-26 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-4074:
--

Assignee: xieguiming

> Client continuously retries to RM When RM goes down before launching 
> Application Master
> ---
>
> Key: MAPREDUCE-4074
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4074
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.23.1
>Reporter: Devaraj K
>Assignee: xieguiming
> Fix For: 0.23.3
>
> Attachments: MAPREDUCE-4074-1.patch, MAPREDUCE-4074-2.patch, 
> MAPREDUCE-4074-3.patch, MAPREDUCE-4074.patch
>
>
> The client continuously retries connecting to the RM and logs the messages 
> below when the RM goes down before launching the App Master. 
> I feel an exception should be thrown, or the loop broken, after a finite 
> number of retries.
> {code:xml}
> 28/03/12 07:15:03 INFO ipc.Client: Retrying connect to server: 
> linux-f330.site/10.18.40.182:8032. Already tried 0 time(s).
> 28/03/12 07:15:04 INFO ipc.Client: Retrying connect to server: 
> linux-f330.site/10.18.40.182:8032. Already tried 1 time(s).
> 28/03/12 07:15:05 INFO ipc.Client: Retrying connect to server: 
> linux-f330.site/10.18.40.182:8032. Already tried 2 time(s).
> 28/03/12 07:15:06 INFO ipc.Client: Retrying connect to server: 
> linux-f330.site/10.18.40.182:8032. Already tried 3 time(s).
> 28/03/12 07:15:07 INFO ipc.Client: Retrying connect to server: 
> linux-f330.site/10.18.40.182:8032. Already tried 4 time(s).
> 28/03/12 07:15:08 INFO ipc.Client: Retrying connect to server: 
> linux-f330.site/10.18.40.182:8032. Already tried 5 time(s).
> 28/03/12 07:15:09 INFO ipc.Client: Retrying connect to server: 
> linux-f330.site/10.18.40.182:8032. Already tried 6 time(s).
> 28/03/12 07:15:10 INFO ipc.Client: Retrying connect to server: 
> linux-f330.site/10.18.40.182:8032. Already tried 7 time(s).
> 28/03/12 07:15:11 INFO ipc.Client: Retrying connect to server: 
> linux-f330.site/10.18.40.182:8032. Already tried 8 time(s).
> 28/03/12 07:15:12 INFO ipc.Client: Retrying connect to server: 
> linux-f330.site/10.18.40.182:8032. Already tried 9 time(s).
> 28/03/12 07:15:13 INFO ipc.Client: Retrying connect to server: 
> linux-f330.site/10.18.40.182:8032. Already tried 0 time(s).
> 28/03/12 07:15:14 INFO ipc.Client: Retrying connect to server: 
> linux-f330.site/10.18.40.182:8032. Already tried 1 time(s).
> 28/03/12 07:15:15 INFO ipc.Client: Retrying connect to server: 
> linux-f330.site/10.18.40.182:8032. Already tried 2 time(s).
> 28/03/12 07:15:16 INFO ipc.Client: Retrying connect to server: 
> linux-f330.site/10.18.40.182:8032. Already tried 3 time(s).
> 28/03/12 07:15:17 INFO ipc.Client: Retrying connect to server: 
> linux-f330.site/10.18.40.182:8032. Already tried 4 time(s).
> {code}
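A bounded-retry loop of the kind the description asks for could be sketched as follows; the `Op` interface and `withRetries` helper are illustrative names, not the actual ipc.Client API.

```java
// Sketch of bounded retry: fail with an exception after a fixed number
// of attempts instead of retrying forever. Op and withRetries are
// illustrative names, not the actual ipc.Client API.
import java.io.IOException;

public class BoundedRetry {
  interface Op { void run() throws IOException; }

  static void withRetries(Op op, int maxRetries) throws IOException {
    for (int attempt = 1; ; attempt++) {
      try {
        op.run();
        return;                               // success: stop retrying
      } catch (IOException e) {
        if (attempt >= maxRetries) {
          throw new IOException("giving up after " + maxRetries + " attempts", e);
        }
        // a real client would back off / sleep here before the next try
      }
    }
  }

  public static void main(String[] args) {
    final int[] calls = {0};
    try {
      withRetries(() -> { calls[0]++; throw new IOException("connection refused"); }, 3);
    } catch (IOException expected) {
      System.out.println("attempts=" + calls[0]); // prints attempts=3
    }
  }
}
```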





[jira] [Updated] (MAPREDUCE-4066) To get "yarn.app.mapreduce.am.staging-dir" value, should set the default value

2012-04-26 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-4066:
--

Assignee: xieguiming

> To get "yarn.app.mapreduce.am.staging-dir" value, should set the default value
> --
>
> Key: MAPREDUCE-4066
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4066
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: job submission, mrv2
>Affects Versions: 0.23.1
> Environment: client is windows eclipse, server is suse
>Reporter: xieguiming
>Assignee: xieguiming
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: MAPREDUCE-4066.patch, MAPREDUCE-4066.patch
>
>
> when submit the job use the windows eclipse, and the 
> yarn.app.mapreduce.am.staging-dir value is null.
> {code:title=MRApps.java|borderStyle=solid}
>   public static Path getStagingAreaDir(Configuration conf, String user) {
> return new Path(
> conf.get(MRJobConfig.MR_AM_STAGING_DIR) + 
> Path.SEPARATOR + user + Path.SEPARATOR + STAGING_CONSTANT);
>   }
> {code}
> should modify to:
> {code:title=MRApps.java|borderStyle=solid}
>   public static Path getStagingAreaDir(Configuration conf, String user) {
> return new Path(
> conf.get(MRJobConfig.MR_AM_STAGING_DIR,"/tmp/hadoop-yarn/staging") + 
> Path.SEPARATOR + user + Path.SEPARATOR + STAGING_CONSTANT);
>   }
> {code}





[jira] [Moved] (MAPREDUCE-4192) the TaskMemoryManager thread is not interrupted when the TaskTracker is ordered to reinit by JobTracker

2012-04-23 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE moved HADOOP-8300 to MAPREDUCE-4192:
---

Affects Version/s: (was: 0.20.2)
   0.20.2
  Key: MAPREDUCE-4192  (was: HADOOP-8300)
  Project: Hadoop Map/Reduce  (was: Hadoop Common)

> the TaskMemoryManager thread is not interrupted when the TaskTracker is 
> ordered to reinit by JobTracker
> -
>
> Key: MAPREDUCE-4192
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4192
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.20.2
>Reporter: Hua xu
>
> When the TaskTracker is ordered to reinit by the JobTracker, it interrupts 
> some threads and then reinits them, but it does not interrupt the 
> TaskMemoryManager thread; it just creates a new TaskMemoryManager thread. I 
> used jstack to observe this (I reinited the TaskTracker 3 times by having the 
> JobTracker send TaskTrackerAction.ActionType.REINIT_TRACKER).





[jira] [Updated] (MAPREDUCE-3076) TestSleepJob fails

2011-09-22 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-3076:
--

   Resolution: Fixed
Fix Version/s: 0.20.206.0
   Status: Resolved  (was: Patch Available)

I have committed this.  Thanks, Arun!

> TestSleepJob fails 
> ---
>
> Key: MAPREDUCE-3076
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3076
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.20.205.0
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
>Priority: Blocker
> Fix For: 0.20.205.0, 0.20.206.0
>
> Attachments: MAPREDUCE-3076.patch
>
>
> TestSleepJob fails, it was intended to be used in other tests for 
> MAPREDUCE-2981.





[jira] [Updated] (MAPREDUCE-3076) TestSleepJob fails

2011-09-22 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-3076:
--

Hadoop Flags: [Reviewed]

+1 patch looks good.

After the patch, TestSleepJob is ignored.
{noformat}
[junit] Running org.apache.hadoop.mapreduce.TestSleepJob
[junit] Tests run: 0, Failures: 0, Errors: 0, Time elapsed: 0.005 sec
{noformat}

> TestSleepJob fails 
> ---
>
> Key: MAPREDUCE-3076
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3076
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.20.205.0
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
>Priority: Blocker
> Fix For: 0.20.205.0
>
> Attachments: MAPREDUCE-3076.patch
>
>
> TestSleepJob fails, it was intended to be used in other tests for 
> MAPREDUCE-2981.





[jira] [Commented] (MAPREDUCE-2981) Backport trunk fairscheduler to 0.20-security branch

2011-09-22 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113014#comment-13113014
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-2981:
---

TestSleepJob failed; please see [build 
#16|https://builds.apache.org/view/G-L/view/Hadoop/job/Hadoop-0.20.205-Build/16/testReport/org.apache.hadoop.mapreduce/TestSleepJob/initializationError/].
  It is not a unit test, so it cannot be initialized by the JUnit 
framework.  Could you take a look?

> Backport trunk fairscheduler to 0.20-security branch
> 
>
> Key: MAPREDUCE-2981
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2981
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/fair-share
>Affects Versions: 0.20.205.0
>Reporter: Matei Zaharia
>Assignee: Matei Zaharia
> Fix For: 0.20.205.0
>
> Attachments: fairsched-backport-v2.patch, 
> fairsched-backport-v3.patch, fairsched-backport.patch
>
>
> A lot of improvements have been made to the fair scheduler in 0.21, 0.22 and 
> trunk, but have not been ported back to the new 0.20.20X releases that are 
> currently considered the stable branch of Hadoop.





[jira] [Updated] (MAPREDUCE-2711) TestBlockPlacementPolicyRaid cannot be compiled

2011-09-08 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-2711:
--

   Resolution: Fixed
Fix Version/s: 0.24.0
   0.23.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Thanks Arun for the review.

I have committed this.

> TestBlockPlacementPolicyRaid cannot be compiled
> ---
>
> Key: MAPREDUCE-2711
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2711
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/raid
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 0.23.0, 0.24.0
>
> Attachments: m2711_20110719_TestBlockPlacementPolicyRaid.java, 
> m2711_20110727.patch, m2711_20110818.patch, m2711_20110908.patch
>
>
> {{TestBlockPlacementPolicyRaid}} accesses internal {{FSNamesystem}} directly.  
> It cannot be compiled after HDFS-2147.





[jira] [Updated] (MAPREDUCE-2711) TestBlockPlacementPolicyRaid cannot be compiled

2011-09-08 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-2711:
--

Description: {{TestBlockPlacementPolicyRaid}} access internal 
{{FSNamesystem}} directly.  It cannot be compiled after HDFS-2147.
Environment: (was: {{TestBlockPlacementPolicyRaid}} access internal 
{{FSNamesystem}} directly.  It cannot be compiled after HDFS-2147.)

> TestBlockPlacementPolicyRaid cannot be compiled
> ---
>
> Key: MAPREDUCE-2711
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2711
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/raid
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: m2711_20110719_TestBlockPlacementPolicyRaid.java, 
> m2711_20110727.patch, m2711_20110818.patch, m2711_20110908.patch
>
>
> {{TestBlockPlacementPolicyRaid}} accesses internal {{FSNamesystem}} directly.  
> It cannot be compiled after HDFS-2147.





[jira] [Updated] (MAPREDUCE-2711) TestBlockPlacementPolicyRaid cannot be compiled

2011-09-08 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-2711:
--

Status: Patch Available  (was: In Progress)

> TestBlockPlacementPolicyRaid cannot be compiled
> ---
>
> Key: MAPREDUCE-2711
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2711
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/raid
> Environment: {{TestBlockPlacementPolicyRaid}} access internal 
> {{FSNamesystem}} directly.  It cannot be compiled after HDFS-2147.
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: m2711_20110719_TestBlockPlacementPolicyRaid.java, 
> m2711_20110727.patch, m2711_20110818.patch, m2711_20110908.patch
>
>






[jira] [Updated] (MAPREDUCE-2711) TestBlockPlacementPolicyRaid cannot be compiled

2011-09-08 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-2711:
--

Attachment: m2711_20110908.patch

m2711_20110908.patch: updated with trunk

> TestBlockPlacementPolicyRaid cannot be compiled
> ---
>
> Key: MAPREDUCE-2711
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2711
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/raid
> Environment: {{TestBlockPlacementPolicyRaid}} access internal 
> {{FSNamesystem}} directly.  It cannot be compiled after HDFS-2147.
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: m2711_20110719_TestBlockPlacementPolicyRaid.java, 
> m2711_20110727.patch, m2711_20110818.patch, m2711_20110908.patch
>
>






[jira] [Work started] (MAPREDUCE-2711) TestBlockPlacementPolicyRaid cannot be compiled

2011-09-08 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on MAPREDUCE-2711 started by Tsz Wo (Nicholas), SZE.

> TestBlockPlacementPolicyRaid cannot be compiled
> ---
>
> Key: MAPREDUCE-2711
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2711
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/raid
> Environment: {{TestBlockPlacementPolicyRaid}} access internal 
> {{FSNamesystem}} directly.  It cannot be compiled after HDFS-2147.
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: m2711_20110719_TestBlockPlacementPolicyRaid.java, 
> m2711_20110727.patch, m2711_20110818.patch
>
>






[jira] [Resolved] (MAPREDUCE-2805) Update RAID for HDFS-2241

2011-08-19 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE resolved MAPREDUCE-2805.
---

   Resolution: Fixed
Fix Version/s: 0.23.0
 Hadoop Flags: [Reviewed]

Thanks Suresh for the review.

Resolving this.

> Update RAID for HDFS-2241
> -
>
> Key: MAPREDUCE-2805
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2805
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/raid
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: m2805_20110811.patch
>
>
> {noformat}
> src/contrib/raid/src/java/org/apache/hadoop/hdfs/server/datanode/RaidBlockSender.java:44:
>  interface expected here
> [javac] public class RaidBlockSender implements java.io.Closeable, 
> FSConstants {
> [javac]^
> {noformat}
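The compile error can be reproduced in miniature (hypothetical stand-in classes, not the actual Hadoop sources): HDFS-2241 appears to have turned FSConstants from an interface into a class, so an `implements` clause naming it no longer compiles.

```java
// Hypothetical stand-ins, not the Hadoop sources: once the constants
// holder is a class rather than an interface, "implements" fails with
// "interface expected here".
class Constants {                 // stand-in for the FSConstants class
  static final int VERSION = 1;
}

// class Sender implements Constants {}   // error: interface expected here

// Fix: drop the implements clause and reference the constants directly.
class Sender {
  int version() { return Constants.VERSION; }
}

public class InterfaceExpected {
  public static void main(String[] args) {
    System.out.println(new Sender().version()); // prints 1
  }
}
```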





[jira] [Updated] (MAPREDUCE-2711) TestBlockPlacementPolicyRaid cannot be compiled

2011-08-18 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-2711:
--

Attachment: m2711_20110818.patch

m2711_20110818.patch: last patch before MAPREDUCE-279 merge.

> TestBlockPlacementPolicyRaid cannot be compiled
> ---
>
> Key: MAPREDUCE-2711
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2711
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/raid
> Environment: {{TestBlockPlacementPolicyRaid}} access internal 
> {{FSNamesystem}} directly.  It cannot be compiled after HDFS-2147.
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: m2711_20110719_TestBlockPlacementPolicyRaid.java, 
> m2711_20110727.patch, m2711_20110818.patch
>
>






[jira] [Commented] (MAPREDUCE-2805) Update RAID for HDFS-2241

2011-08-11 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082976#comment-13082976
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-2805:
---

Checked it manually.

I have committed it temporarily.  Will wait for a review before resolving this.

> Update RAID for HDFS-2241
> -
>
> Key: MAPREDUCE-2805
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2805
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/raid
>Reporter: Tsz Wo (Nicholas), SZE
> Attachments: m2805_20110811.patch
>
>
> {noformat}
> src/contrib/raid/src/java/org/apache/hadoop/hdfs/server/datanode/RaidBlockSender.java:44:
>  interface expected here
> [javac] public class RaidBlockSender implements java.io.Closeable, 
> FSConstants {
> [javac]^
> {noformat}




