[jira] Updated: (MAPREDUCE-1328) contrib/index - modify build / ivy files as appropriate

2009-12-23 Thread Kay Kay (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kay Kay updated MAPREDUCE-1328:
---

Issue Type: Bug  (was: Improvement)

> contrib/index  - modify build / ivy files as appropriate 
> -
>
> Key: MAPREDUCE-1328
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1328
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.20.2
>Reporter: Kay Kay
> Fix For: 0.20.2
>
> Attachments: MAPREDUCE-1328.patch
>
>
> The build / ivy.xml files, in their current state, do not seem to launch 
> successfully due to missing dependencies. 
> Added dependencies on hadoop-core-test / hadoop-hdfs-test. 
> Also, the junit classpath is set to include the files retrieved by ivy, 
> specific to the index project. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1067) Default state of queues is undefined when unspecified

2009-12-23 Thread Hemanth Yamijala (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hemanth Yamijala updated MAPREDUCE-1067:


Status: Open  (was: Patch Available)

I have some feedback:

- In QueueConfigurationParser.getQueueElement, we have specifically removed the 
check queueState != null when appending a 'state' element to the Queue element. 
The check seems useful for preventing errors. Is there a specific reason for 
removing it? For example, couldn't there be a leaf queue with a null state on 
construction? We should either prevent that in the constructor or check here, no?
- bq. Also, state of a container queue should not be allowed to change once the 
hierarchy is built.
I think the way this works is: when the queue configuration is refreshed, a state 
change for a container queue results in exceptions being thrown when the 
QueueConfigurationParser parses and constructs Queue objects (via setState or 
addChild). This results in a RuntimeException that is handled only by the 
IPC handler on the server, so I think no useful information about the error will 
reach the admin. Can you please verify whether this is the case? If so, I think it 
must be handled more gracefully, and a better error message must be given. The 
same holds for first-time construction of the hierarchy as well.
- testQueueState should be split into multiple test cases. Typically, it is 
good practice to test one condition / aspect of a unit in one unit test case; 
it makes test-case maintenance easier. There are at least 3 distinct tests I 
see in this test case.
- The first part of the test testQueueState is verifying a message that doesn't 
seem correct. For example, I think the XML is semantically correct, but since a 
queue state is defined for a container queue, construction fails. Then why are 
we getting an exception whose message is "Malformed xml formation queue tag and 
acls tags or state tags are siblings"? The code is incorrect if this test is 
passing.
- Shouldn't there be test cases that refresh the queue state of container 
queues and verify that the refresh fails?
- There should also be a test case verifying that when a queue is constructed 
with state stopped, and children are then added to it, an exception is thrown. 
It is arguable whether this is the right design - i.e., we should probably have 
ContainerQueue as a first-class citizen so that we can always prevent an 
invalid state for such an instance, but possibly that's a separate JIRA.
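To make the expected validation concrete, here is a minimal sketch (hypothetical class and method names, not the actual QueueConfigurationParser code) of defaulting an unspecified state to RUNNING, rejecting a state change on a container queue, and rejecting children added to a stopped queue, each with a descriptive error message:

```java
import java.util.ArrayList;
import java.util.List;

public class QueueStateSketch {
  enum QState { RUNNING, STOPPED }

  static class Queue {
    final String name;
    QState state = QState.RUNNING;                 // default when unspecified
    final List<Queue> children = new ArrayList<>();

    Queue(String name) { this.name = name; }

    void setState(QState s) {
      if (!children.isEmpty()) {
        // Surface a clear message instead of a bare RuntimeException,
        // so the admin sees why construction/refresh failed.
        throw new IllegalStateException(
            "Queue '" + name + "' is a container queue; its state cannot be set");
      }
      state = s;
    }

    void addChild(Queue child) {
      if (state == QState.STOPPED) {
        throw new IllegalStateException(
            "Cannot add child '" + child.name + "' to stopped queue '" + name + "'");
      }
      children.add(child);
    }
  }

  public static void main(String[] args) {
    Queue parent = new Queue("parent");
    parent.addChild(new Queue("leaf"));      // fine: parent is RUNNING by default
    try {
      parent.setState(QState.STOPPED);       // rejected: parent now has children
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Each distinct failure mode above would map naturally to its own small unit test, per the comment about splitting testQueueState.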

> Default state of queues is undefined when unspecified
> -
>
> Key: MAPREDUCE-1067
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1067
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.21.0
>Reporter: V.V.Chaitanya Krishna
>Assignee: V.V.Chaitanya Krishna
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-1067-1.patch, MAPREDUCE-1067-2.patch, 
> MAPREDUCE-1067-3.patch, MAPREDUCE-1067-4.patch, MAPREDUCE-1067-5.patch, 
> MAPREDUCE-1067-6.patch
>
>
> Currently, if the state of a queue is not specified, it is being set to 
> "undefined" state instead of running state.




[jira] Updated: (MAPREDUCE-1186) While localizing a DistributedCache file, TT sets permissions recursively on the whole base-dir

2009-12-23 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1186:
---

Status: Patch Available  (was: Open)

> While localizing a DistributedCache file, TT sets permissions recursively on 
> the whole base-dir
> ---
>
> Key: MAPREDUCE-1186
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1186
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Affects Versions: 0.21.0
>Reporter: Vinod K V
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.22.0
>
> Attachments: patch-1186-1.txt, patch-1186-2.txt, 
> patch-1186-3-ydist.txt, patch-1186-3-ydist.txt, patch-1186-3.txt, 
> patch-1186-ydist.txt, patch-1186-ydist.txt, patch-1186.txt
>
>
> This is a performance problem.




[jira] Updated: (MAPREDUCE-1186) While localizing a DistributedCache file, TT sets permissions recursively on the whole base-dir

2009-12-23 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1186:
---

Attachment: patch-1186-3.txt

Patch incorporating almost all of the review comments, except for the following two.

bq. I would like to keep the call for cleanupStorage in 
TaskTracker.initialize(). 
Currently, deletion of the local directories happens in two places in 
initialize(). I removed the second one, which was essentially a no-op earlier. 
But now it would delete directories after the task-controller's setup.

bq.  Remove LOG.info print in DefaultTaskController.initializeDistributedCache. 
Instead have a LOG.warn in the exception handler.
I have retained the LOG.info statement; it is helpful for debugging. 



> While localizing a DistributedCache file, TT sets permissions recursively on 
> the whole base-dir
> ---
>
> Key: MAPREDUCE-1186
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1186
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Affects Versions: 0.21.0
>Reporter: Vinod K V
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.22.0
>
> Attachments: patch-1186-1.txt, patch-1186-2.txt, 
> patch-1186-3-ydist.txt, patch-1186-3-ydist.txt, patch-1186-3.txt, 
> patch-1186-ydist.txt, patch-1186-ydist.txt, patch-1186.txt
>
>
> This is a performance problem.




[jira] Updated: (MAPREDUCE-1186) While localizing a DistributedCache file, TT sets permissions recursively on the whole base-dir

2009-12-23 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1186:
---

Status: Open  (was: Patch Available)

> While localizing a DistributedCache file, TT sets permissions recursively on 
> the whole base-dir
> ---
>
> Key: MAPREDUCE-1186
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1186
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Affects Versions: 0.21.0
>Reporter: Vinod K V
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.22.0
>
> Attachments: patch-1186-1.txt, patch-1186-2.txt, 
> patch-1186-3-ydist.txt, patch-1186-3-ydist.txt, patch-1186-3.txt, 
> patch-1186-ydist.txt, patch-1186-ydist.txt, patch-1186.txt
>
>
> This is a performance problem.




[jira] Updated: (MAPREDUCE-896) Users can set non-writable permissions on temporary files for TT and can abuse disk usage.

2009-12-23 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-896:
---

Attachment: MR-896.v7.patch

I could reproduce the failure of TestServiceLevelAuthorization consistently by 
adding a sleep in CleanupQueue.deletePath() before the actual deletion; making 
the cleanup queue inline solved the issue.
Modified the tests that use MiniMRCluster and validate deletion of files/dirs 
through CleanupQueue to use the inline cleanup queue. The inline cleanup queue 
is moved to UtilsForTests.

Attaching a new patch for trunk. Please review and provide your comments.
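For readers unfamiliar with the trick: an "inline" cleanup queue performs the delete synchronously, so a test can assert on filesystem state immediately instead of racing an asynchronous deletion thread. A minimal sketch (hypothetical classes, not the actual UtilsForTests code):

```java
import java.io.File;
import java.io.IOException;

public class InlineCleanupSketch {
  /** Stand-in for the TaskTracker's CleanupQueue contract. */
  interface PathCleaner { void deletePath(File p); }

  /**
   * Inline flavor for tests: the delete completes before deletePath() returns,
   * so assertions made right afterwards are deterministic.
   */
  static class InlineCleanupQueue implements PathCleaner {
    public void deletePath(File p) { p.delete(); }
  }

  public static void main(String[] args) throws IOException {
    File tmp = File.createTempFile("cleanup", ".tmp");
    PathCleaner q = new InlineCleanupQueue();
    q.deletePath(tmp);
    // A background cleanup thread could still be sleeping at this point;
    // the inline queue is guaranteed to be done.
    System.out.println(tmp.exists());  // false
  }
}
```

The asynchronous production queue stays untouched; only tests swap in the synchronous variant.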

> Users can set non-writable permissions on temporary files for TT and can 
> abuse disk usage.
> --
>
> Key: MAPREDUCE-896
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-896
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Affects Versions: 0.21.0
>Reporter: Vinod K V
>Assignee: Ravi Gummadi
> Fix For: 0.21.0
>
> Attachments: MR-896.patch, MR-896.v1.patch, MR-896.v2.patch, 
> MR-896.v3.patch, MR-896.v4.patch, MR-896.v5.patch, MR-896.v6.patch, 
> MR-896.v7.patch, y896.v1.patch, y896.v2.1.fix.patch, y896.v2.1.fix.v1.patch, 
> y896.v2.1.fix.v2.patch, y896.v2.1.patch, y896.v2.patch
>
>
> As of now, irrespective of the TaskController in use, the TT itself does a full 
> delete on local files created by itself or by job tasks. Depending on the TT's 
> umask and the permissions the user has set on files, e.g. in the 
> job-work/task-work or child.tmp directories, this step may or may not complete 
> successfully. This leaves an opportunity for disk usage to be abused, either 
> accidentally or intentionally, by the TT/users.




[jira] Commented: (MAPREDUCE-1295) We need a job trace manipulator to build gridmix runs.

2009-12-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794309#action_12794309
 ] 

Hadoop QA commented on MAPREDUCE-1295:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/1242/mapreduce-1295--2009-12-23.patch
  against trunk revision 893469.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/242/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/242/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/242/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/242/console

This message is automatically generated.

> We need a job trace manipulator to build gridmix runs.
> --
>
> Key: MAPREDUCE-1295
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1295
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Dick King
>Assignee: Dick King
> Attachments: mapreduce-1295--2009-12-17.patch, 
> mapreduce-1295--2009-12-21.patch, mapreduce-1295--2009-12-22.patch, 
> mapreduce-1295--2009-12-23.patch, mapreduce-1297--2009-12-14.patch
>
>
> Rumen produces "job traces", which are JSON format files describing important 
> aspects of all jobs that are run [successfully or not] on a hadoop map/reduce 
> cluster.  There are two packages under development that will consume these 
> trace files and produce actions in that cluster or another cluster: gridmix3 
> [see jira MAPREDUCE-1124 ] and Mumak [a simulator -- see MAPREDUCE-728 ].
> It would be useful to be able to do two things with job traces, so we can run 
> experiments using these two tools: change the duration, and change the 
> density.  I would like to provide a "folder", a tool that can wrap a 
> long-duration execution trace to redistribute its jobs over a shorter 
> interval, and also change the density by duplicating or culling away jobs 
> from the folded combined job trace.
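The folding described above can be sketched in a few lines (a hypothetical illustration, not the actual tool): wrapping a long trace onto a shorter cycle amounts to mapping each job's submit time modulo the target interval, after which density can be changed by duplicating or culling jobs:

```java
import java.util.ArrayList;
import java.util.List;

public class TraceFoldSketch {
  /** Fold submit times of a long trace into the interval [0, cycle). */
  static List<Long> fold(List<Long> submitTimes, long cycle) {
    List<Long> folded = new ArrayList<>();
    for (long t : submitTimes) {
      folded.add(t % cycle);   // wrap the long trace onto one shorter interval
    }
    return folded;
  }

  public static void main(String[] args) {
    // Three jobs spread over 25 hours, folded onto a 10-hour cycle
    // (times expressed in hours for readability).
    System.out.println(fold(List.of(1L, 12L, 24L), 10L));  // [1, 2, 4]
  }
}
```

Duplicating entries in the folded list increases density; dropping entries culls it.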




[jira] Commented: (MAPREDUCE-1270) Hadoop C++ Extention

2009-12-23 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794299#action_12794299
 ] 

Zheng Shao commented on MAPREDUCE-1270:
---

Any progress on this?

> Hadoop C++ Extention
> 
>
> Key: MAPREDUCE-1270
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1270
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 0.20.1
> Environment:  hadoop linux
>Reporter: Wang Shouyan
>
>   Hadoop C++ extension is an internal project at Baidu. We started it for these 
> reasons:
>1  To provide a C++ API. We mostly used Streaming before, and we also tried 
> PIPES, but we did not find PIPES more efficient than Streaming. So we 
> think a new C++ extension is needed.
>2  Even using PIPES or Streaming, it is hard to control the memory of the Hadoop 
> map/reduce child JVM.
>3  It costs a lot to read/write/sort TB/PB of data in Java, and when using 
> PIPES or Streaming, a pipe or socket is not efficient enough to carry such huge 
> data.
>What we want to do: 
>1 We do not use the map/reduce child JVM for any data processing; it just 
> prepares the environment, starts the C++ mapper, tells the mapper which split it 
> should deal with, and reads reports from the mapper until it has finished. The 
> mapper will read records, invoke the user-defined map, do the partitioning, 
> write spills, combine, and merge into file.out. We think these operations can be 
> done by C++ code.
>2 The reducer is similar to the mapper; it is started after the sort has 
> finished, reads from the sorted files, invokes the user-defined reduce, and 
> writes to the user-defined record writer.
>3 We also intend to rewrite shuffle and sort in C++, for efficiency and 
> memory control.
>At first, 1 and 2; then 3.
>What's the difference from PIPES:
>1 Yes, we will reuse most of the PIPES code.
>2 But we should do it more completely: nothing changes in scheduling and 
> management, but everything changes in execution.




[jira] Updated: (MAPREDUCE-1295) We need a job trace manipulator to build gridmix runs.

2009-12-23 Thread Dick King (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1295:
-

Status: Patch Available  (was: Open)

> We need a job trace manipulator to build gridmix runs.
> --
>
> Key: MAPREDUCE-1295
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1295
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Dick King
>Assignee: Dick King
> Attachments: mapreduce-1295--2009-12-17.patch, 
> mapreduce-1295--2009-12-21.patch, mapreduce-1295--2009-12-22.patch, 
> mapreduce-1295--2009-12-23.patch, mapreduce-1297--2009-12-14.patch
>
>
> Rumen produces "job traces", which are JSON format files describing important 
> aspects of all jobs that are run [successfully or not] on a hadoop map/reduce 
> cluster.  There are two packages under development that will consume these 
> trace files and produce actions in that cluster or another cluster: gridmix3 
> [see jira MAPREDUCE-1124 ] and Mumak [a simulator -- see MAPREDUCE-728 ].
> It would be useful to be able to do two things with job traces, so we can run 
> experiments using these two tools: change the duration, and change the 
> density.  I would like to provide a "folder", a tool that can wrap a 
> long-duration execution trace to redistribute its jobs over a shorter 
> interval, and also change the density by duplicating or culling away jobs 
> from the folded combined job trace.




[jira] Updated: (MAPREDUCE-1295) We need a job trace manipulator to build gridmix runs.

2009-12-23 Thread Dick King (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1295:
-

Attachment: mapreduce-1295--2009-12-23.patch

This addresses the issues raised earlier today.

> We need a job trace manipulator to build gridmix runs.
> --
>
> Key: MAPREDUCE-1295
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1295
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Dick King
>Assignee: Dick King
> Attachments: mapreduce-1295--2009-12-17.patch, 
> mapreduce-1295--2009-12-21.patch, mapreduce-1295--2009-12-22.patch, 
> mapreduce-1295--2009-12-23.patch, mapreduce-1297--2009-12-14.patch
>
>
> Rumen produces "job traces", which are JSON format files describing important 
> aspects of all jobs that are run [successfully or not] on a hadoop map/reduce 
> cluster.  There are two packages under development that will consume these 
> trace files and produce actions in that cluster or another cluster: gridmix3 
> [see jira MAPREDUCE-1124 ] and Mumak [a simulator -- see MAPREDUCE-728 ].
> It would be useful to be able to do two things with job traces, so we can run 
> experiments using these two tools: change the duration, and change the 
> density.  I would like to provide a "folder", a tool that can wrap a 
> long-duration execution trace to redistribute its jobs over a shorter 
> interval, and also change the density by duplicating or culling away jobs 
> from the folded combined job trace.




[jira] Commented: (MAPREDUCE-1317) Reducing memory consumption of rumen objects

2009-12-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794295#action_12794295
 ] 

Hadoop QA commented on MAPREDUCE-1317:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12428801/mapreduce-1317-20091223.patch
  against trunk revision 893469.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/339/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/339/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/339/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/339/console

This message is automatically generated.

> Reducing memory consumption of rumen objects
> 
>
> Key: MAPREDUCE-1317
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1317
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.21.0, 0.22.0
>Reporter: Hong Tang
>Assignee: Hong Tang
> Fix For: 0.21.0, 0.22.0
>
> Attachments: mapreduce-1317-20091218.patch, 
> mapreduce-1317-20091222-2.patch, mapreduce-1317-20091222.patch, 
> mapreduce-1317-20091223.patch
>
>
> We have encountered OutOfMemoryErrors in mumak and gridmix when dealing with 
> very large jobs. The purpose of this jira is to optimize memory consumption of 
> rumen produced job objects.




[jira] Updated: (MAPREDUCE-744) Support in DistributedCache to share cache files with other users after HADOOP-4493

2009-12-23 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated MAPREDUCE-744:
--

Attachment: 744-5.patch

bq. 1. DistributedCache.get*Visibilities - are deprecated. Why do we add 
deprecated methods?
OK, it looks like we don't really have to add the methods to DistributedCache, 
since they are used only by the framework. So I moved the accessors 
(getFileVisibilities/getArchiveVisibilities) to TrackerDistributedCacheManager, 
where the setters are defined.
This seems like the right thing to do.

> Support in DistributedCache to share cache files with other users after 
> HADOOP-4493
> ---
>
> Key: MAPREDUCE-744
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-744
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: tasktracker
>Reporter: Vinod K V
>Assignee: Devaraj Das
> Fix For: 0.22.0
>
> Attachments: 744-1.patch, 744-2.patch, 744-3.patch, 744-4.patch, 
> 744-5.patch, 744-early.patch
>
>
> HADOOP-4493 aims to completely privatize the files distributed to TT via 
> DistributedCache. This jira issues focuses on sharing some/all of these files 
> with all other users.




[jira] Commented: (MAPREDUCE-1295) We need a job trace manipulator to build gridmix runs.

2009-12-23 Thread Dick King (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794280#action_12794280
 ] 

Dick King commented on MAPREDUCE-1295:
--

This is a comment on the previous comment, timestamped 23/Dec/09 02:59AM.

   * I will make all the {{DeskewedJobTraceReader}} constructors take a 
{{JobTraceReader}}. I see no reason why {{DeskewedJobTraceReader}} 
constructors shouldn't be public.
   * I will use {{FileSystem::exists}}. I didn't notice that API when I wrote 
this code. My bad.
   * I don't want to require an empty directory, because we use the output 
directory as the temp directory if none is supplied, and requiring an empty 
directory would make it impossible to do two runs into the same output 
directory.
   * I'll change the wording of some of the debug messages. I'll consider 
moving to LOG4J.
   * I stand by my decision to use {{deletees}}. I don't like the 
{{deleteOnExit}} idiom because it makes it hard to use a tool as a callable 
component.
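The {{deletees}} vs {{deleteOnExit}} trade-off can be illustrated with a small sketch (hypothetical names, not the actual patch code): tracking temporary files explicitly lets the caller clean up whenever a run finishes, whereas File.deleteOnExit ties cleanup to JVM shutdown, which is unhelpful when the tool is embedded in a long-lived process:

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class DeleteesSketch {
  private final List<File> deletees = new ArrayList<>();

  File createTemp() throws IOException {
    File f = File.createTempFile("fold", ".tmp");
    deletees.add(f);            // remember it; do NOT rely on deleteOnExit
    return f;
  }

  /** Callers invoke this when a run finishes, even if the JVM lives on. */
  void cleanup() {
    for (File f : deletees) {
      f.delete();
    }
    deletees.clear();
  }

  public static void main(String[] args) throws IOException {
    DeleteesSketch tool = new DeleteesSketch();
    File tmp = tool.createTemp();
    tool.cleanup();
    System.out.println(tmp.exists());  // false: cleaned up now, not at JVM exit
  }
}
```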



> We need a job trace manipulator to build gridmix runs.
> --
>
> Key: MAPREDUCE-1295
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1295
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Dick King
>Assignee: Dick King
> Attachments: mapreduce-1295--2009-12-17.patch, 
> mapreduce-1295--2009-12-21.patch, mapreduce-1295--2009-12-22.patch, 
> mapreduce-1297--2009-12-14.patch
>
>
> Rumen produces "job traces", which are JSON format files describing important 
> aspects of all jobs that are run [successfully or not] on a hadoop map/reduce 
> cluster.  There are two packages under development that will consume these 
> trace files and produce actions in that cluster or another cluster: gridmix3 
> [see jira MAPREDUCE-1124 ] and Mumak [a simulator -- see MAPREDUCE-728 ].
> It would be useful to be able to do two things with job traces, so we can run 
> experiments using these two tools: change the duration, and change the 
> density.  I would like to provide a "folder", a tool that can wrap a 
> long-duration execution trace to redistribute its jobs over a shorter 
> interval, and also change the density by duplicating or culling away jobs 
> from the folded combined job trace.




[jira] Updated: (MAPREDUCE-1317) Reducing memory consumption of rumen objects

2009-12-23 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-1317:
-

Status: Patch Available  (was: Open)

Hudson is broken; retrying.

> Reducing memory consumption of rumen objects
> 
>
> Key: MAPREDUCE-1317
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1317
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.21.0, 0.22.0
>Reporter: Hong Tang
>Assignee: Hong Tang
> Fix For: 0.21.0, 0.22.0
>
> Attachments: mapreduce-1317-20091218.patch, 
> mapreduce-1317-20091222-2.patch, mapreduce-1317-20091222.patch, 
> mapreduce-1317-20091223.patch
>
>
> We have encountered OutOfMemoryErrors in mumak and gridmix when dealing with 
> very large jobs. The purpose of this jira is to optimize memory consumption of 
> rumen produced job objects.




[jira] Updated: (MAPREDUCE-1317) Reducing memory consumption of rumen objects

2009-12-23 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-1317:
-

Status: Open  (was: Patch Available)

> Reducing memory consumption of rumen objects
> 
>
> Key: MAPREDUCE-1317
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1317
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.21.0, 0.22.0
>Reporter: Hong Tang
>Assignee: Hong Tang
> Fix For: 0.21.0, 0.22.0
>
> Attachments: mapreduce-1317-20091218.patch, 
> mapreduce-1317-20091222-2.patch, mapreduce-1317-20091222.patch, 
> mapreduce-1317-20091223.patch
>
>
> We have encountered OutOfMemoryErrors in mumak and gridmix when dealing with 
> very large jobs. The purpose of this jira is to optimize memory consumption of 
> rumen produced job objects.




[jira] Updated: (MAPREDUCE-301) mapred.child.classpath.extension property

2009-12-23 Thread Klaas Bosteels (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Klaas Bosteels updated MAPREDUCE-301:
-

Assignee: (was: Klaas Bosteels)

I have no need for this anymore, so I won't be implementing it myself any time 
soon.

> mapred.child.classpath.extension property
> -
>
> Key: MAPREDUCE-301
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-301
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Klaas Bosteels
>
> It would be useful to be able to extend the classpath for the task processes 
> on a job per job basis via a {{mapred.child.classpath.extension}} property.




[jira] Commented: (MAPREDUCE-593) org.apache.hadoop.streaming.TestUlimit fails on JRockit 64-bit; not enough memory

2009-12-23 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794238#action_12794238
 ] 

Steve Loughran commented on MAPREDUCE-593:
--

Since Oracle stopped making new releases of the JRockit JVM public, and the old 
version is not up to date w.r.t. security patches, I've stopped using JRockit, 
so I can no longer replicate this. 

That doesn't mean that the ulimit should not be overridable, as the problem 
may arise again in the future - only that we won't be able to check that any 
fix works. 

Why not downgrade to WONTFIX or WORKSFORME until someone else re-encounters the 
problem?

> org.apache.hadoop.streaming.TestUlimit fails on JRockit 64-bit; not enough 
> memory
> -
>
> Key: MAPREDUCE-593
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-593
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
> Environment: Linux morzine 2.6.22-15-generic #1 SMP Fri Jul 11 
> 18:56:36 UTC 2008 x86_64 GNU/Linux
> java version "1.6.0_02"
> Java(TM) SE Runtime Environment (build 1.6.0_02-b05)
> BEA JRockit(R) (build R27.4.0-90-89592-1.6.0_02-20070928-1715-linux-x86_64, 
> compiled mode)
>Reporter: Steve Loughran
>
> the testUlimit test sets a memory limit that is too small for Java to start. 
> So it fails with a -1 response instead, which breaks the test. 




[jira] Commented: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-12-23 Thread BitsOfInfo (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794127#action_12794127
 ] 

BitsOfInfo commented on MAPREDUCE-1176:
---

On second thought, I am thinking it might be better for this new 
FixedLengthInputFormat config property to work as follows, which gives greater 
control over what can be used as the ByteWritable KEY value.

Permit the user to specify the start and end positions within a record that 
define the key, such as:

FixedLengthInputFormat.defineKeyBoundaries(long start, long end)
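Assuming the end position is exclusive (the proposal above does not pin this down), the key extraction could look like this hypothetical sketch:

```java
import java.util.Arrays;

public class KeyBoundarySketch {
  /**
   * Extract the key bytes of a fixed-length record, given configured
   * start (inclusive) and end (exclusive) positions. The real
   * defineKeyBoundaries API would store these in the job configuration.
   */
  static byte[] keyOf(byte[] record, int start, int end) {
    return Arrays.copyOfRange(record, start, end);
  }

  public static void main(String[] args) {
    byte[] record = "ACCT01234SMITH     ".getBytes();
    // Suppose bytes 4..9 hold the account number that should serve as the key.
    System.out.println(new String(keyOf(record, 4, 9)));  // 01234
  }
}
```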

> Contribution: FixedLengthInputFormat and FixedLengthRecordReader
> 
>
> Key: MAPREDUCE-1176
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 0.20.1, 0.20.2
> Environment: Any
>Reporter: BitsOfInfo
>Priority: Minor
> Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch, 
> MAPREDUCE-1176-v3.patch
>
>
> Hello,
> I would like to contribute the following two classes for incorporation into 
> the mapreduce.lib.input package. These two classes can be used when you need 
> to read data from files containing fixed length (fixed width) records. Such 
> files have no CR/LF (or any combination thereof), no delimiters etc, but each 
> record is a fixed length, and extra data is padded with spaces. The data is 
> one gigantic line within a file.
> Provided are two classes: FixedLengthInputFormat and its 
> corresponding FixedLengthRecordReader. When creating a job that specifies 
> this input format, the job must have the 
> "mapreduce.input.fixedlengthinputformat.record.length" property set as follows
> myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length",[myFixedRecordLength]);
> OR
> myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, 
> [myFixedRecordLength]);
> This input format overrides computeSplitSize() in order to ensure that 
> InputSplits do not contain any partial records since with fixed records there 
> is no way to determine where a record begins if that were to occur. Each 
> InputSplit passed to the FixedLengthRecordReader will start at the beginning 
> of a record, and the last byte in the InputSplit will be the last byte of a 
> record. The override of computeSplitSize() delegates to FileInputFormat's 
> compute method, and then adjusts the returned split size by doing the 
> following: (Math.floor(fileInputFormatsComputedSplitSize / fixedRecordLength) 
> * fixedRecordLength)
> This suite of fixed-length input format classes does not support compressed 
> files. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1328) contrib/index - modify build / ivy files as appropriate

2009-12-23 Thread Kay Kay (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kay Kay updated MAPREDUCE-1328:
---

Status: In Progress  (was: Patch Available)

> contrib/index  - modify build / ivy files as appropriate 
> -
>
> Key: MAPREDUCE-1328
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1328
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.20.2
>Reporter: Kay Kay
> Fix For: 0.20.2
>
> Attachments: MAPREDUCE-1328.patch
>
>
> The build / ivy.xml files in their current state do not seem to build 
> successfully due to missing dependencies. 
> Added dependencies on hadoop-core-test / hadoop-hdfs-test. 
> Also, the junit classpath is set to include the files retrieved by ivy, 
> specific to the index project. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1320) StringBuffer -> StringBuilder occurence

2009-12-23 Thread Kay Kay (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kay Kay updated MAPREDUCE-1320:
---

Status: In Progress  (was: Patch Available)

> StringBuffer -> StringBuilder occurence 
> 
>
> Key: MAPREDUCE-1320
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1320
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.20.1
>Reporter: Kay Kay
> Fix For: 0.20.2
>
> Attachments: MAPREDUCE-1320.patch
>
>
> A good number of toString() implementations use StringBuffer when the 
> reference clearly does not go out of scope of the method and no concurrency 
> is needed. The patch replaces those occurrences of StringBuffer with 
> StringBuilder. 
> Created against map/reduce project trunk . 
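The pattern in question can be sketched as follows (illustrative class, not one from the patch): a toString() whose builder never escapes the method, so the unsynchronized StringBuilder is a safe drop-in for StringBuffer.

```java
// Illustrative toString(): the builder is local to the method and no other
// thread can observe it, so StringBuffer's synchronization buys nothing.
public class ToStringDemo {
    private final int id = 42;
    private final String name = "task";

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(); // local-only: no synchronization needed
        sb.append("id=").append(id).append(", name=").append(name);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(new ToStringDemo()); // prints id=42, name=task
    }
}
```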

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1318) Document exit codes and their meanings used by linux task controller

2009-12-23 Thread Anatoli Fomenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anatoli Fomenko updated MAPREDUCE-1318:
---

Attachment: MAPREDUCE-1318.2.patch

A new patch version is attached, with the following changes:
1. The added lines are in the class javadoc instead of the command javadoc of 
./Hadoop-MapReduce/src/java/org/apache/hadoop/mapred/LinuxTaskController.java
2. The same information is added to 
./Hadoop-MapReduce/src/docs/src/documentation/content/xdocs/cluster_setup.xml


> Document exit codes and their meanings used by linux task controller
> 
>
> Key: MAPREDUCE-1318
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1318
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Sreekanth Ramakrishnan
>Assignee: Anatoli Fomenko
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: HADOOP-5912.1.patch, MAPREDUCE-1318.1.patch, 
> MAPREDUCE-1318.2.patch
>
>
> Currently, the linux task controller binary uses a set of exit codes, which 
> are not documented. These should be documented.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-12-23 Thread BitsOfInfo (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794084#action_12794084
 ] 

BitsOfInfo commented on MAPREDUCE-1176:
---

Chris, thanks for the comments, to address these:


>>This should offer fixed key/value bytes, not just value bytes. 
>>The value type should be BytesWritable, not Text.

To clarify: I'll modify the KEY and VAL types as suggested, then propose 
adding a config property to 
FixedLengthInputFormat to allow someone to configure the "prefix" number of 
value (record) bytes to use as the KEY, such as 
"mapreduce.input.fixedlengthinputformat.keyprefixcount" or "keyprefixbytes", 
etc. Please send over any suggestions or other ideas.



>> The double arithmetic should be replaced by modular arithmetic.

Will change to:
{code}
// determine the split size, it should be as close as possible to the 
// default size, but should NOT split within a record... each split
// should contain a complete set of records with the first record
// starting at the first byte in the split and the last record ending
// with the last byte in the split.
long splitSize =  (defaultSize / recordLength) * recordLength;
{code}
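A quick check (outside the patch) that the integer division above yields a split size that is an exact multiple of the record length, so no split ever ends mid-record:

```java
// Verifies the modular-arithmetic split sizing from the snippet above.
public class SplitSizeCheck {
    static long splitSize(long defaultSize, long recordLength) {
        return (defaultSize / recordLength) * recordLength;
    }

    public static void main(String[] args) {
        long size = splitSize(64L << 20, 100); // 64 MB default split, 100-byte records
        System.out.println(size);              // prints 67108800
        System.out.println(size % 100);        // prints 0: no partial records per split
    }
}
```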



>> isSplittable need only verify that the file is not compressed, not that 
>> recordLength is sane.

Moving the record length config property validation to "getSplits()" instead


>> Reuse the key/value types- reading directly into them- rather than 
>> allocating a new byte array for each record

Will do


>>Please clean up unused/overly general imports.

Will do


>> Remove member fields that are not used after initialization.

I identified two of these in FixedLengthRecordReader, will remove




> Contribution: FixedLengthInputFormat and FixedLengthRecordReader
> 
>
> Key: MAPREDUCE-1176
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 0.20.1, 0.20.2
> Environment: Any
>Reporter: BitsOfInfo
>Priority: Minor
> Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch, 
> MAPREDUCE-1176-v3.patch
>
>
> Hello,
> I would like to contribute the following two classes for incorporation into 
> the mapreduce.lib.input package. These two classes can be used when you need 
> to read data from files containing fixed length (fixed width) records. Such 
> files have no CR/LF (or any combination thereof), no delimiters etc, but each 
> record is a fixed length, and extra data is padded with spaces. The data is 
> one gigantic line within a file.
> Provided are two classes: the first is FixedLengthInputFormat and the second 
> its corresponding FixedLengthRecordReader. When creating a job that specifies 
> this input format, the job must have the 
> "mapreduce.input.fixedlengthinputformat.record.length" property set as follows
> myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length",[myFixedRecordLength]);
> OR
> myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, 
> [myFixedRecordLength]);
> This input format overrides computeSplitSize() in order to ensure that 
> InputSplits do not contain any partial records since with fixed records there 
> is no way to determine where a record begins if that were to occur. Each 
> InputSplit passed to the FixedLengthRecordReader will start at the beginning 
> of a record, and the last byte in the InputSplit will be the last byte of a 
> record. The override of computeSplitSize() delegates to FileInputFormat's 
> compute method, and then adjusts the returned split size by doing the 
> following: (Math.floor(fileInputFormatsComputedSplitSize / fixedRecordLength) 
> * fixedRecordLength)
> This suite of fixed-length input format classes does not support compressed 
> files. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1305) Massive performance problem with DistCp and -delete

2009-12-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794068#action_12794068
 ] 

Hadoop QA commented on MAPREDUCE-1305:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12428316/MAPREDUCE-1305.patch
  against trunk revision 893469.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/241/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/241/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/241/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/241/console

This message is automatically generated.

> Massive performance problem with DistCp and -delete
> ---
>
> Key: MAPREDUCE-1305
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1305
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp
>Affects Versions: 0.20.1
>Reporter: Peter Romianowski
>Assignee: Peter Romianowski
> Attachments: MAPREDUCE-1305.patch
>
>
> *First problem*
> In org.apache.hadoop.tools.DistCp#deleteNonexisting we serialize FileStatus 
> objects when the path is all we need.
> The performance problem comes from 
> org.apache.hadoop.fs.RawLocalFileSystem.RawLocalFileStatus#write which tries 
> to retrieve file permissions by issuing a "ls -ld " which is painfully 
> slow.
> Changed that to just serialize Path and not FileStatus.
> *Second problem*
> To delete the files we invoke the "hadoop" command line tool with option 
> "-rmr ". Again, for each file.
> Changed that to dstfs.delete(path, true)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-64) Map-side sort is hampered by io.sort.record.percent

2009-12-23 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-64?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-64:
---

Status: Open  (was: Patch Available)

The test case needs to be fixed.

It assumes the vint length of the key will match the vint length of the key 
data, but this isn't true for some values, like 128 (a 127-byte record + 1 vint 
byte; the test checks against 2 vint bytes for the full, 128-byte record 
length).
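For context, the vint sizes involved follow Hadoop's zero-compressed encoding (WritableUtils.getVIntSize). A simplified sketch, restricted to non-negative values, of why 127 fits in one byte while 128 needs two:

```java
// Simplified size rule for zero-compressed vints, non-negative values only
// (Hadoop's WritableUtils.getVIntSize also handles negatives).
public class VIntSize {
    static int vintSize(long i) {
        if (i <= 127) {
            return 1;                      // small values are stored inline
        }
        int dataBytes = 0;
        for (long v = i; v != 0; v >>>= 8) {
            dataBytes++;                   // count bytes needed for the value
        }
        return 1 + dataBytes;              // length marker byte + data bytes
    }

    public static void main(String[] args) {
        System.out.println(vintSize(127)); // prints 1
        System.out.println(vintSize(128)); // prints 2
    }
}
```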

> Map-side sort is hampered by io.sort.record.percent
> ---
>
> Key: MAPREDUCE-64
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-64
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Arun C Murthy
>Assignee: Chris Douglas
> Attachments: M64-0.patch, M64-0i.png, M64-1.patch, M64-1i.png, 
> M64-2.patch, M64-2i.png, M64-3.patch, M64-4.patch, M64-5.patch
>
>
> Currently io.sort.record.percent is a fairly obscure, per-job configurable, 
> expert-level parameter which controls how much accounting space is available 
> for records in the map-side sort buffer (io.sort.mb). Typically values for 
> io.sort.mb (100) and io.sort.record.percent (0.05) imply that we can store 
> ~350,000 records in the buffer before necessitating a sort/combine/spill.
> However, for many applications which deal with small records, e.g. the 
> world-famous wordcount and its family, this implies we can only use 5-10% of 
> io.sort.mb, i.e. 5-10M, before we spill in spite of having _much_ more memory 
> available in the sort-buffer. Wordcount, for example, results in ~12 spills 
> (given an hdfs block size of 64M). The presence of a combiner exacerbates the 
> problem by piling on serialization/deserialization of records too...
> Sure, jobs can configure io.sort.record.percent, but it's tedious and 
> obscure; we really can do better by getting the framework to automagically 
> pick it by using all available memory (up to io.sort.mb) for either the data 
> or accounting.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1009) Forrest documentation needs to be updated to describes features provided for supporting hierarchical queues

2009-12-23 Thread Hemanth Yamijala (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hemanth Yamijala updated MAPREDUCE-1009:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

The test failures were unrelated to this documentation patch and are being 
tracked in MAPREDUCE-1311 and MAPREDUCE-1312.

Hence, I committed this patch to trunk and branch 0.21. Thanks, Vinod !

> Forrest documentation needs to be updated to describes features provided for 
> supporting hierarchical queues
> ---
>
> Key: MAPREDUCE-1009
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1009
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.21.0
>Reporter: Hemanth Yamijala
>Assignee: Vinod K V
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-1009-20091008.txt, 
> MAPREDUCE-1009-20091116.txt, MAPREDUCE-1009-20091124.txt, 
> MAPREDUCE-1009-20091211.txt, MAPREDUCE-1009-20091217.txt, 
> MAPREDUCE-1009-20091222.txt
>
>
> Forrest documentation must be updated for describing how to set up and use 
> hierarchical queues in the framework and the capacity scheduler.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1059) distcp can generate uneven map task assignments

2009-12-23 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-1059:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

+1

I committed this. Thanks, Aaron!

> distcp can generate uneven map task assignments
> ---
>
> Key: MAPREDUCE-1059
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1059
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: distcp
>Reporter: Aaron Kimball
>Assignee: Aaron Kimball
> Attachments: MAPREDUCE-1059.2.patch, MAPREDUCE-1059.3.patch, 
> MAPREDUCE-1059.patch
>
>
> distcp writes out a SequenceFile containing the source files to transfer, and 
> their sizes. Map tasks are created over spans of this file, representing 
> files which each mapper should transfer. In practice, some transfer loads 
> yield many empty map tasks while a few tasks perform the bulk of the work.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1317) Reducing memory consumption of rumen objects

2009-12-23 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793994#action_12793994
 ] 

Hong Tang commented on MAPREDUCE-1317:
--

The 2 failed unit tests in rumen were caused by my false assumption that 
LoggedXXX objects are immutable, while in fact HadoopLogAnalyzer actually 
mutates the List object returned from the getter method. I 
restored the original semantics by creating an empty list instead of using 
Collections.emptyList().

I filed MAPREDUCE-1330 to propose making the LoggedXXX APIs more consistent in 
this regard.
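The failure mode in miniature, sketched with plain collections: Collections.emptyList() returns an immutable list, so a caller that mutates the getter's result throws, while a fresh ArrayList keeps the original semantics.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Demonstrates why swapping in Collections.emptyList() broke callers
// that mutate the list returned from a getter.
public class EmptyListDemo {
    public static void main(String[] args) {
        List<String> shared = Collections.emptyList();
        try {
            shared.add("x");                    // mimics a caller mutating the getter's result
        } catch (UnsupportedOperationException e) {
            System.out.println("immutable");    // prints immutable
        }
        List<String> fresh = new ArrayList<>(); // one empty list per object: old semantics
        fresh.add("x");
        System.out.println(fresh.size());       // prints 1
    }
}
```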

> Reducing memory consumption of rumen objects
> 
>
> Key: MAPREDUCE-1317
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1317
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.21.0, 0.22.0
>Reporter: Hong Tang
>Assignee: Hong Tang
> Fix For: 0.21.0, 0.22.0
>
> Attachments: mapreduce-1317-20091218.patch, 
> mapreduce-1317-20091222-2.patch, mapreduce-1317-20091222.patch, 
> mapreduce-1317-20091223.patch
>
>
> We have encountered OutOfMemoryErrors in mumak and gridmix when dealing with 
> very large jobs. The purpose of this jira is to optimize memory consumption of 
> rumen-produced job objects.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1330) [rumen] LoggedXXX objects should either be completely mutable or immutable

2009-12-23 Thread Hong Tang (JIRA)
[rumen] LoggedXXX objects should either be completely mutable or immutable
--

 Key: MAPREDUCE-1330
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1330
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Hong Tang


The current APIs of LoggedXXX objects in rumen do not allow the objects to be 
modified through setters (package-private), but open the door to modifying the 
objects by allowing callers to mutate the list objects returned from the 
getters. This is confusing, and given the nature of such objects, it is 
probably a good idea to make them immutable.
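One way the immutable option could look (hypothetical names, not rumen's actual classes): keep mutators package-private and hand callers a read-only view, so neither path can alter internal state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of an immutable LoggedXXX-style getter; the class
// and field names are illustrative only.
public class LoggedSketch {
    private final List<String> tasks = new ArrayList<>();

    void addTask(String task) {                     // package-private mutator
        tasks.add(task);
    }

    public List<String> getTasks() {
        return Collections.unmodifiableList(tasks); // callers get a read-only view
    }

    public static void main(String[] args) {
        LoggedSketch logged = new LoggedSketch();
        logged.addTask("map_000000");
        try {
            logged.getTasks().add("map_000001");    // caller-side mutation is rejected
        } catch (UnsupportedOperationException e) {
            System.out.println("read-only");        // prints read-only
        }
    }
}
```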

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1317) Reducing memory consumption of rumen objects

2009-12-23 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-1317:
-

Attachment: mapreduce-1317-20091223.patch

New patch that fixes the failed unit tests.

> Reducing memory consumption of rumen objects
> 
>
> Key: MAPREDUCE-1317
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1317
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.21.0, 0.22.0
>Reporter: Hong Tang
>Assignee: Hong Tang
> Fix For: 0.21.0, 0.22.0
>
> Attachments: mapreduce-1317-20091218.patch, 
> mapreduce-1317-20091222-2.patch, mapreduce-1317-20091222.patch, 
> mapreduce-1317-20091223.patch
>
>
> We have encountered OutOfMemoryErrors in mumak and gridmix when dealing with 
> very large jobs. The purpose of this jira is to optimize memory consumption of 
> rumen-produced job objects.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1317) Reducing memory consumption of rumen objects

2009-12-23 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-1317:
-

Status: Patch Available  (was: Open)

> Reducing memory consumption of rumen objects
> 
>
> Key: MAPREDUCE-1317
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1317
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.21.0, 0.22.0
>Reporter: Hong Tang
>Assignee: Hong Tang
> Fix For: 0.21.0, 0.22.0
>
> Attachments: mapreduce-1317-20091218.patch, 
> mapreduce-1317-20091222-2.patch, mapreduce-1317-20091222.patch, 
> mapreduce-1317-20091223.patch
>
>
> We have encountered OutOfMemoryErrors in mumak and gridmix when dealing with 
> very large jobs. The purpose of this jira is to optimize memory consumption of 
> rumen-produced job objects.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1317) Reducing memory consumption of rumen objects

2009-12-23 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-1317:
-

Status: Open  (was: Patch Available)

Fixing bugs found through unit tests.

> Reducing memory consumption of rumen objects
> 
>
> Key: MAPREDUCE-1317
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1317
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.21.0, 0.22.0
>Reporter: Hong Tang
>Assignee: Hong Tang
> Fix For: 0.21.0, 0.22.0
>
> Attachments: mapreduce-1317-20091218.patch, 
> mapreduce-1317-20091222-2.patch, mapreduce-1317-20091222.patch
>
>
> We have encountered OutOfMemoryErrors in mumak and gridmix when dealing with 
> very large jobs. The purpose of this jira is to optimize memory consumption of 
> rumen-produced job objects.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1009) Forrest documentation needs to be updated to describes features provided for supporting hierarchical queues

2009-12-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793989#action_12793989
 ] 

Hadoop QA commented on MAPREDUCE-1009:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12428714/MAPREDUCE-1009-20091222.txt
  against trunk revision 893409.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 1 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/337/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/337/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/337/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/337/console

This message is automatically generated.

> Forrest documentation needs to be updated to describes features provided for 
> supporting hierarchical queues
> ---
>
> Key: MAPREDUCE-1009
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1009
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.21.0
>Reporter: Hemanth Yamijala
>Assignee: Vinod K V
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-1009-20091008.txt, 
> MAPREDUCE-1009-20091116.txt, MAPREDUCE-1009-20091124.txt, 
> MAPREDUCE-1009-20091211.txt, MAPREDUCE-1009-20091217.txt, 
> MAPREDUCE-1009-20091222.txt
>
>
> Forrest documentation must be updated for describing how to set up and use 
> hierarchical queues in the framework and the capacity scheduler.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1293) AutoInputFormat doesn't work with non-default FileSystems

2009-12-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793987#action_12793987
 ] 

Hadoop QA commented on MAPREDUCE-1293:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12427964/MAPREDUCE-1293.txt
  against trunk revision 893409.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/240/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/240/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/240/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/240/console

This message is automatically generated.

> AutoInputFormat doesn't work with non-default FileSystems
> -
>
> Key: MAPREDUCE-1293
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1293
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Reporter: Andrew Hitchcock
> Attachments: MAPREDUCE-1293.txt
>
>
> AutoInputFormat uses the wrong FileSystem.get() method when getting a 
> reference to a FileSystem object. It gets the default FileSystem, so it 
> breaks if the InputSplit's path points to a different FileSystem. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1286) Quotes in environment HADOOP_CLIENT_OPTS confuse parsing if this env is concatenated with something else

2009-12-23 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-1286:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

The double quotes were added intentionally in MAPREDUCE-1086, but removing them 
looks OK.

+1 I committed this. Thanks, Yuri!
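A minimal illustration of the bug being fixed (flag values are made up): once the shell word-splits the concatenated string, the embedded double quotes are just ordinary characters stuck to the arguments. Simulated here with a plain whitespace split.

```java
// Simulates sh-style word splitting to show why literal double quotes
// inside HADOOP_CLIENT_OPTS broke concatenation with other flags.
public class QuoteDemo {
    public static void main(String[] args) {
        String clientOpts = "\"-Dhadoop.tasklog.taskid=z -Dhadoop.tasklog.totalLogFileSize=s\"";
        String[] argv = ("java -Dblah=x " + clientOpts).split("\\s+");
        System.out.println(argv[2]); // "-Dhadoop.tasklog.taskid=z  <- leading quote is literal
        System.out.println(argv[3]); // -Dhadoop.tasklog.totalLogFileSize=s" <- trailing quote
    }
}
```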

> Quotes in environment HADOOP_CLIENT_OPTS confuse parsing if this env is 
> concatenated with something else
> 
>
> Key: MAPREDUCE-1286
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1286
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task
>Affects Versions: 0.20.2
> Environment: Linux F12; streaming
>Reporter: Yuri Pradkin
>Assignee: Yuri Pradkin
> Attachments: mr-1286
>
>
> I use streaming, and in the perl reducer I write to hdfs using a pipe to hdfs 
> -put -. It turns out that because TaskRunner sets the environment variable 
> HADOOP_CLIENT_OPTS in double quotes, when the hdfs shell script concatenates 
> these with something else, the command fails: e.g. java -Dblah=x -Dfoo=y 
> "-Dhadoop.tasklog.taskid=z -Dhadoop.tasklog.totalLogFileSize=s"...
> Since I don't see any reason to have these double quotes in the original 
> code, I propose they be removed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.