[jira] Updated: (MAPREDUCE-1505) Cluster class should create the rpc client only when needed

2010-04-06 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-1505:
-

Attachment: MAPREDUCE-1505_yhadoop20_9.patch

Patch for an older version of yahoo-hadoop-0.20, not for commit.

> Cluster class should create the rpc client only when needed
> ---
>
> Key: MAPREDUCE-1505
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1505
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.20.2
>Reporter: Devaraj Das
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1505_yhadoop20.patch, 
> MAPREDUCE-1505_yhadoop20_9.patch
>
>
> It would be good to have org.apache.hadoop.mapreduce.Cluster create the 
> rpc client object only when needed (when a call to the jobtracker is actually 
> required). org.apache.hadoop.mapreduce.Job constructs the Cluster object 
> internally, and in many cases the application that created the Job object 
> really only wants to look at the configuration. It would help to not have these 
> connections to the jobtracker, especially when Job is used in the tasks 
> (e.g., Pig calls mapreduce.FileInputFormat.setInputPath in the tasks, which 
> requires a Job object to be passed).
> In Hadoop 20, the Job object internally creates the JobClient object, and the 
> same argument applies there too.
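The laziness being requested can be sketched as follows. This is a hypothetical illustration with stand-in types (LazyCluster, getConf, submitJob, connectCount are all invented names, not the real Cluster/ClientProtocol API): the RPC proxy is built on the first call that actually needs the jobtracker, so configuration-only callers never open a connection.

```java
// Hypothetical sketch, not the real org.apache.hadoop.mapreduce.Cluster:
// the RPC client is created lazily, so callers that only read configuration
// never connect to the jobtracker.
public class LazyCluster {
    private Object rpcClient;       // stands in for the ClientProtocol proxy
    private int connectCount = 0;   // how many times we actually "connected"

    // Configuration lookups must not trigger a connection.
    public String getConf(String key) {
        return "value-of-" + key;
    }

    // Built on demand, at most once.
    private synchronized Object getClient() {
        if (rpcClient == null) {
            connectCount++;
            rpcClient = new Object();  // stands in for RPC.getProxy(...)
        }
        return rpcClient;
    }

    // Only calls that truly need the jobtracker reach getClient().
    public Object submitJob() {
        return getClient();
    }

    public int connectCount() {
        return connectCount;
    }
}
```

With this shape, a task that constructs a Job merely to call something like FileInputFormat.setInputPath never pays for a jobtracker connection.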

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1682) Tasks should not be scheduled after tip is killed/failed.

2010-04-06 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854339#action_12854339
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1682:


A quick look at the JobInProgress code shows that 
JobInProgress.findSpeculativeTask() does not do the tip.isRunnable() check, 
whereas JobInProgress.findTaskFromList() does the check to skip 
failed/killed tips.

The bug is not in 0.21 or trunk (it was fixed in HADOOP-2141); it exists 
only in branch 0.20.
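The missing check can be sketched like this. These are stand-in classes (Tip, Speculator), not the real JobInProgress code: the point is that the candidate loop must skip TIPs that already failed or were killed, as findTaskFromList does.

```java
import java.util.List;

// Hypothetical sketch of the described fix (stand-in types, not the real
// Hadoop classes): skip non-runnable TIPs when picking a speculative task.
class Tip {
    final String name;
    final boolean runnable;
    Tip(String name, boolean runnable) {
        this.name = name;
        this.runnable = runnable;
    }
    boolean isRunnable() { return runnable; }
}

class Speculator {
    // Returns the first runnable candidate, or null if there is none.
    static Tip findSpeculativeTask(List<Tip> candidates) {
        for (Tip tip : candidates) {
            if (!tip.isRunnable()) {
                continue;  // the check the 0.20 branch is missing
            }
            return tip;
        }
        return null;
    }
}
```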

> Tasks should not be scheduled after tip is killed/failed.
> -
>
> Key: MAPREDUCE-1682
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1682
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Reporter: Amareshwari Sriramadasu
> Fix For: 0.20.3
>
>
> We have seen the following scenario in our cluster:
> A job was marked failed because four attempts of a TIP failed. This kills 
> all the map and reduce tips. Then a job-cleanup attempt is launched.
> The job-cleanup attempt failed because it could not report status for 10 
> minutes. There were 3 such job-cleanup attempts, leading the job to be killed 
> after half an hour.
> While waiting for the job cleanup to finish, the JobTracker scheduled many 
> tasks of the job on TaskTrackers and sent a KillTaskAction in the next 
> heartbeat. This wastes a lot of resources; we should avoid scheduling tasks 
> of a tip once the tip is killed/failed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1682) Tasks should not be scheduled after tip is killed/failed.

2010-04-06 Thread Amareshwari Sriramadasu (JIRA)
Tasks should not be scheduled after tip is killed/failed.
-

 Key: MAPREDUCE-1682
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1682
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Reporter: Amareshwari Sriramadasu
 Fix For: 0.20.3


We have seen the following scenario in our cluster:
A job was marked failed because four attempts of a TIP failed. This kills 
all the map and reduce tips. Then a job-cleanup attempt is launched.
The job-cleanup attempt failed because it could not report status for 10 
minutes. There were 3 such job-cleanup attempts, leading the job to be killed 
after half an hour.
While waiting for the job cleanup to finish, the JobTracker scheduled many 
tasks of the job on TaskTrackers and sent a KillTaskAction in the next 
heartbeat. 

This wastes a lot of resources; we should avoid scheduling tasks of a 
tip once the tip is killed/failed.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1637) Create a test for API compatibility between releases

2010-04-06 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-1637:
-

Issue Type: Sub-task  (was: Test)
Parent: MAPREDUCE-1681

> Create a test for API compatibility between releases
> 
>
> Key: MAPREDUCE-1637
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1637
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: build, test
>Reporter: Tom White
>Assignee: Tom White
>Priority: Blocker
> Fix For: 0.21.0
>
>
> We should have an automated test (or a set of tests) for checking that 
> programs written against an old version of the API still run with a newer 
> version. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1623) Apply audience and stability annotations to classes in mapred package

2010-04-06 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-1623:
-

Issue Type: Sub-task  (was: Improvement)
Parent: MAPREDUCE-1681

> Apply audience and stability annotations to classes in mapred package
> -
>
> Key: MAPREDUCE-1623
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1623
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Tom White
>Assignee: Tom White
> Attachments: MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, 
> MAPREDUCE-1623.patch, MAPREDUCE-1623.patch
>
>
> There are lots of implementation classes in org.apache.hadoop.mapred which 
> makes it difficult to see the user-level MapReduce API classes in the 
> Javadoc. (See 
> http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/package-summary.html
>  for example.) By marking these implementation classes with the 
> InterfaceAudience.Private annotation we can exclude them from user Javadoc 
> (using HADOOP-6658).
> Later work will move the implementation classes into o.a.h.mapreduce.server 
> and related packages (see MAPREDUCE-561), but applying the annotations is a 
> good first step. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1625) Improve grouping of packages in Javadoc

2010-04-06 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-1625:
-

Issue Type: Sub-task  (was: Improvement)
Parent: MAPREDUCE-1681

> Improve grouping of packages in Javadoc
> ---
>
> Key: MAPREDUCE-1625
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1625
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Tom White
>Assignee: Tom White
> Attachments: MAPREDUCE-1625.patch, MAPREDUCE-1625.patch
>
>
> There are a couple of problems with the current Javadoc:
> * The main MapReduce package documentation on the index page appears under 
> "Other Packages" below the fold.
> * Some contrib classes and packages are interspersed in the main MapReduce 
> documentation, which is very confusing for users.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1650) Exclude Private elements from generated MapReduce Javadoc

2010-04-06 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-1650:
-

Issue Type: Sub-task  (was: Improvement)
Parent: MAPREDUCE-1681

> Exclude Private elements from generated MapReduce Javadoc
> -
>
> Key: MAPREDUCE-1650
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1650
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Tom White
>Assignee: Tom White
> Attachments: MAPREDUCE-1650.patch, MAPREDUCE-1650.patch
>
>
> Exclude elements annotated with InterfaceAudience.Private or 
> InterfaceAudience.LimitedPrivate from Javadoc and JDiff.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1681) MapReduce API compatibility

2010-04-06 Thread Tom White (JIRA)
MapReduce API compatibility
---

 Key: MAPREDUCE-1681
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1681
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: build, documentation
Reporter: Tom White
Priority: Blocker
 Fix For: 0.21.0


This is an umbrella issue to document and test MapReduce API compatibility 
across releases.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1594) Support for Sleep Jobs in gridmix

2010-04-06 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854293#action_12854293
 ] 

Hong Tang commented on MAPREDUCE-1594:
--

The latest patch ("1594-yhadoop-20-1xx-1-4.patch") still has a few minor 
issues. I fixed them and attached a new patch (1594-yhadoop-20-1xx-1-5.patch). 
I also attached a separate diff against 1594-yhadoop-20-1xx-1-4.patch 
(1594-diff-4-5.patch) so that it is easier to see what I have changed:

- Replaced the pattern "enum.name().equals()" with direct enum comparison 
in various places.
- In GridmixJob, made the job field final. Cleaned up the exception logic of 
PrivilegedExceptionAction.run().
- Moved the setting of SeqId and the original name into the GridmixJob ctor, 
eliminating the need for setSeqId and setJobId.
- Added another pullDescription method in GridmixJob that takes a JobContext 
object (to avoid exposing the key "gridmix.job.seq").
- The way RINTERVAL was calculated was wrong: when the duration is less than 2, 
it leads to RINTERVAL == 0. Per an offline conversation with Chris, it seems 
that having a fine granularity for RINTERVAL is pointless since progress is 
only updated when the TT sends a heartbeat to the JT, so I reverted the way 
RINTERVAL is calculated.
- Changed SleepJob to use System.currentTimeMillis() instead of 
System.nanoTime(), since we are dealing with millisecond granularity.
- In both TestGridmixSubmission and TestSleepJob, the statements setting 
the configuration appear to be unused.
- Fixed some indentation issues in GenerateData.
- Removed unused imports in GridmixJob.
- Removed the unused LoadJob ctor.

> Support for Sleep Jobs in gridmix
> -
>
> Key: MAPREDUCE-1594
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1594
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/gridmix
>Reporter: rahul k singh
> Attachments: 1376-5-yhadoop20-100-3.patch, 1594-diff-4-5.patch, 
> 1594-yhadoop-20-1xx-1-2.patch, 1594-yhadoop-20-1xx-1-3.patch, 
> 1594-yhadoop-20-1xx-1-4.patch, 1594-yhadoop-20-1xx-1-5.patch, 
> 1594-yhadoop-20-1xx-1.patch, 1594-yhadoop-20-1xx.patch
>
>
> Support for Sleep jobs in gridmix

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1594) Support for Sleep Jobs in gridmix

2010-04-06 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-1594:
-

Attachment: 1594-yhadoop-20-1xx-1-5.patch

> Support for Sleep Jobs in gridmix
> -
>
> Key: MAPREDUCE-1594
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1594
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/gridmix
>Reporter: rahul k singh
> Attachments: 1376-5-yhadoop20-100-3.patch, 1594-diff-4-5.patch, 
> 1594-yhadoop-20-1xx-1-2.patch, 1594-yhadoop-20-1xx-1-3.patch, 
> 1594-yhadoop-20-1xx-1-4.patch, 1594-yhadoop-20-1xx-1-5.patch, 
> 1594-yhadoop-20-1xx-1.patch, 1594-yhadoop-20-1xx.patch
>
>
> Support for Sleep jobs in gridmix

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1594) Support for Sleep Jobs in gridmix

2010-04-06 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-1594:
-

Attachment: 1594-diff-4-5.patch

> Support for Sleep Jobs in gridmix
> -
>
> Key: MAPREDUCE-1594
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1594
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/gridmix
>Reporter: rahul k singh
> Attachments: 1376-5-yhadoop20-100-3.patch, 1594-diff-4-5.patch, 
> 1594-yhadoop-20-1xx-1-2.patch, 1594-yhadoop-20-1xx-1-3.patch, 
> 1594-yhadoop-20-1xx-1-4.patch, 1594-yhadoop-20-1xx-1-5.patch, 
> 1594-yhadoop-20-1xx-1.patch, 1594-yhadoop-20-1xx.patch
>
>
> Support for Sleep jobs in gridmix

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.

2010-04-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854287#action_12854287
 ] 

Hadoop QA commented on MAPREDUCE-1073:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12440953/mapreduce-1073--2010-04-06.patch
  against trunk revision 931274.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 8 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/96/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/96/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/96/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/96/console

This message is automatically generated.

> Progress reported for pipes tasks is incorrect.
> ---
>
> Key: MAPREDUCE-1073
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: pipes
>Affects Versions: 0.20.1
>Reporter: Sreekanth Ramakrishnan
> Attachments: mapreduce-1073--2010-03-31.patch, 
> mapreduce-1073--2010-04-06.patch, MAPREDUCE-1073_yhadoop20.patch
>
>
> Currently in pipes, 
> {{org.apache.hadoop.mapred.pipes.PipesMapRunner.run(RecordReader, 
> OutputCollector, Reporter)}} we do the following:
> {code}
> while (input.next(key, value)) {
>   downlink.mapItem(key, value);
>   if(skipping) {
> downlink.flush();
>   }
> }
> {code}
> This would result in consumption of all the records for current task and 
> taking task progress to 100% whereas the actual pipes application would be 
> trailing behind. 
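The resulting skew can be illustrated with toy numbers (this is a hypothetical illustration, not the real pipes API; PipesProgressSkew and its fields are invented for the example): if progress is derived from records pushed down the downlink, it reaches 100% while the external process still trails behind.

```java
// Hypothetical illustration of the progress skew described above: deriving
// progress from records *pushed* downstream overstates it, because the
// external pipes application consumes them more slowly.
public class PipesProgressSkew {
    static float progress(int done, int total) {
        return done / (float) total;
    }

    public static void main(String[] args) {
        int totalRecords = 1000;
        int pushed = 1000;      // runner has written every record downstream
        int processed = 400;    // pipes application has consumed this many
        System.out.println("reported: " + progress(pushed, totalRecords));
        System.out.println("actual:   " + progress(processed, totalRecords));
    }
}
```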

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (MAPREDUCE-1680) Add a metrics to track the number of heartbeats processed

2010-04-06 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-1680:


Assignee: Dick King

> Add a metrics to track the number of heartbeats processed
> -
>
> Key: MAPREDUCE-1680
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1680
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Hong Tang
>Assignee: Dick King
>
> It would be nice to add a metric that tracks the number of heartbeats 
> processed by the JT.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1623) Apply audience and stability annotations to classes in mapred package

2010-04-06 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-1623:
-

Attachment: MAPREDUCE-1623.patch

Here's a new patch which covers all the core packages in MapReduce. The details 
are in the patch, but the following table gives a summary:

||Package org.apache.hadoop.|| Visibility & Stability || Notes ||
|filecache|deprecated| |
|mapred|deprecated or private unstable (for implementation classes)| |
|mapred.lib|deprecated| |
|mapred.pipes|public stable| |
|mapred.tools|public stable| |
|mapreduce|public evolving| |
|mapreduce.filecache|deprecated/private| |
|mapreduce.jobhistory|private unstable| |
|mapreduce.protocol|private stable| |
|mapreduce.security|private unstable/public evolving| |
|mapreduce.security.token*|private unstable| |
|mapreduce.server.jobtracker|private unstable| |
|mapreduce.server.tasktracker|private unstable| |
|mapreduce.split|private unstable| |
|mapreduce.task*|private unstable| |
|mapreduce.tools|public stable| |
|mapreduce.util|private unstable|All the util classes are for the framework, so 
shouldn't be public|
|util|deprecated| |

With this patch, MAPREDUCE-1650, and MAPREDUCE-1625, the public Javadoc looks 
like this: http://people.apache.org/~tomwhite/MAPREDUCE-1623/api/







> Apply audience and stability annotations to classes in mapred package
> -
>
> Key: MAPREDUCE-1623
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1623
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Tom White
>Assignee: Tom White
> Attachments: MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, 
> MAPREDUCE-1623.patch, MAPREDUCE-1623.patch
>
>
> There are lots of implementation classes in org.apache.hadoop.mapred which 
> makes it difficult to see the user-level MapReduce API classes in the 
> Javadoc. (See 
> http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/package-summary.html
>  for example.) By marking these implementation classes with the 
> InterfaceAudience.Private annotation we can exclude them from user Javadoc 
> (using HADOOP-6658).
> Later work will move the implementation classes into o.a.h.mapreduce.server 
> and related packages (see MAPREDUCE-561), but applying the annotations is a 
> good first step. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1680) Add a metrics to track the number of heartbeats processed

2010-04-06 Thread Hong Tang (JIRA)
Add a metrics to track the number of heartbeats processed
-

 Key: MAPREDUCE-1680
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1680
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Hong Tang


It would be nice to add a metric that tracks the number of heartbeats 
processed by the JT.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-469) Support concatenated gzip and bzip2 files

2010-04-06 Thread David Ciemiewicz (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854268#action_12854268
 ] 

David Ciemiewicz commented on MAPREDUCE-469:


Unfortunately I discovered that concatenated bzip2 files did not work in 
Map-Reduce until *AFTER* I went and concatenated 3TB and over 250K compressed 
files.

A colleague suggested that I "fix" my data using the following approach:

hadoop dfs -cat X | bunzip2 | bzip2 | hadoop dfs -put - X.new

I tried this with a 3GB single file concatenation of multiple bzip2 compressed 
files.

This process took just over an hour, with compression taking 5-6X longer than 
decompression (as measured by CPU utilization).

It only took several minutes to concatenate the multiple part files into a 
single file.


I think this points out that decompressing and recompressing data is not 
really a viable solution for creating large concatenations of smaller files.

The best-performing solution is to create the smaller part files in parallel 
with a bunch of reducers, and then concatenate them later into one (or 
several) larger files.

So fixing Hadoop MapReduce to be able to read concatenations of files is 
probably the highest return on investment for the community.
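The desired behaviour is easy to see off-cluster with plain java.util.zip, since GZIPInputStream does continue past the first gzip member; a reader that stopped at the first member would return only the first string. (A sketch for illustration; the class and method names here are invented.)

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Demonstrates the behaviour this issue asks Hadoop's codecs to adopt:
// a concatenation of gzip members decompresses to the concatenation of
// their contents, not just the first member.
public class ConcatGzipDemo {
    static byte[] gzip(String s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(s.getBytes("UTF-8"));
        }
        return bos.toByteArray();
    }

    static String gunzipAll(byte[] data) throws IOException {
        GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) > 0) {
            out.write(buf, 0, n);
        }
        return out.toString("UTF-8");
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream cat = new ByteArrayOutputStream();
        cat.write(gzip("first,"));   // member 1
        cat.write(gzip("second"));   // member 2
        System.out.println(gunzipAll(cat.toByteArray()));
    }
}
```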




> Support concatenated gzip and bzip2 files
> -
>
> Key: MAPREDUCE-469
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-469
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Tom White
>Assignee: Ravi Gummadi
>
> When running MapReduce with concatenated gzip files as input only the first 
> part is read, which is confusing, to say the least. Concatenated gzip is 
> described in http://www.gnu.org/software/gzip/manual/gzip.html#Advanced-usage 
> and in http://www.ietf.org/rfc/rfc1952.txt. (See original report at 
> http://www.nabble.com/Problem-with-Hadoop-and-concatenated-gzip-files-to21383097.html)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-5) Shuffle's getMapOutput() fails with EofException, followed by IllegalStateException

2010-04-06 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854266#action_12854266
 ] 

Allen Wittenauer commented on MAPREDUCE-5:
--

If these messages are harmful, then we need to give a better message.

If these messages aren't harmful, then we shouldn't be throwing a Java 
exception.

FWIW, we see these all the time at LinkedIn.

> Shuffle's getMapOutput() fails with EofException, followed by 
> IllegalStateException
> ---
>
> Key: MAPREDUCE-5
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.20.2
> Environment: Sun Java 1.6.0_13, OpenSolaris, running on a SunFire 
> 4150 (x64) 10 node cluster
>Reporter: George Porter
>
> During the shuffle phase, I'm seeing a large sequence of the following 
> actions:
> 1) WARN org.apache.hadoop.mapred.TaskTracker: 
> getMapOutput(attempt_200905181452_0002_m_10_0,0) failed : 
> org.mortbay.jetty.EofException
> 2) WARN org.mortbay.log: Committed before 410 
> getMapOutput(attempt_200905181452_0002_m_10_0,0) failed : 
> org.mortbay.jetty.EofException
> 3) ERROR org.mortbay.log: /mapOutput java.lang.IllegalStateException: 
> Committed
> The map phase completes with 100%, and then the reduce phase crawls along 
> with the above errors in each of the TaskTracker logs.  None of the 
> tasktrackers get lost.  When I run non-data jobs like the 'pi' test from the 
> example jar, everything works fine.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-5) Shuffle's getMapOutput() fails with EofException, followed by IllegalStateException

2010-04-06 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5:
-

Affects Version/s: 0.20.2

> Shuffle's getMapOutput() fails with EofException, followed by 
> IllegalStateException
> ---
>
> Key: MAPREDUCE-5
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.20.2
> Environment: Sun Java 1.6.0_13, OpenSolaris, running on a SunFire 
> 4150 (x64) 10 node cluster
>Reporter: George Porter
>
> During the shuffle phase, I'm seeing a large sequence of the following 
> actions:
> 1) WARN org.apache.hadoop.mapred.TaskTracker: 
> getMapOutput(attempt_200905181452_0002_m_10_0,0) failed : 
> org.mortbay.jetty.EofException
> 2) WARN org.mortbay.log: Committed before 410 
> getMapOutput(attempt_200905181452_0002_m_10_0,0) failed : 
> org.mortbay.jetty.EofException
> 3) ERROR org.mortbay.log: /mapOutput java.lang.IllegalStateException: 
> Committed
> The map phase completes with 100%, and then the reduce phase crawls along 
> with the above errors in each of the TaskTracker logs.  None of the 
> tasktrackers get lost.  When I run non-data jobs like the 'pi' test from the 
> example jar, everything works fine.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1679) capacity scheduler's user-limit documentation is not helpful

2010-04-06 Thread Allen Wittenauer (JIRA)
capacity scheduler's user-limit documentation is not helpful


 Key: MAPREDUCE-1679
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1679
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/capacity-sched
Affects Versions: 0.20.2, 0.21.0, 0.22.0
Reporter: Allen Wittenauer
Priority: Trivial


The example given for the user-limit tunable doesn't actually show how that 
value comes into play. With 4 users, the max() is 25 for both the user limit 
and the capacity limit (from my reading of the source). Either pushing the 
example to 5 users or raising the user limit above 25 would help a great 
deal. Presenting this info in tabular format, showing how the max() value 
comes into play, would also be great.
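A small sketch of the interplay the report wants documented, assuming the effective per-user share is max(capacity / activeUsers, userLimit), both in percent. That formula is inferred from the report's reading of the source, not quoted from the scheduler itself, and the class and method names are invented.

```java
// Hypothetical sketch: why the documented 4-user example is unilluminating.
// Assumes effective share = max(capacity / activeUsers, userLimit), in percent.
public class UserLimitDemo {
    static int effectiveShare(int capacityPct, int userLimitPct, int activeUsers) {
        return Math.max(capacityPct / activeUsers, userLimitPct);
    }

    public static void main(String[] args) {
        // With 4 users and a 25% user limit, both terms are 25, so the
        // example never shows which one wins.
        System.out.println(effectiveShare(100, 25, 4));
        // With 5 users the user limit dominates: max(20, 25) == 25.
        System.out.println(effectiveShare(100, 25, 5));
        // With a 10% user limit the capacity share dominates: max(20, 10) == 20.
        System.out.println(effectiveShare(100, 10, 5));
    }
}
```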

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1678) Change org.apache.hadoop.mapreduce.Cluster methods to allow for extending

2010-04-06 Thread Kyle Ellrott (JIRA)
Change org.apache.hadoop.mapreduce.Cluster methods to allow for extending
-

 Key: MAPREDUCE-1678
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1678
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: job submission
Reporter: Kyle Ellrott
Priority: Trivial


Change methods in org.apache.hadoop.mapreduce.Cluster from private to protected 
to allow extension of Cluster.
If the method createRPCProxy is changed from private to protected, then 
alternate Cluster implementations could be written that return other 
ClientProtocols, for example swapping the protocol for some custom 
implementation called SimpleClient:

public class SimpleCluster extends Cluster {
  @Override
  protected ClientProtocol createRPCProxy(InetSocketAddress addr,
      Configuration conf) throws IOException {
    return new SimpleClient(conf);
  }
}




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1466) FileInputFormat should save #input-files in JobConf

2010-04-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854245#action_12854245
 ] 

Hudson commented on MAPREDUCE-1466:
---

Integrated in Hadoop-Mapreduce-trunk-Commit #302 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/302/])
Record number of files processed in FileInputFormat in the
Configuration for offline analysis. Contributed by Luke Lu and Arun Murthy


> FileInputFormat should save #input-files in JobConf
> ---
>
> Key: MAPREDUCE-1466
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1466
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.22.0
>Reporter: Arun C Murthy
>Assignee: Luke Lu
>Priority: Minor
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1466_yhadoop20-1.patch, 
> MAPREDUCE-1466_yhadoop20-2.patch, MAPREDUCE-1466_yhadoop20-3.patch, 
> MAPREDUCE-1466_yhadoop20.patch, mr-1466-trunk-v1.patch, 
> mr-1466-trunk-v2.patch, mr-1466-trunk-v3.patch, mr-1466-trunk-v4.patch, 
> mr-1466-trunk-v5.patch
>
>
> We already track the amount of data consumed by MR applications 
> (MAP_INPUT_BYTES); along with that, it would be useful to track #input-files 
> from the client-side for analysis. Along the lines of MAPREDUCE-1403, it 
> would be easy to stick this in the JobConf during job-submission.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1466) FileInputFormat should save #input-files in JobConf

2010-04-06 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-1466:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

+1

I committed this. Thanks, Luke & Arun!

> FileInputFormat should save #input-files in JobConf
> ---
>
> Key: MAPREDUCE-1466
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1466
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.22.0
>Reporter: Arun C Murthy
>Assignee: Luke Lu
>Priority: Minor
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1466_yhadoop20-1.patch, 
> MAPREDUCE-1466_yhadoop20-2.patch, MAPREDUCE-1466_yhadoop20-3.patch, 
> MAPREDUCE-1466_yhadoop20.patch, mr-1466-trunk-v1.patch, 
> mr-1466-trunk-v2.patch, mr-1466-trunk-v3.patch, mr-1466-trunk-v4.patch, 
> mr-1466-trunk-v5.patch
>
>
> We already track the amount of data consumed by MR applications 
> (MAP_INPUT_BYTES); along with that, it would be useful to track #input-files 
> from the client-side for analysis. Along the lines of MAPREDUCE-1403, it 
> would be easy to stick this in the JobConf during job-submission.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1637) Create a test for API compatibility between releases

2010-04-06 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-1637:
-

Priority: Blocker  (was: Major)
Assignee: Tom White

> Create a test for API compatibility between releases
> 
>
> Key: MAPREDUCE-1637
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1637
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: build, test
>Reporter: Tom White
>Assignee: Tom White
>Priority: Blocker
> Fix For: 0.21.0
>
>
> We should have an automated test (or a set of tests) for checking that 
> programs written against an old version of the API still run with a newer 
> version. 




[jira] Commented: (MAPREDUCE-1672) Create test scenario for "distributed cache file behaviour, when dfs file is not modified"

2010-04-06 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854222#action_12854222
 ] 

Konstantin Boudnik commented on MAPREDUCE-1672:
---

Looks good to me except that there's already a file 
{{src/test/org/apache/hadoop/mapred/UtilsForTests.java}} so please consider 
adding your utils there rather than creating a brand new one.

Also, please ask some MR expert to look at it from an MR standpoint - I don't 
have enough knowledge in this area. It would also help if you could run the test 
and post the results to the JIRA.

> Create test scenario for "distributed cache file behaviour, when dfs file is 
> not modified"
> --
>
> Key: MAPREDUCE-1672
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1672
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: TestDistributedCacheUnModifiedFile.patch, 
> TestDistributedCacheUnModifiedFile.patch
>
>
> This test scenario is for the behaviour of a distributed cache file
> when it is not modified before and after being
> accessed by at most two jobs. Once a job uses a distributed cache file,
> that file is stored in the mapred.local.dir. If the next job
> uses the same file, it is not stored again.
> So, if two jobs choose the same tasktracker for their job execution,
> then the distributed cache file should not be found twice.
> This testcase should run a job with a distributed cache file. A handle to each
> task's corresponding tasktracker is obtained and checked for
> the presence of the distributed cache file with proper permissions in the
> proper directory. When the job
> runs again and any of its tasks hits the same tasktracker that
> ran one of the tasks of the previous job, the
> file should not be uploaded again and the task should use the old file.




[jira] Commented: (MAPREDUCE-1671) Test scenario for "Killing Task Attempt id till job fails"

2010-04-06 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854220#action_12854220
 ] 

Konstantin Boudnik commented on MAPREDUCE-1671:
---

I'm no expert in MR, however... is it possible that the job never starts and this
{noformat}
+LOG.info("Sleeping 5 seconds");
+Thread.sleep(5000);
+  }
+}
+  }
+} while (!runningCount);
{noformat}
will continue indefinitely?

Otherwise it looks ok to me. 
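The concern above is that an unconditional do/while poll can spin forever. One common remedy is a retry cap; here is a minimal, hypothetical sketch (the {{Probe}} interface and all names are illustrative, not taken from the patch):

```java
// Hypothetical sketch of a bounded poll loop; Probe stands in for the
// "is any task attempt running?" check from the patch under review.
public class BoundedPoll {
    interface Probe { boolean check(); }

    // Returns true as soon as the probe succeeds; gives up (returns false)
    // after maxAttempts sleeps instead of looping indefinitely.
    static boolean waitFor(Probe probe, int maxAttempts, long sleepMillis)
            throws InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (probe.check()) {
                return true;
            }
            Thread.sleep(sleepMillis);
        }
        return false; // the caller can then fail the test with a clear message
    }

    public static void main(String[] args) throws InterruptedException {
        // A probe that never succeeds terminates after three attempts.
        System.out.println(waitFor(() -> false, 3, 10L));
    }
}
```

With a cap like this, a job that never starts makes the test fail fast with a diagnostic instead of hanging the test run.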

> Test scenario for "Killing Task Attempt id till job fails"
> --
>
> Key: MAPREDUCE-1671
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1671
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: TestTaskKilling.patch, TestTaskKilling.patch
>
>
> 1) In a job, kill the task attempt id of one task. Whenever that task tries 
> to run again with another task attempt id, for mapred.map.max.attempts times, 
> kill that task attempt id. After mapred.map.max.attempts times, the whole 
> job should get killed.




[jira] Updated: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.

2010-04-06 Thread Dick King (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1073:
-

Status: Patch Available  (was: Open)

> Progress reported for pipes tasks is incorrect.
> ---
>
> Key: MAPREDUCE-1073
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: pipes
>Affects Versions: 0.20.1
>Reporter: Sreekanth Ramakrishnan
> Attachments: mapreduce-1073--2010-03-31.patch, 
> mapreduce-1073--2010-04-06.patch, MAPREDUCE-1073_yhadoop20.patch
>
>
> Currently in pipes, 
> {{org.apache.hadoop.mapred.pipes.PipesMapRunner.run(RecordReader, 
> OutputCollector, Reporter)}} we do the following:
> {code}
> while (input.next(key, value)) {
>   downlink.mapItem(key, value);
>   if(skipping) {
> downlink.flush();
>   }
> }
> {code}
> This results in all the records for the current task being consumed, driving 
> task progress to 100% while the actual pipes application is still 
> trailing behind. 




[jira] Updated: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.

2010-04-06 Thread Dick King (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1073:
-

Attachment: mapreduce-1073--2010-04-06.patch

This patch is as large as it is because it includes the removal of 
{{src/examples/pipes/aclocal.m4}} .  That file is a derived file that should 
not be included in the code base.

> Progress reported for pipes tasks is incorrect.
> ---
>
> Key: MAPREDUCE-1073
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: pipes
>Affects Versions: 0.20.1
>Reporter: Sreekanth Ramakrishnan
> Attachments: mapreduce-1073--2010-03-31.patch, 
> mapreduce-1073--2010-04-06.patch, MAPREDUCE-1073_yhadoop20.patch
>
>
> Currently in pipes, 
> {{org.apache.hadoop.mapred.pipes.PipesMapRunner.run(RecordReader, 
> OutputCollector, Reporter)}} we do the following:
> {code}
> while (input.next(key, value)) {
>   downlink.mapItem(key, value);
>   if(skipping) {
> downlink.flush();
>   }
> }
> {code}
> This results in all the records for the current task being consumed, driving 
> task progress to 100% while the actual pipes application is still 
> trailing behind. 




[jira] Updated: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.

2010-04-06 Thread Dick King (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1073:
-

Status: Open  (was: Patch Available)

Removed this patch to replace it with another patch that tests its 
functionality.

> Progress reported for pipes tasks is incorrect.
> ---
>
> Key: MAPREDUCE-1073
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: pipes
>Affects Versions: 0.20.1
>Reporter: Sreekanth Ramakrishnan
> Attachments: mapreduce-1073--2010-03-31.patch, 
> MAPREDUCE-1073_yhadoop20.patch
>
>
> Currently in pipes, 
> {{org.apache.hadoop.mapred.pipes.PipesMapRunner.run(RecordReader, 
> OutputCollector, Reporter)}} we do the following:
> {code}
> while (input.next(key, value)) {
>   downlink.mapItem(key, value);
>   if(skipping) {
> downlink.flush();
>   }
> }
> {code}
> This results in all the records for the current task being consumed, driving 
> task progress to 100% while the actual pipes application is still 
> trailing behind. 




[jira] Commented: (MAPREDUCE-469) Support concatenated gzip and bzip2 files

2010-04-06 Thread David Ciemiewicz (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854201#action_12854201
 ] 

David Ciemiewicz commented on MAPREDUCE-469:


The bzip2 compression format also supports concatenating individually 
bzip2-compressed files into a single file.

bzcat has absolutely no problem reading all of the data in one of these 
concatenated files.

Unfortunately, both Hadoop Streaming and Pig only see about 2% of the data from 
the original file in my case.  That's a 98% effective data loss.



> Support concatenated gzip and bzip2 files
> -
>
> Key: MAPREDUCE-469
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-469
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Tom White
>Assignee: Ravi Gummadi
>
> When running MapReduce with concatenated gzip files as input only the first 
> part is read, which is confusing, to say the least. Concatenated gzip is 
> described in http://www.gnu.org/software/gzip/manual/gzip.html#Advanced-usage 
> and in http://www.ietf.org/rfc/rfc1952.txt. (See original report at 
> http://www.nabble.com/Problem-with-Hadoop-and-concatenated-gzip-files-to21383097.html)
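The multi-member behaviour the issue asks for can be demonstrated outside Hadoop with plain {{java.util.zip}}, whose GZIPInputStream in current JDKs keeps decoding past the first member's trailer. This standalone sketch only illustrates the format; it is not Hadoop's codec:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ConcatGzipDemo {
    // Compress a string into one standalone gzip member.
    static byte[] gzip(String s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(bos);
        gz.write(s.getBytes("UTF-8"));
        gz.close();
        return bos.toByteArray();
    }

    // Decode a (possibly multi-member) gzip stream back to a string.
    static String gunzipAll(byte[] data) throws IOException {
        GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toString("UTF-8");
    }

    public static void main(String[] args) throws IOException {
        // Equivalent of "cat a.gz b.gz > c.gz": two independent members.
        ByteArrayOutputStream concat = new ByteArrayOutputStream();
        concat.write(gzip("first member\n"));
        concat.write(gzip("second member\n"));

        // A compliant reader recovers BOTH members, as zcat/bzcat do.
        System.out.print(gunzipAll(concat.toByteArray()));
    }
}
```

A reader that stops at the first member's trailer (the reported bug) would only ever see "first member".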




[jira] Updated: (MAPREDUCE-469) Support concatenated gzip and bzip2 files

2010-04-06 Thread David Ciemiewicz (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Ciemiewicz updated MAPREDUCE-469:
---

Summary: Support concatenated gzip and bzip2 files  (was: Support 
concatenated gzip files)

> Support concatenated gzip and bzip2 files
> -
>
> Key: MAPREDUCE-469
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-469
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Tom White
>Assignee: Ravi Gummadi
>
> When running MapReduce with concatenated gzip files as input only the first 
> part is read, which is confusing, to say the least. Concatenated gzip is 
> described in http://www.gnu.org/software/gzip/manual/gzip.html#Advanced-usage 
> and in http://www.ietf.org/rfc/rfc1952.txt. (See original report at 
> http://www.nabble.com/Problem-with-Hadoop-and-concatenated-gzip-files-to21383097.html)




[jira] Commented: (MAPREDUCE-1585) Create Hadoop Archives version 2 with filenames URL-encoded

2010-04-06 Thread Rodrigo Schmidt (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854153#action_12854153
 ] 

Rodrigo Schmidt commented on MAPREDUCE-1585:


Great! Thanks, Mahadev!

> Create Hadoop Archives version 2 with filenames URL-encoded
> ---
>
> Key: MAPREDUCE-1585
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1585
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: harchive
>Reporter: Rodrigo Schmidt
>Assignee: Rodrigo Schmidt
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1585.1.patch, MAPREDUCE-1585.2.patch, 
> MAPREDUCE-1585.patch
>
>
> Hadoop Archives version 1 doesn't cope with files that have spaces in their 
> names.
> One proposal is to URL-encode filenames inside the index file (version 2; 
> refers to HADOOP-6591).
> This task is to allow the creation of version 2 files whose file names are 
> encoded appropriately. It currently depends on HADOOP-6591.
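For illustration, {{java.net.URLEncoder}} round-trips a name containing spaces losslessly; whether the committed patch uses this exact encoder is an assumption here, not something stated in the thread:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class HarNameEncodeDemo {
    // URL-encode a file name before writing it into the archive index.
    static String encode(String name) throws UnsupportedEncodingException {
        return URLEncoder.encode(name, "UTF-8");
    }

    // Decode an index entry back to the original file name.
    static String decode(String name) throws UnsupportedEncodingException {
        return URLDecoder.decode(name, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String original = "dir/file with spaces.txt";
        String encoded = encode(original); // "dir%2Ffile+with+spaces.txt"
        System.out.println(encoded);
        System.out.println(decode(encoded).equals(original)); // true
    }
}
```

Because the encoded form contains no literal spaces, a space-delimited index file can store such names unambiguously.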




[jira] Commented: (MAPREDUCE-1585) Create Hadoop Archives version 2 with filenames URL-encoded

2010-04-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854140#action_12854140
 ] 

Hudson commented on MAPREDUCE-1585:
---

Integrated in Hadoop-Mapreduce-trunk-Commit #301 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/301/])
. Create Hadoop Archives version 2 with filenames URL-encoded (rodrigo via 
mahadev)


> Create Hadoop Archives version 2 with filenames URL-encoded
> ---
>
> Key: MAPREDUCE-1585
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1585
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: harchive
>Reporter: Rodrigo Schmidt
>Assignee: Rodrigo Schmidt
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1585.1.patch, MAPREDUCE-1585.2.patch, 
> MAPREDUCE-1585.patch
>
>
> Hadoop Archives version 1 doesn't cope with files that have spaces in their 
> names.
> One proposal is to URL-encode filenames inside the index file (version 2; 
> refers to HADOOP-6591).
> This task is to allow the creation of version 2 files whose file names are 
> encoded appropriately. It currently depends on HADOOP-6591.




[jira] Updated: (MAPREDUCE-1585) Create Hadoop Archives version 2 with filenames URL-encoded

2010-04-06 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-1585:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

I just committed this. thanks rodrigo!



> Create Hadoop Archives version 2 with filenames URL-encoded
> ---
>
> Key: MAPREDUCE-1585
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1585
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: harchive
>Reporter: Rodrigo Schmidt
>Assignee: Rodrigo Schmidt
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1585.1.patch, MAPREDUCE-1585.2.patch, 
> MAPREDUCE-1585.patch
>
>
> Hadoop Archives version 1 doesn't cope with files that have spaces in their 
> names.
> One proposal is to URL-encode filenames inside the index file (version 2; 
> refers to HADOOP-6591).
> This task is to allow the creation of version 2 files whose file names are 
> encoded appropriately. It currently depends on HADOOP-6591.




[jira] Updated: (MAPREDUCE-1650) Exclude Private elements from generated MapReduce Javadoc

2010-04-06 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-1650:
-

Attachment: MAPREDUCE-1650.patch

New patch that works with the changes in HADOOP-6658.

> Exclude Private elements from generated MapReduce Javadoc
> -
>
> Key: MAPREDUCE-1650
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1650
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Tom White
>Assignee: Tom White
> Attachments: MAPREDUCE-1650.patch, MAPREDUCE-1650.patch
>
>
> Exclude elements annotated with InterfaceAudience.Private or 
> InterfaceAudience.LimitedPrivate from Javadoc and JDiff.




[jira] Commented: (MAPREDUCE-1585) Create Hadoop Archives version 2 with filenames URL-encoded

2010-04-06 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854118#action_12854118
 ] 

Mahadev konar commented on MAPREDUCE-1585:
--

Looks like Hudson finally +1'ed it... I'll go ahead and commit it... 

> Create Hadoop Archives version 2 with filenames URL-encoded
> ---
>
> Key: MAPREDUCE-1585
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1585
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: harchive
>Reporter: Rodrigo Schmidt
>Assignee: Rodrigo Schmidt
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1585.1.patch, MAPREDUCE-1585.2.patch, 
> MAPREDUCE-1585.patch
>
>
> Hadoop Archives version 1 doesn't cope with files that have spaces in their 
> names.
> One proposal is to URL-encode filenames inside the index file (version 2; 
> refers to HADOOP-6591).
> This task is to allow the creation of version 2 files whose file names are 
> encoded appropriately. It currently depends on HADOOP-6591.




[jira] Commented: (MAPREDUCE-1676) Create test scenario for "distributed cache file behaviour, when dfs file is modified"

2010-04-06 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854089#action_12854089
 ] 

Konstantin Boudnik commented on MAPREDUCE-1676:
---

Looks good, however most of [these 
comments|https://issues.apache.org/jira/browse/MAPREDUCE-1677?focusedCommentId=12854087&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12854087]
 apply here.



> Create test scenario for "distributed cache file behaviour, when dfs file is 
> modified"
> --
>
> Key: MAPREDUCE-1676
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1676
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>Affects Versions: 0.22.0
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: TestDistributedCacheModifiedFile.patch
>
>
>  Verify the Distributed Cache functionality. This test scenario is for the 
> behaviour of a distributed cache file when it is modified before and after being 
> accessed by at most two jobs. Once a job uses a distributed cache file, that 
> file is stored in the mapred.local.dir. If the next job
> uses the same file, but with a different timestamp, then that file is stored 
> again. So, if two jobs choose the same tasktracker for their job execution, 
> then the distributed cache file should be found twice.
> This testcase runs a job with a distributed cache file. A handle to each task's 
> corresponding tasktracker is obtained and checked for the presence of the 
> distributed cache file with proper permissions in the proper directory. When the 
> job runs again and any of its tasks hits the same tasktracker that ran 
> one of the tasks of the previous job, the
> file should be uploaded again and the task should not use the old file.




[jira] Commented: (MAPREDUCE-1677) Test scenario for a distributed cache file behaviour when the file is private

2010-04-06 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854087#action_12854087
 ] 

Konstantin Boudnik commented on MAPREDUCE-1677:
---

Looks pretty good. A couple of nits:
- Public methods are supposed to have JavaDoc; tests shouldn't be exempt from 
this rule.
- Not all assert messages are meaningful.
- {{boolean b}} can be declared before the outermost loop. The name {{b}} is 
obscure; it should be something like {{distCacheIsFound}}.
- It is better to use the system file-separator property instead of {{"/"}}.
- Line splitting is inconsistent across the test, i.e.
{noformat}
+  Path pathMapredLocalDirUserName = fileStatusMapredLocalDirUserName.
+  getPath();
+  FsPermission fsPermMapredLocalDirUserName =
+  fileStatusMapredLocalDirUserName.getPermission();
{noformat}

Please address these and the patch is ready for commit, IMO.
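A hypothetical snippet showing the naming, separator, and assert-message nits addressed together (the path and loop contents are illustrative, not from the patch):

```java
import java.io.File;

public class ReviewNitsDemo {
    public static void main(String[] args) {
        // Use the platform separator instead of a hard-coded "/".
        String cacheDir = "taskTracker" + File.separator + "archive";

        // A descriptive flag name instead of {{b}}, declared before the loop...
        boolean distCacheIsFound = false;
        for (String entry : new String[] {"jobcache", "archive"}) {
            if (cacheDir.endsWith(entry)) {
                distCacheIsFound = true;
                break;
            }
        }

        // ...and an assert whose message explains what went wrong when it fires.
        assert distCacheIsFound
            : "distributed cache file not found under " + cacheDir;
        System.out.println(distCacheIsFound);
    }
}
```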

> Test scenario for a distributed cache file behaviour  when the file is private
> --
>
> Key: MAPREDUCE-1677
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1677
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Affects Versions: 0.22.0
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: TestDistributedCachePrivateFile.patch
>
>
>  Verify the Distributed Cache functionality.
>  This test scenario is for the behaviour of a distributed cache file when the 
> file is private. Once a job uses a distributed 
> cache file with private permissions, that file is stored in the 
> mapred.local.dir, under a directory with the same name 
> as the job submitter's username. The directory has 700 permissions and the file 
> under it should have 777 permissions. 




[jira] Commented: (MAPREDUCE-1656) JobStory should provide queue info.

2010-04-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854061#action_12854061
 ] 

Hudson commented on MAPREDUCE-1656:
---

Integrated in Hadoop-Mapreduce-trunk #278 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/278/])
. JobStory should provide queue info. (hong via mahadev)


> JobStory should provide queue info.
> ---
>
> Key: MAPREDUCE-1656
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1656
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Hong Tang
>Assignee: Hong Tang
>Priority: Minor
> Fix For: 0.22.0
>
> Attachments: mr-1656-2.patch, mr-1656.patch
>
>
> Add a method in JobStory to get the queue to which a job is submitted.




[jira] Commented: (MAPREDUCE-1428) Make block size and the size of archive created files configurable.

2010-04-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854060#action_12854060
 ] 

Hudson commented on MAPREDUCE-1428:
---

Integrated in Hadoop-Mapreduce-trunk #278 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/278/])
.  Make block size and the size of archive created files configurable.  
Contributed by mahadev


> Make block size and the size of archive created files configurable.
> ---
>
> Key: MAPREDUCE-1428
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1428
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: harchive
>Reporter: Mahadev konar
>Assignee: Mahadev konar
> Fix For: 0.22.0
>
> Attachments: BinaryFileGenerator.java, BinaryFileGenerator.java, 
> BinaryFileGenerator.java, MAPREDUCE-1428.patch, MAPREDUCE-1428.patch
>
>
> Currently the block size used by archives is the default block size of the 
> hdfs filesystem. We need to make it configurable so that the block size can 
> be higher for the part files that archives create.
> Also, we need to make the size of the part files in archives configurable, so 
> that they can be bigger and fewer such files are created.




[jira] Commented: (MAPREDUCE-1648) Use RollingFileAppender to limit tasklogs

2010-04-06 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854051#action_12854051
 ] 

Koji Noguchi commented on MAPREDUCE-1648:
-

When reviewing the patch, please test the performance and make sure we don't 
re-introduce the slowness observed at HADOOP-1553.

> Use RollingFileAppender to limit tasklogs
> -
>
> Key: MAPREDUCE-1648
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1648
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tasktracker
>Reporter: Guilin Sun
>Priority: Minor
>
> There are at least two types of task logs: syslog and stdlog.
> The task JVM writes syslog via log4j with TaskLogAppender. TaskLogAppender 
> behaves just like "tail -c": it stores the last N bytes/lines of logs in memory 
> (via a queue) and does the real output only once all logs are committed and 
> the appender is about to close.
> The common problem with TaskLogAppender and 'tail -c' is that they keep 
> everything in memory, so the user can't see any log output while the task is in 
> progress.
> So I'm going to try RollingFileAppender instead of TaskLogAppender, using 
> MaxFileSize & MaxBackupIndex to limit the log file size.
> RollingFileAppender is also suitable for stdout/stderr: just redirect 
> stdout/stderr to log4j via LoggingOutputStream; no client code has to be 
> changed, and RollingFileAppender seems better than 'tail -c' too.
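The proposal corresponds roughly to a log4j configuration like this sketch; the property keys are standard log4j 1.2 RollingFileAppender settings, while the appender name, logger, and file path are illustrative assumptions:

```properties
# Roll the task's syslog at a fixed size instead of buffering a tail in memory.
log4j.logger.org.apache.hadoop=INFO,TLA

log4j.appender.TLA=org.apache.log4j.RollingFileAppender
# Illustrative path; the real task-log location is resolved by the tasktracker.
log4j.appender.TLA.File=${hadoop.log.dir}/userlogs/${taskid}/syslog
log4j.appender.TLA.MaxFileSize=1MB
log4j.appender.TLA.MaxBackupIndex=4
log4j.appender.TLA.layout=org.apache.log4j.PatternLayout
log4j.appender.TLA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
```

With MaxFileSize and MaxBackupIndex, the logs are bounded on disk (here at most 5 MB per task) while remaining visible as the task runs.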




[jira] Assigned: (MAPREDUCE-1677) Test scenario for a distributed cache file behaviour when the file is private

2010-04-06 Thread Sharad Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharad Agarwal reassigned MAPREDUCE-1677:
-

Assignee: Iyappan Srinivasan

> Test scenario for a distributed cache file behaviour  when the file is private
> --
>
> Key: MAPREDUCE-1677
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1677
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Affects Versions: 0.22.0
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: TestDistributedCachePrivateFile.patch
>
>
>  Verify the Distributed Cache functionality.
>  This test scenario is for the behaviour of a distributed cache file when the 
> file is private. Once a job uses a distributed 
> cache file with private permissions, that file is stored in the 
> mapred.local.dir, under a directory with the same name 
> as the job submitter's username. The directory has 700 permissions and the file 
> under it should have 777 permissions. 




[jira] Updated: (MAPREDUCE-1676) Create test scenario for "distributed cache file behaviour, when dfs file is modified"

2010-04-06 Thread Sharad Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharad Agarwal updated MAPREDUCE-1676:
--

Assignee: Iyappan Srinivasan

> Create test scenario for "distributed cache file behaviour, when dfs file is 
> modified"
> --
>
> Key: MAPREDUCE-1676
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1676
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>Affects Versions: 0.22.0
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: TestDistributedCacheModifiedFile.patch
>
>
>  Verify the Distributed Cache functionality. This test scenario is for the 
> behaviour of a distributed cache file when it is modified before and after being 
> accessed by at most two jobs. Once a job uses a distributed cache file, that 
> file is stored in the mapred.local.dir. If the next job
> uses the same file, but with a different timestamp, then that file is stored 
> again. So, if two jobs choose the same tasktracker for their job execution, 
> then the distributed cache file should be found twice.
> This testcase runs a job with a distributed cache file. A handle to each task's 
> corresponding tasktracker is obtained and checked for the presence of the 
> distributed cache file with proper permissions in the proper directory. When the 
> job runs again and any of its tasks hits the same tasktracker that ran 
> one of the tasks of the previous job, the
> file should be uploaded again and the task should not use the old file.




[jira] Updated: (MAPREDUCE-1654) Automate the job killing system test case.

2010-04-06 Thread Sharad Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharad Agarwal updated MAPREDUCE-1654:
--

Affects Version/s: (was: 0.20.3)
   Issue Type: Test  (was: New Feature)

> Automate the job killing system test case. 
> ---
>
> Key: MAPREDUCE-1654
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1654
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
> Environment: Herriot system test case development env. 
>Reporter: Balaji Rajagopalan
>Assignee: Balaji Rajagopalan
> Attachments: patch_1654.txt, patch_1654.txt, patch_1654.txt, 
> TEST-org.apache.hadoop.mapred.TestJobKill.txt
>
>   Original Estimate: 0.27h
>  Remaining Estimate: 0.27h
>





[jira] Updated: (MAPREDUCE-1653) Add apache header to UserNamePermission.java

2010-04-06 Thread Sharad Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharad Agarwal updated MAPREDUCE-1653:
--

Affects Version/s: (was: 0.20.3)
   Issue Type: Bug  (was: New Feature)

> Add apache header to UserNamePermission.java
> 
>
> Key: MAPREDUCE-1653
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1653
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
> Environment: Herriot
>Reporter: Balaji Rajagopalan
>Assignee: Balaji Rajagopalan
>Priority: Trivial
> Attachments: patch_1653.txt
>
>   Original Estimate: 0.02h
>  Remaining Estimate: 0.02h
>
> Add the missing header to the file. 




[jira] Updated: (MAPREDUCE-1616) Automate system test case for checking the file permissions in mapred.local.dir

2010-04-06 Thread Sharad Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharad Agarwal updated MAPREDUCE-1616:
--

Affects Version/s: (was: 0.20.3)
   Issue Type: Test  (was: New Feature)

> Automate system test case for checking the file permissions in 
> mapred.local.dir
> ---
>
> Key: MAPREDUCE-1616
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1616
> Project: Hadoop Map/Reduce
>  Issue Type: Test
> Environment: Herriot framework is required for running the test. 
>Reporter: Balaji Rajagopalan
>Assignee: Balaji Rajagopalan
> Attachments: patch_1616.txt, patch_1616.txt, patch_1655.txt, 
> patch_3392207_7.txt
>
>   Original Estimate: 0.27h
>  Remaining Estimate: 0.27h
>
> The permissions of files under mapred.local.dir must be tested recursively 
> while the task is running; for this, use the controllable task so that 
> temporary file permissions can be checked. 




[jira] Updated: (MAPREDUCE-1672) Create test scenario for "distributed cache file behaviour, when dfs file is not modified"

2010-04-06 Thread Sharad Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharad Agarwal updated MAPREDUCE-1672:
--

Affects Version/s: (was: 0.22.0)
Fix Version/s: (was: 0.22.0)
   Issue Type: Test  (was: New Feature)

> Create test scenario for "distributed cache file behaviour, when dfs file is 
> not modified"
> --
>
> Key: MAPREDUCE-1672
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1672
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: TestDistributedCacheUnModifiedFile.patch, 
> TestDistributedCacheUnModifiedFile.patch
>
>
> This test scenario is for the behaviour of a distributed cache file
> when it is not modified before and after being
> accessed by at most two jobs. Once a job uses a distributed cache file,
> that file is stored in the mapred.local.dir. If the next job
> uses the same file, it is not stored again.
> So, if two jobs choose the same tasktracker for their job execution,
> then the distributed cache file should not be found twice.
> This testcase should run a job with a distributed cache file. A handle to each
> task's corresponding tasktracker is obtained and checked for
> the presence of the distributed cache file with proper permissions in the
> proper directory. When the job
> runs again and any of its tasks hits the same tasktracker that
> ran one of the tasks of the previous job, the
> file should not be uploaded again and the task should use the old file.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1671) Test scenario for "Killing Task Attempt id till job fails"

2010-04-06 Thread Sharad Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharad Agarwal updated MAPREDUCE-1671:
--

Affects Version/s: (was: 0.22.0)
   Issue Type: Test  (was: New Feature)

> Test scenario for "Killing Task Attempt id till job fails"
> --
>
> Key: MAPREDUCE-1671
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1671
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: TestTaskKilling.patch, TestTaskKilling.patch
>
>
> 1) In a job, kill the task attempt id of one task. Whenever that task tries 
> to run again with another task attempt id, up to mapred.map.max.attempts 
> times, kill that task attempt id. After mapred.map.max.attempts attempts, 
> the whole job should get killed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1677) Test scenario for a distributed cache file behaviour when the file is private

2010-04-06 Thread Iyappan Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iyappan Srinivasan updated MAPREDUCE-1677:
--

Attachment: TestDistributedCachePrivateFile.patch

> Test scenario for a distributed cache file behaviour  when the file is private
> --
>
> Key: MAPREDUCE-1677
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1677
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Affects Versions: 0.22.0
>Reporter: Iyappan Srinivasan
> Attachments: TestDistributedCachePrivateFile.patch
>
>
>  Verify the Distributed Cache functionality.
>  This test scenario is for a distributed cache file's behaviour when the 
> file is private. Once a job uses a distributed cache file with private 
> permissions, that file is stored in the mapred.local.dir, under a directory 
> with the same name as the job submitter's username. The directory has 700 
> permissions and the files under it should have 777 permissions. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1677) Test scenario for a distributed cache file behaviour when the file is private

2010-04-06 Thread Iyappan Srinivasan (JIRA)
Test scenario for a distributed cache file behaviour  when the file is private
--

 Key: MAPREDUCE-1677
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1677
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: test
Affects Versions: 0.22.0
Reporter: Iyappan Srinivasan
 Attachments: TestDistributedCachePrivateFile.patch

 Verify the Distributed Cache functionality.
 This test scenario is for a distributed cache file's behaviour when the file 
is private. Once a job uses a distributed cache file with private permissions, 
that file is stored in the mapred.local.dir, under a directory with the same 
name as the job submitter's username. The directory has 700 permissions and 
the files under it should have 777 permissions. 
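A minimal standalone sketch of this permission check (a hypothetical helper using java.nio, not the actual Herriot test code; it assumes a POSIX filesystem):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class PrivateCachePermissionCheck {
    // Returns true if the user directory is 700 and every file
    // directly under it is 777, as the test scenario expects.
    static boolean hasExpectedPermissions(Path userDir) throws IOException {
        Set<PosixFilePermission> dirPerms = Files.getPosixFilePermissions(userDir);
        if (!dirPerms.equals(PosixFilePermissions.fromString("rwx------"))) {
            return false;
        }
        try (DirectoryStream<Path> files = Files.newDirectoryStream(userDir)) {
            for (Path f : files) {
                Set<PosixFilePermission> p = Files.getPosixFilePermissions(f);
                if (!p.equals(PosixFilePermissions.fromString("rwxrwxrwx"))) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Simulate the layout: <mapred.local.dir>/<username>/<cache file>
        Path userDir = Files.createTempDirectory("userdir");
        Path cacheFile = Files.createFile(userDir.resolve("cachefile"));
        Files.setPosixFilePermissions(userDir,
                PosixFilePermissions.fromString("rwx------"));
        Files.setPosixFilePermissions(cacheFile,
                PosixFilePermissions.fromString("rwxrwxrwx"));
        System.out.println(hasExpectedPermissions(userDir)); // prints "true"
    }
}
```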



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1676) Create test scenario for "distributed cache file behaviour, when dfs file is modified"

2010-04-06 Thread Iyappan Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iyappan Srinivasan updated MAPREDUCE-1676:
--

Attachment: TestDistributedCacheModifiedFile.patch

> Create test scenario for "distributed cache file behaviour, when dfs file is 
> modified"
> --
>
> Key: MAPREDUCE-1676
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1676
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>Affects Versions: 0.22.0
>Reporter: Iyappan Srinivasan
> Attachments: TestDistributedCacheModifiedFile.patch
>
>
>  Verify the Distributed Cache functionality. This test scenario is for a 
> distributed cache file's behaviour when it is modified before and after being 
> accessed by at most two jobs. Once a job uses a distributed cache file, that 
> file is stored in the mapred.local.dir. If the next job
> uses the same file, but with a different timestamp, then the file is stored 
> again. So, if two jobs choose the same tasktracker for their execution, 
> the distributed cache file should be found twice.
> This testcase runs a job with a distributed cache file. The handle of each 
> task's tasktracker is obtained and checked for the presence of the 
> distributed cache file with proper permissions in the proper directory. 
> Next, when the job runs again and any of its tasks hits the same tasktracker 
> that ran one of the tasks of the previous job, the file should be uploaded 
> again and the task should not use the old file.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1676) Create test scenario for "distributed cache file behaviour, when dfs file is modified"

2010-04-06 Thread Iyappan Srinivasan (JIRA)
Create test scenario for "distributed cache file behaviour, when dfs file is 
modified"
--

 Key: MAPREDUCE-1676
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1676
 Project: Hadoop Map/Reduce
  Issue Type: Test
Affects Versions: 0.22.0
Reporter: Iyappan Srinivasan


 Verify the Distributed Cache functionality. This test scenario is for a 
distributed cache file's behaviour when it is modified before and after being 
accessed by at most two jobs. Once a job uses a distributed cache file, that 
file is stored in the mapred.local.dir. If the next job
uses the same file, but with a different timestamp, then the file is stored 
again. So, if two jobs choose the same tasktracker for their execution, 
the distributed cache file should be found twice.

This testcase runs a job with a distributed cache file. The handle of each 
task's tasktracker is obtained and checked for the presence of the 
distributed cache file with proper permissions in the proper directory. Next, 
when the job runs again and any of its tasks hits the same tasktracker that 
ran one of the tasks of the previous job, the file should be uploaded again 
and the task should not use the old file.
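One way the timestamp-sensitive behaviour can be pictured is a localized-cache key that includes the file's modification time, so a changed timestamp forces a second localization (a hypothetical sketch, not the actual DistributedCache implementation):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the cache key combines the DFS path with the file's
// modification time, so a modified file is localized again while an
// unmodified one is reused.
public class CacheKeyDemo {
    static Map<String, String> localDir = new HashMap<>();

    static String localize(String dfsPath, long modTime) {
        String key = dfsPath + "#" + modTime;
        // Store a "localized copy" only if this exact key is new.
        return localDir.computeIfAbsent(key, k -> "localized copy of " + k);
    }

    public static void main(String[] args) {
        localize("/user/a/cache.txt", 1000L); // first job: stored
        localize("/user/a/cache.txt", 1000L); // same timestamp: reused
        localize("/user/a/cache.txt", 2000L); // modified: stored a second time
        System.out.println(localDir.size()); // prints "2"
    }
}
```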



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1672) Create test scenario for "distributed cache file behaviour, when dfs file is not modified"

2010-04-06 Thread Iyappan Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iyappan Srinivasan updated MAPREDUCE-1672:
--

Attachment: TestDistributedCacheUnModifiedFile.patch

Fixing some typos.

> Create test scenario for "distributed cache file behaviour, when dfs file is 
> not modified"
> --
>
> Key: MAPREDUCE-1672
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1672
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: test
>Affects Versions: 0.22.0
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Fix For: 0.22.0
>
> Attachments: TestDistributedCacheUnModifiedFile.patch, 
> TestDistributedCacheUnModifiedFile.patch
>
>
> This test scenario is for a distributed cache file's behaviour
> when it is not modified before and after being
> accessed by at most two jobs. Once a job uses a distributed cache file,
> that file is stored in the mapred.local.dir. If the next job
> uses the same file, it is not stored again.
> So, if two jobs choose the same tasktracker for their execution,
> the distributed cache file should not be found twice.
> This testcase should run a job with a distributed cache file. The handle
> of each task's tasktracker is obtained and checked for
> the presence of the distributed cache file with proper permissions in the
> proper directory. Next, when the job
> runs again and any of its tasks hits the same tasktracker that
> ran one of the tasks of the previous job, the
> file should not be uploaded again and the task should use the old file.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1671) Test scenario for "Killing Task Attempt id till job fails"

2010-04-06 Thread Iyappan Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iyappan Srinivasan updated MAPREDUCE-1671:
--

Attachment: TestTaskKilling.patch

Latest patch addressing review comments.

> Test scenario for "Killing Task Attempt id till job fails"
> --
>
> Key: MAPREDUCE-1671
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1671
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: test
>Affects Versions: 0.22.0
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: TestTaskKilling.patch, TestTaskKilling.patch
>
>
> 1) In a job, kill the task attempt id of one task. Whenever that task tries 
> to run again with another task attempt id, up to mapred.map.max.attempts 
> times, kill that task attempt id. After mapred.map.max.attempts attempts, 
> the whole job should get killed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1671) Test scenario for "Killing Task Attempt id till job fails"

2010-04-06 Thread Iyappan Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iyappan Srinivasan updated MAPREDUCE-1671:
--

Attachment: (was: TestTaskKilling.patch)

> Test scenario for "Killing Task Attempt id till job fails"
> --
>
> Key: MAPREDUCE-1671
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1671
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: test
>Affects Versions: 0.22.0
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: TestTaskKilling.patch, TestTaskKilling.patch
>
>
> 1) In a job, kill the task attempt id of one task. Whenever that task tries 
> to run again with another task attempt id, up to mapred.map.max.attempts 
> times, kill that task attempt id. After mapred.map.max.attempts attempts, 
> the whole job should get killed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1671) Test scenario for "Killing Task Attempt id till job fails"

2010-04-06 Thread Iyappan Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iyappan Srinivasan updated MAPREDUCE-1671:
--

Description: 
1) In a job, kill the task attempt id of one task. Whenever that task tries 
to run again with another task attempt id, up to mapred.map.max.attempts 
times, kill that task attempt id. After mapred.map.max.attempts attempts, the 
whole job should get killed.


  was:
1) In a job, kill the task attempt id of one task. Whenever that task tries 
to run again with another task attempt id, up to mapred.max.tracker.failures 
times, kill that task attempt id. After mapred.max.tracker.failures attempts, 
the whole job should get killed.



> Test scenario for "Killing Task Attempt id till job fails"
> --
>
> Key: MAPREDUCE-1671
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1671
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: test
>Affects Versions: 0.22.0
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: TestTaskKilling.patch, TestTaskKilling.patch
>
>
> 1) In a job, kill the task attempt id of one task. Whenever that task tries 
> to run again with another task attempt id, up to mapred.map.max.attempts 
> times, kill that task attempt id. After mapred.map.max.attempts attempts, 
> the whole job should get killed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1671) Test scenario for "Killing Task Attempt id till job fails"

2010-04-06 Thread Iyappan Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iyappan Srinivasan updated MAPREDUCE-1671:
--

Attachment: TestTaskKilling.patch

Task Killing patch incorporating Ravi's comments.

> Test scenario for "Killing Task Attempt id till job fails"
> --
>
> Key: MAPREDUCE-1671
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1671
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: test
>Affects Versions: 0.22.0
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: TestTaskKilling.patch, TestTaskKilling.patch
>
>
> 1) In a job, kill the task attempt id of one task. Whenever that task tries 
> to run again with another task attempt id, up to mapred.max.tracker.failures 
> times, kill that task attempt id. After mapred.max.tracker.failures 
> attempts, the whole job should get killed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1673) Start and Stop scripts for the RaidNode

2010-04-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853844#action_12853844
 ] 

Hadoop QA commented on MAPREDUCE-1673:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12440831/MAPREDUCE-1673.1.patch
  against trunk revision 930423.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/95/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/95/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/95/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/95/console

This message is automatically generated.

> Start and Stop scripts for the RaidNode
> ---
>
> Key: MAPREDUCE-1673
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1673
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/raid
>Affects Versions: 0.22.0
>Reporter: Rodrigo Schmidt
>Assignee: Rodrigo Schmidt
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1673.1.patch, MAPREDUCE-1673.patch
>
>
> We should have scripts that start and stop the RaidNode automatically. 
> Something like start-raidnode.sh and stop-raidnode.sh

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1418) LinuxTaskController binary misses validation of arguments passed for relative components in some cases.

2010-04-06 Thread Hemanth Yamijala (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hemanth Yamijala updated MAPREDUCE-1418:


Attachment: MAPREDUCE-1418-y20s.patch

The attached patch [MAPREDUCE-1418-y20s.patch] is similar to the trunk patch, 
but for earlier versions of Hadoop. In this patch, I've addressed the problem 
of validating relative paths during job log directory initialization, as well 
as Amareshwari's comment about the return code in prepare_task_logs. The trunk 
patch will have similar changes. All LinuxTaskController tests pass with this 
patch. I would appreciate a review.

This patch is not for commit to Apache SVN.

> LinuxTaskController binary misses validation of arguments passed for relative 
> components in some cases.
> ---
>
> Key: MAPREDUCE-1418
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1418
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: security, tasktracker
>Reporter: Vinod K V
>Assignee: Hemanth Yamijala
> Attachments: MAPREDUCE-1418-y20s.patch, MAPREDUCE-1418.patch
>
>
> The function {{int check_path_for_relative_components(char * path)}} should 
> be used to validate the absence of relative components before any operation 
> is done on those paths. This is missed in all the {{initialize*()}} 
> functions, as Hemanth pointed out offline.
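The intent of that C function can be illustrated in Java (an illustrative re-implementation, not the actual task-controller code): a path is rejected if it is not absolute or if it contains "." or ".." components.

```java
public class PathValidation {
    // Mirrors the intent of check_path_for_relative_components():
    // a path is safe only if it is absolute and has no "." or ".." parts.
    static boolean hasRelativeComponents(String path) {
        if (path == null || !path.startsWith("/")) {
            return true; // non-absolute paths are rejected outright
        }
        for (String part : path.split("/")) {
            if (part.equals(".") || part.equals("..")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasRelativeComponents("/mapred/local/jobcache"));     // false
        System.out.println(hasRelativeComponents("/mapred/../../etc/passwd"));   // true
        System.out.println(hasRelativeComponents("relative/dir"));               // true
    }
}
```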

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1418) LinuxTaskController binary misses validation of arguments passed for relative components in some cases.

2010-04-06 Thread Hemanth Yamijala (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853840#action_12853840
 ] 

Hemanth Yamijala commented on MAPREDUCE-1418:
-

bq. I don't see any special validation done for the method kill_user_task() in 
task-controller.

This is a valid concern. We have checks in place to prevent this from 
happening, and those same checks also protect against relative paths. Hence, 
in a sense, this JIRA would be moot under the same assumptions.

Since this JIRA is specifically focused on protecting against relative path 
usage, I propose we stick to the course taken by the patch and fix the kill 
issue in a follow-up. Thoughts?

> LinuxTaskController binary misses validation of arguments passed for relative 
> components in some cases.
> ---
>
> Key: MAPREDUCE-1418
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1418
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: security, tasktracker
>Reporter: Vinod K V
>Assignee: Hemanth Yamijala
> Attachments: MAPREDUCE-1418.patch
>
>
> The function {{int check_path_for_relative_components(char * path)}} should 
> be used to validate the absence of relative components before any operation 
> is done on those paths. This is missed in all the {{initialize*()}} 
> functions, as Hemanth pointed out offline.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1533) reduce or remove usage of String.format() usage in CapacityTaskScheduler.updateQSIObjects

2010-04-06 Thread Hemanth Yamijala (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853832#action_12853832
 ] 

Hemanth Yamijala commented on MAPREDUCE-1533:
-

bq. As of now it's a push model where every change in the scheduler's state 
results in an info string which gets pushed to all the jobs. Shouldn't it be 
a pull model wherein the jobs pull the data from the scheduler whenever 
required?

Yes, this seems a sensible direction to me. Given that view requests come far 
less frequently than heartbeats, it is the more efficient design.

bq. But for now we can keep it simple and solve the problem at hand by using 
StringBuilder. Thoughts?

I am fine with this proposal.

> reduce or remove usage of String.format() usage in 
> CapacityTaskScheduler.updateQSIObjects
> -
>
> Key: MAPREDUCE-1533
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1533
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.20.1
>Reporter: Rajesh Balamohan
>Assignee: Amar Kamat
> Attachments: mapreduce-1533-v1.4.patch
>
>
> When short jobs are executed in Hadoop with OutOfBandHeartBeat=true, the JT 
> executes the heartBeat() method heavily. This internally makes a call to 
> CapacityTaskScheduler.updateQSIObjects(), which in turn calls String.format() 
> to set the job scheduling information. Based on the data structure sizes of 
> "jobQueuesManager" and "queueInfoMap", the number of times String.format() 
> gets executed becomes very high. String.format() internally does pattern 
> matching, which turns out to be very heavy. (This was revealed while 
> profiling the JT: almost 57% of the time was spent in 
> CapacityScheduler.assignTasks(), of which String.format() took 46%.)
> Would it be possible to do String.format() only at the time of invoking 
> JobInProgress.getSchedulingInfo()? This might reduce the pressure on the JT 
> while processing heartbeats. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1533) reduce or remove usage of String.format() usage in CapacityTaskScheduler.updateQSIObjects

2010-04-06 Thread Amar Kamat (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853804#action_12853804
 ] 

Amar Kamat commented on MAPREDUCE-1533:
---

Benchmark results comparing StringBuilder with String.format:
1) StringBuilder took 1.261 secs to generate 1,000,000 strings 
2) String.format took 9.126 secs to generate 1,000,000 strings

So, assuming 400 heartbeat calls are made per second, we have ~2.5 ms of 
processing time per heartbeat. Assuming no more than 100 jobs are running at 
a given time, we have 
1) StringBuilder taking 0.1261 ms to generate 100 strings 
2) String.format taking 0.9126 ms to generate 100 strings

Thus String.format takes 36% (i.e. 0.9126/2.5) of the total heartbeat 
processing time, whereas StringBuilder takes 5% (i.e. 0.1261/2.5). 
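A micro-benchmark along these lines might look as follows (timings vary by JVM and hardware; the format string below is illustrative, not the one CapacityTaskScheduler actually uses):

```java
public class SchedInfoBench {
    public static void main(String[] args) {
        final int N = 1_000_000;

        // Build N strings via String.format (pattern parsing on every call).
        long t0 = System.nanoTime();
        for (int i = 0; i < N; i++) {
            String s = String.format("%d running map tasks using %d map slots",
                                     i, i * 2);
        }
        long formatNanos = System.nanoTime() - t0;

        // Build the same N strings via StringBuilder (no pattern parsing).
        t0 = System.nanoTime();
        for (int i = 0; i < N; i++) {
            String s = new StringBuilder().append(i)
                    .append(" running map tasks using ")
                    .append(i * 2).append(" map slots").toString();
        }
        long builderNanos = System.nanoTime() - t0;

        System.out.println("String.format: " + formatNanos / 1_000_000 + " ms");
        System.out.println("StringBuilder: " + builderNanos / 1_000_000 + " ms");
    }
}
```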

> reduce or remove usage of String.format() usage in 
> CapacityTaskScheduler.updateQSIObjects
> -
>
> Key: MAPREDUCE-1533
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1533
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.20.1
>Reporter: Rajesh Balamohan
>Assignee: Amar Kamat
> Attachments: mapreduce-1533-v1.4.patch
>
>
> When short jobs are executed in Hadoop with OutOfBandHeartBeat=true, the JT 
> executes the heartBeat() method heavily. This internally makes a call to 
> CapacityTaskScheduler.updateQSIObjects(), which in turn calls String.format() 
> to set the job scheduling information. Based on the data structure sizes of 
> "jobQueuesManager" and "queueInfoMap", the number of times String.format() 
> gets executed becomes very high. String.format() internally does pattern 
> matching, which turns out to be very heavy. (This was revealed while 
> profiling the JT: almost 57% of the time was spent in 
> CapacityScheduler.assignTasks(), of which String.format() took 46%.)
> Would it be possible to do String.format() only at the time of invoking 
> JobInProgress.getSchedulingInfo()? This might reduce the pressure on the JT 
> while processing heartbeats. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1466) FileInputFormat should save #input-files in JobConf

2010-04-06 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853790#action_12853790
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1466:


bq. I don't think the test in 
org.apache.hadoop.mapreduce.lib.input.TestFileInputFormat needs to use the 
Mockito framework. Why don't we add a test similar to the one in the old api?
I thought the test need not use Mockito because it is not easy to understand. 
But it is fine with me.


> FileInputFormat should save #input-files in JobConf
> ---
>
> Key: MAPREDUCE-1466
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1466
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.22.0
>Reporter: Arun C Murthy
>Assignee: Luke Lu
>Priority: Minor
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1466_yhadoop20-1.patch, 
> MAPREDUCE-1466_yhadoop20-2.patch, MAPREDUCE-1466_yhadoop20-3.patch, 
> MAPREDUCE-1466_yhadoop20.patch, mr-1466-trunk-v1.patch, 
> mr-1466-trunk-v2.patch, mr-1466-trunk-v3.patch, mr-1466-trunk-v4.patch, 
> mr-1466-trunk-v5.patch
>
>
> We already track the amount of data consumed by MR applications 
> (MAP_INPUT_BYTES); along with it, it would be useful to track #input-files 
> from the client-side for analysis. Along the lines of MAPREDUCE-1403, it 
> would be easy to stick this in the JobConf during job-submission.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1533) reduce or remove usage of String.format() usage in CapacityTaskScheduler.updateQSIObjects

2010-04-06 Thread Amar Kamat (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853785#action_12853785
 ] 

Amar Kamat commented on MAPREDUCE-1533:
---

How about using _StringBuilder_ instead of _String.format_? The problem lies in 
the way scheduling info is managed. As of now it's a push model where every 
change in the scheduler's state results in an info string which gets pushed 
to all the jobs. Shouldn't it be a pull model wherein the jobs pull the data 
from the scheduler whenever required? Roughly ~100 heartbeat calls are made per 
second, and in every heartbeat the scheduler's state can potentially change, 
resulting in an info string being pushed. That is, most of the time the info 
gets overwritten before being consumed, making the pull model a good fit for 
this case. But for now we can keep it simple and solve the problem at hand by 
using StringBuilder. Thoughts?
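The proposed pull model could be sketched roughly as below (a hypothetical simplification; the real JobInProgress and scheduler APIs differ): the job formats scheduling info only when asked for it, so heartbeats build no strings at all.

```java
import java.util.function.Supplier;

// Hypothetical sketch of the pull model: the job stores a supplier that
// formats scheduling info lazily, only when getSchedulingInfo() is called.
class Job {
    private Supplier<String> schedulingInfo = () -> "";

    void setSchedulingInfoSource(Supplier<String> source) {
        this.schedulingInfo = source;
    }

    String getSchedulingInfo() {
        return schedulingInfo.get(); // formatted only on demand
    }
}

class Scheduler {
    int runningMaps = 0;

    void onHeartbeat() {
        runningMaps++; // state changes on every heartbeat; no string is built
    }

    String describe() {
        return runningMaps + " running map tasks";
    }
}

public class PullModelDemo {
    public static void main(String[] args) {
        Scheduler sched = new Scheduler();
        Job job = new Job();
        job.setSchedulingInfoSource(sched::describe);
        for (int i = 0; i < 400; i++) {
            sched.onHeartbeat(); // 400 heartbeats, zero string formatting
        }
        System.out.println(job.getSchedulingInfo()); // formatted once, on pull
    }
}
```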

> reduce or remove usage of String.format() usage in 
> CapacityTaskScheduler.updateQSIObjects
> -
>
> Key: MAPREDUCE-1533
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1533
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.20.1
>Reporter: Rajesh Balamohan
>Assignee: Amar Kamat
> Attachments: mapreduce-1533-v1.4.patch
>
>
> When short jobs are executed in Hadoop with OutOfBandHeartBeat=true, the JT 
> executes the heartBeat() method heavily. This internally makes a call to 
> CapacityTaskScheduler.updateQSIObjects(), which in turn calls String.format() 
> to set the job scheduling information. Based on the data structure sizes of 
> "jobQueuesManager" and "queueInfoMap", the number of times String.format() 
> gets executed becomes very high. String.format() internally does pattern 
> matching, which turns out to be very heavy. (This was revealed while 
> profiling the JT: almost 57% of the time was spent in 
> CapacityScheduler.assignTasks(), of which String.format() took 46%.)
> Would it be possible to do String.format() only at the time of invoking 
> JobInProgress.getSchedulingInfo()? This might reduce the pressure on the JT 
> while processing heartbeats. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1523) Sometimes rumen trace generator fails to extract the job finish time.

2010-04-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853786#action_12853786
 ] 

Hadoop QA commented on MAPREDUCE-1523:
--

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12440421/mapreduce-1523--2010-03-31a-1612PDT.patch
  against trunk revision 930423.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 13 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/94/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/94/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/94/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/94/console

This message is automatically generated.

> Sometimes rumen trace generator fails to extract the job finish time.
> -
>
> Key: MAPREDUCE-1523
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1523
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Hong Tang
>Assignee: Dick King
> Attachments: mapreduce-1523--2010-03-31a-1612PDT.patch
>
>
> We saw sometimes (not very often) that rumen may fail to extract the job 
> finish time from Hadoop 0.20 history log.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.