[jira] Commented: (MAPREDUCE-1901) Jobs should not submit the same jar files over and over again

2010-06-30 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884190#action_12884190
 ] 

Joydeep Sen Sarma commented on MAPREDUCE-1901:
--

Certainly true that we could do this at the Hive layer. Two issues:

- it is not generic (meaning it wouldn't work for streaming, for example)
- we would need to repeat some of the classpath management that JobClient/TT 
already take care of.

Currently Hive leverages Hadoop-provided facilities for distributing jars and 
files, and we will try to extend this functionality.

> Jobs should not submit the same jar files over and over again
> -
>
> Key: MAPREDUCE-1901
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1901
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Joydeep Sen Sarma
>
> Currently each Hadoop job uploads the required resources 
> (jars/files/archives) to a new location in HDFS. Map-reduce nodes involved in 
> executing this job would then download these resources into local disk.
> In an environment where most of the users are using a standard set of jars 
> and files (because they are using a framework like Hive/Pig) - the same jars 
> keep getting uploaded and downloaded repeatedly. The overhead of this 
> protocol (primarily in terms of end-user latency) is significant when:
> - the jobs are small (and, conversely, large in number)
> - Namenode is under load (meaning hdfs latencies are high and made worse, in 
> part, by this protocol)
> Hadoop should provide a way for jobs in a cooperative environment to not 
> submit the same files over and over again. Identifying and caching execution 
> resources by a content signature (md5/sha) would be a good alternative to 
> have available.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1905) Context.setStatus() and progress() api are ignored

2010-06-30 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1905:
---

Attachment: patch-1905.txt

Patch with the fix. Also includes a regression test.

> Context.setStatus() and progress() api are ignored
> --
>
> Key: MAPREDUCE-1905
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1905
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task
>Affects Versions: 0.21.0
>Reporter: Amareshwari Sriramadasu
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: patch-1905.txt
>
>
> TaskAttemptContext.setStatus() and progress() were overridden in 
> TaskInputOutputContext, in branch 0.20, to call the underlying reporter APIs. 
> But the methods are no longer overridden in TaskInputOutputContextImpl after 
> MAPREDUCE-954.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1905) Context.setStatus() and progress() api are ignored

2010-06-30 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1905:
---

  Status: Patch Available  (was: Open)
Assignee: Amareshwari Sriramadasu

> Context.setStatus() and progress() api are ignored
> --
>
> Key: MAPREDUCE-1905
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1905
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task
>Affects Versions: 0.21.0
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: patch-1905.txt
>
>
> TaskAttemptContext.setStatus() and progress() were overridden in 
> TaskInputOutputContext, in branch 0.20, to call the underlying reporter APIs. 
> But the methods are no longer overridden in TaskInputOutputContextImpl after 
> MAPREDUCE-954.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1905) Context.setStatus() and progress() api are ignored

2010-06-30 Thread Amareshwari Sriramadasu (JIRA)
Context.setStatus() and progress() api are ignored
--

 Key: MAPREDUCE-1905
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1905
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Affects Versions: 0.21.0
Reporter: Amareshwari Sriramadasu
Priority: Blocker
 Fix For: 0.21.0


TaskAttemptContext.setStatus() and progress() were overridden in 
TaskInputOutputContext, in branch 0.20, to call the underlying reporter APIs. 
But the methods are no longer overridden in TaskInputOutputContextImpl after 
MAPREDUCE-954.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1870) Harmonize MapReduce JAR library versions with Common and HDFS

2010-06-30 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-1870:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

I've just committed this.

> Harmonize MapReduce JAR library versions with Common and HDFS
> -
>
> Key: MAPREDUCE-1870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1870
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build
>Reporter: Tom White
>Assignee: Tom White
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-1870.patch, MAPREDUCE-1870.patch, 
> MAPREDUCE-1870.patch
>
>
> MapReduce part of HADOOP-6800.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1901) Jobs should not submit the same jar files over and over again

2010-06-30 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884162#action_12884162
 ] 

dhruba borthakur commented on MAPREDUCE-1901:
-

+1

From what I have learnt, the files in the distributed cache are persisted even 
across map-reduce jobs. So the Hive client can upload the relevant jars to 
some location in HDFS and then point the distributed cache to those HDFS URIs. 
If we do that, then the TT will download those HDFS URIs to the local disk 
only once, and all tasks (across multiple jobs) on that task tracker will 
continue to use these jars.
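
A minimal sketch of that approach (assumptions only: the shared HDFS path and 
namenode address below are hypothetical, and this is not the actual Hive 
change). The client adds a jar that already lives at a stable HDFS location to 
the job classpath via the DistributedCache, so each TaskTracker localizes it 
only once and reuses it across jobs:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

public class SharedJarSubmit {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Jar uploaded once, ahead of time, to a stable shared HDFS location.
    Path sharedJar = new Path("hdfs://namenode:8020/shared/libs/hive-exec.jar");
    // Add it to the job's classpath via the distributed cache; the TaskTracker
    // caches the localized copy keyed by this URI and reuses it across jobs.
    DistributedCache.addFileToClassPath(sharedJar, conf);
    // ... configure mapper/reducer and submit the job with this conf ...
  }
}
{code}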

> Jobs should not submit the same jar files over and over again
> -
>
> Key: MAPREDUCE-1901
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1901
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Joydeep Sen Sarma
>
> Currently each Hadoop job uploads the required resources 
> (jars/files/archives) to a new location in HDFS. Map-reduce nodes involved in 
> executing this job would then download these resources into local disk.
> In an environment where most of the users are using a standard set of jars 
> and files (because they are using a framework like Hive/Pig) - the same jars 
> keep getting uploaded and downloaded repeatedly. The overhead of this 
> protocol (primarily in terms of end-user latency) is significant when:
> - the jobs are small (and, conversely, large in number)
> - Namenode is under load (meaning hdfs latencies are high and made worse, in 
> part, by this protocol)
> Hadoop should provide a way for jobs in a cooperative environment to not 
> submit the same files over and over again. Identifying and caching execution 
> resources by a content signature (md5/sha) would be a good alternative to 
> have available.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1894) DistributedRaidFileSystem.readFully() does not return

2010-06-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884159#action_12884159
 ] 

Hadoop QA commented on MAPREDUCE-1894:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12448454/MAPREDUCE-1894.patch
  against trunk revision 959221.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/276/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/276/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/276/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/276/console

This message is automatically generated.

> DistributedRaidFileSystem.readFully() does not return
> -
>
> Key: MAPREDUCE-1894
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1894
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/raid
>Reporter: Ramkumar Vadali
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1894.patch
>
>
> DistributedRaidFileSystem.readFully() has a while(true) loop with no return. 
> The read(*) functions do not have this problem.
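
To make the failure mode concrete, here is a schematic sketch (an assumption 
about the shape of the bug, not the actual DistributedRaidFileSystem code): a 
retry loop that recovers from read failures but never returns once the read 
succeeds, so callers hang. The fix is simply to return on success.

{code:java}
// Schematic only; the interface and recovery logic are placeholders.
interface PositionedReader {
  void readFully(long pos, byte[] buf, int off, int len) throws Exception;
}

class ReadFullySketch {
  static void readFullyBuggy(PositionedReader in, long pos, byte[] buf) throws Exception {
    while (true) {
      try {
        in.readFully(pos, buf, 0, buf.length);
        // Bug: no return here, so the loop re-reads forever even on success.
      } catch (Exception e) {
        in = recover(in, pos);   // fail over to an alternate source and retry
      }
    }
  }

  static void readFullyFixed(PositionedReader in, long pos, byte[] buf) throws Exception {
    while (true) {
      try {
        in.readFully(pos, buf, 0, buf.length);
        return;                  // fix: return as soon as the read succeeds
      } catch (Exception e) {
        in = recover(in, pos);
      }
    }
  }

  static PositionedReader recover(PositionedReader failed, long pos) {
    return failed;               // placeholder for the real failover logic
  }
}
{code}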

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1870) Harmonize MapReduce JAR library versions with Common and HDFS

2010-06-30 Thread Hemanth Yamijala (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884150#action_12884150
 ] 

Hemanth Yamijala commented on MAPREDUCE-1870:
-

Thanks, Amareshwari. So this is MAPREDUCE-1834.

> Harmonize MapReduce JAR library versions with Common and HDFS
> -
>
> Key: MAPREDUCE-1870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1870
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build
>Reporter: Tom White
>Assignee: Tom White
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-1870.patch, MAPREDUCE-1870.patch, 
> MAPREDUCE-1870.patch
>
>
> MapReduce part of HADOOP-6800.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1904) Reducing locking contention in TaskTracker.MapOutputServlet's LocalDirAllocator

2010-06-30 Thread Rajesh Balamohan (JIRA)
Reducing locking contention in TaskTracker.MapOutputServlet's LocalDirAllocator
---

 Key: MAPREDUCE-1904
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1904
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.20.1
Reporter: Rajesh Balamohan


While profiling the TaskTracker with the Sort benchmark, it was observed that 
threads block on LocalDirAllocator.getLocalPathToRead() in order to get the 
index file and the temporary map output file.

As LocalDirAllocator is tied to the ServletContext, only one instance is 
available per TaskTracker HTTP server. Given the jobid & mapid, 
LocalDirAllocator retrieves the index file path and the temporary map output 
file path. getLocalPathToRead() is internally synchronized.

Introducing an LRU cache for this lookup reduces the contention heavily (key = 
jobid + mapid, value = path to the file). The size of the cache can be varied 
based on the environment, and I observed a throughput improvement on the order 
of 4-7% with its introduction.
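
A minimal sketch of such a cache (an assumption, not the attached patch): a 
bounded, access-ordered LinkedHashMap keyed by "jobid + mapid" whose value is 
the resolved local path, so the synchronized getLocalPathToRead() call is made 
only on a miss.

{code:java}
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class PathLRUCache {
  private final Map<String, String> cache;

  public PathLRUCache(final int capacity) {
    // Access-ordered map that evicts the least-recently-used entry.
    this.cache = Collections.synchronizedMap(
        new LinkedHashMap<String, String>(capacity, 0.75f, true) {
          @Override
          protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
            return size() > capacity;
          }
        });
  }

  public String get(String jobId, String mapId) {
    return cache.get(jobId + "_" + mapId);
  }

  public void put(String jobId, String mapId, String localPath) {
    cache.put(jobId + "_" + mapId, localPath);
  }
}
{code}

On a cache miss the servlet would still fall back to 
LocalDirAllocator.getLocalPathToRead() and then populate the cache.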

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-978) -file option in streaming does not preserve execute permissions

2010-06-30 Thread Greg Roelofs (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884146#action_12884146
 ] 

Greg Roelofs commented on MAPREDUCE-978:


bq. Files passed using the -file option are packaged into a jar and unjarred on 
the computing node. So it won't preserve the execute permissions on the file.

There's no obvious reason it shouldn't.  Jar files are zipfiles, and at least 
the ant-built ones are created using Info-ZIP's zip (or something based on it), 
which preserves Unix perms for every file.  For example, here's one of the 
directories in my hadoop-common/build/hadoop-common-0.22.0-SNAPSHOT.jar:

{noformat} 
Central directory entry #5:
---

  org/apache/hadoop/

  offset of local header from start of archive: 336 (0150h) bytes
  file system or operating system of origin:Unix
  version of encoding software: 2.0
  minimum file system compatibility required:   MS-DOS, OS/2 or NT FAT
  minimum software version required to extract: 1.0
  compression method:   none (stored)
  file security status: not encrypted
  extended local header:no
  file last modified on (DOS date/time):2010 Jun 28 17:29:48
  32-bit CRC value (hex):   
  compressed size:  0 bytes
  uncompressed size:0 bytes
  length of filename:   18 characters
  length of extra field:0 bytes
  length of file comment:   0 characters
  disk number on which file begins: disk 1
  apparent file type:   binary
  Unix file attributes (040755 octal):  drwxr-xr-x
  MS-DOS file attributes (10 hex):  dir 
{noformat}

If the corresponding Unix unzip were used to do the unpacking, the Unix perms 
would be restored correctly.  (UID/GID could be stored and recovered, as well, 
but that requires an extra option on each end and a privileged [root] user to 
do the unpacking.)

> -file option in streaming does not preserve execute permissions
> ---
>
> Key: MAPREDUCE-978
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-978
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Reporter: Chris Dyer
>
> For a streaming application I used the -file option to move some executable 
> files to the slave nodes.  On the submit node, they had +x permissions but on 
> the destination node they were created with -x permissions.  This probably 
> has to do with the umask settings on the various nodes, but streaming should 
> preserve the original permissions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1870) Harmonize MapReduce JAR library versions with Common and HDFS

2010-06-30 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884144#action_12884144
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1870:


Hemanth, I could open the test report. Maybe the link was slow when you tried. 
Here is the report:
{noformat}
Test Result
1 failures (±0)
913 tests (±0)
Took 3 hr 37 min.
All Failed Tests
Test Name                                                            Duration  Age
org.apache.hadoop.mapred.TestSimulatorDeterministicReplay.testMain   0.0020    35
{noformat}

> Harmonize MapReduce JAR library versions with Common and HDFS
> -
>
> Key: MAPREDUCE-1870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1870
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build
>Reporter: Tom White
>Assignee: Tom White
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-1870.patch, MAPREDUCE-1870.patch, 
> MAPREDUCE-1870.patch
>
>
> MapReduce part of HADOOP-6800.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1893) Multiple reducers for Slive

2010-06-30 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884130#action_12884130
 ] 

Konstantin Shvachko commented on MAPREDUCE-1893:


Ravi, {{SliveTest.class.getSimpleName()}} returns the string {{"SliveTest"}}. 
It should work fine as is. Could you please verify again?

> Multiple reducers for Slive
> ---
>
> Key: MAPREDUCE-1893
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1893
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: benchmarks, test
>Affects Versions: 0.22.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.22.0
>
> Attachments: SliveMultiR.patch, SliveMultiR.patch, SliveMultiR.patch
>
>
> Slive currently uses single reducer. It could use multiple ones.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1870) Harmonize MapReduce JAR library versions with Common and HDFS

2010-06-30 Thread Hemanth Yamijala (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884128#action_12884128
 ] 

Hemanth Yamijala commented on MAPREDUCE-1870:
-

I am unable to get the Hudson test report link to show up. Does anyone know how 
to fix it?

> Harmonize MapReduce JAR library versions with Common and HDFS
> -
>
> Key: MAPREDUCE-1870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1870
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build
>Reporter: Tom White
>Assignee: Tom White
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-1870.patch, MAPREDUCE-1870.patch, 
> MAPREDUCE-1870.patch
>
>
> MapReduce part of HADOOP-6800.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1856) Extract a subset of tests for smoke (DOA) validation

2010-06-30 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated MAPREDUCE-1856:
--

Status: Patch Available  (was: Open)

> Extract a subset of tests for smoke (DOA) validation
> 
>
> Key: MAPREDUCE-1856
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1856
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 0.21.0
>Reporter: Konstantin Boudnik
>Assignee: Konstantin Boudnik
> Attachments: MAPREDUCE-1856.patch, MAPREDUCE-1856.patch
>
>
> Similar to that of HDFS-1199 for MapReduce.
> Adds the ability to run up to 30 minutes of tests to 'smoke' the MapReduce 
> build (i.e. find possible issues faster than the full test cycle does).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1856) Extract a subset of tests for smoke (DOA) validation

2010-06-30 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated MAPREDUCE-1856:
--

Attachment: MAPREDUCE-1856.patch

Refitting the patch for trunk.

> Extract a subset of tests for smoke (DOA) validation
> 
>
> Key: MAPREDUCE-1856
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1856
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 0.21.0
>Reporter: Konstantin Boudnik
>Assignee: Konstantin Boudnik
> Attachments: MAPREDUCE-1856.patch, MAPREDUCE-1856.patch
>
>
> Similar to that of HDFS-1199 for MapReduce.
> Adds the ability to run up to 30 minutes of tests to 'smoke' the MapReduce 
> build (i.e. find possible issues faster than the full test cycle does).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1903) Allow different slowTaskThreshold for mappers and reducers

2010-06-30 Thread Scott Chen (JIRA)
Allow different slowTaskThreshold for mappers and reducers
--

 Key: MAPREDUCE-1903
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1903
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker
Affects Versions: 0.22.0
Reporter: Scott Chen
Assignee: Scott Chen
 Fix For: 0.22.0


We have been running the new speculative logic in HADOOP-2141 done by Andy in 
our production cluster.
One thing we observed is that there is a significantly low number of 
speculative reducers.
We usually see about 100 speculative mappers launched per minute, but 
speculative reducers are usually fewer than 5.
Yet reducers are where we usually get complaints about speculation issues.

These two types of tasks have different properties and different needs.
It would be nice if we could configure the slow-task threshold for each of 
them separately.

We can add the following config keys to allow setting them independently.
MRJobConfig.SPECULATIVE_MAP_SLOWTASK_THRESHOLD
MRJobConfig.SPECULATIVE_REDUCE_SLOWTASK_THRESHOLD
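
A minimal sketch of how a scheduler might read such per-type thresholds (the 
property names below are illustrative only; the fallback key and default are 
meant to mirror the existing single slow-task threshold):

{code:java}
import org.apache.hadoop.conf.Configuration;

public class SpeculativeThresholds {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Existing single threshold: how many standard deviations slower than the
    // mean progress rate a task must be before it is considered slow.
    float common = conf.getFloat("mapreduce.job.speculative.slowtaskthreshold", 1.0f);
    // Hypothetical per-type keys proposed above; both default to the common value.
    float mapThreshold =
        conf.getFloat("mapreduce.job.speculative.map.slowtaskthreshold", common);
    float reduceThreshold =
        conf.getFloat("mapreduce.job.speculative.reduce.slowtaskthreshold", common);
    System.out.println("map=" + mapThreshold + " reduce=" + reduceThreshold);
  }
}
{code}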

Thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1845) FairScheduler.tasksToPeempt() can return negative number

2010-06-30 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884108#action_12884108
 ] 

Scott Chen commented on MAPREDUCE-1845:
---

@Matei, The patch is ready. Could you help me commit it? Thanks.

> FairScheduler.tasksToPeempt() can return negative number
> 
>
> Key: MAPREDUCE-1845
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1845
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/fair-share
>Affects Versions: 0.22.0
>Reporter: Scott Chen
>Assignee: Scott Chen
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1845-v2.txt, MAPREDUCE-1845.20100717.txt
>
>
> This method can return negative number. This will cause the preemption to 
> under-preempt.
> The bug was discovered by Joydeep.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1863) [Rumen] Null failedMapAttemptCDFs in job traces generated by Rumen

2010-06-30 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-1863:
-

Attachment: mr-1863-yhadoop-20.1xx.patch

Patch for the Yahoo Hadoop 20.1xx branch. Not to be committed.

> [Rumen] Null failedMapAttemptCDFs in job traces generated by Rumen
> --
>
> Key: MAPREDUCE-1863
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1863
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Amar Kamat
>Assignee: Amar Kamat
> Fix For: 0.22.0
>
> Attachments: counters-test-trace.json.gz, 
> dispatch-trace-output.json.gz, mr-1863-yhadoop-20.1xx.patch, 
> rumen-npe-v1.1-bin.patch, rumen-npe-v1.1.patch
>
>
> All the traces generated by Rumen for jobs having failed task attempts have a 
> null value for failedMapAttemptCDFs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1783) Task Initialization should be delayed till when a job can be run

2010-06-30 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-1783:
---

Status: Open  (was: Patch Available)

> Task Initialization should be delayed till when a job can be run
> 
>
> Key: MAPREDUCE-1783
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1783
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/fair-share
>Affects Versions: 0.20.1
>Reporter: Ramkumar Vadali
> Fix For: 0.22.0
>
> Attachments: 0001-Pool-aware-job-initialization.patch, 
> 0001-Pool-aware-job-initialization.patch.1, submit-mapreduce-1783.patch
>
>
> The FairScheduler task scheduler uses PoolManager to impose limits on the 
> number of jobs that can be running at a given time. However, jobs that are 
> submitted are initialized immediately by EagerTaskInitializationListener by 
> calling JobInProgress.initTasks. This causes the job split file to be read 
> into memory. The split information is not needed until the number of running 
> jobs is less than the maximum specified. If the amount of split information 
> is large, this leads to unnecessary memory pressure on the Job Tracker.
> To ease memory pressure, FairScheduler can use another implementation of 
> JobInProgressListener that is aware of PoolManager limits and can delay task 
> initialization until the number of running jobs is below the maximum.
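
A minimal, self-contained sketch of the delayed-initialization idea (an 
assumption about the mechanism, not the attached patch; the Job interface below 
stands in for JobInProgress): hold submitted jobs in a queue and call 
initTasks() only while the number of running jobs is below the pool's maximum.

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

public class PoolAwareInitSketch {
  interface Job { void initTasks(); }

  private final Queue<Job> pending = new ArrayDeque<Job>();
  private final int maxRunningJobs;
  private int running = 0;

  public PoolAwareInitSketch(int maxRunningJobs) {
    this.maxRunningJobs = maxRunningJobs;
  }

  // Called when a job is submitted: defer initialization instead of
  // initializing eagerly (which would read its split file into memory).
  public synchronized void jobAdded(Job job) {
    pending.add(job);
    initNextIfBelowLimit();
  }

  // Called when a running job finishes, freeing a slot in the pool.
  public synchronized void jobCompleted() {
    running--;
    initNextIfBelowLimit();
  }

  private void initNextIfBelowLimit() {
    while (running < maxRunningJobs && !pending.isEmpty()) {
      pending.poll().initTasks();  // split information is read only now
      running++;
    }
  }
}
{code}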

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1783) Task Initialization should be delayed till when a job can be run

2010-06-30 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-1783:
---

Status: Patch Available  (was: Open)

Trying again

> Task Initialization should be delayed till when a job can be run
> 
>
> Key: MAPREDUCE-1783
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1783
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/fair-share
>Affects Versions: 0.20.1
>Reporter: Ramkumar Vadali
> Fix For: 0.22.0
>
> Attachments: 0001-Pool-aware-job-initialization.patch, 
> 0001-Pool-aware-job-initialization.patch.1, submit-mapreduce-1783.patch
>
>
> The FairScheduler task scheduler uses PoolManager to impose limits on the 
> number of jobs that can be running at a given time. However, jobs that are 
> submitted are initialized immediately by EagerTaskInitializationListener by 
> calling JobInProgress.initTasks. This causes the job split file to be read 
> into memory. The split information is not needed until the number of running 
> jobs is less than the maximum specified. If the amount of split information 
> is large, this leads to unnecessary memory pressure on the Job Tracker.
> To ease memory pressure, FairScheduler can use another implementation of 
> JobInProgressListener that is aware of PoolManager limits and can delay task 
> initialization until the number of running jobs is below the maximum.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1816) HAR files used for RAID parity need to have configurable partfile size

2010-06-30 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-1816:
---

Attachment: MAPREDUCE-1816.patch.1

Fixed unnecessary whitespace changes in README

> HAR files used for RAID parity need to have configurable partfile size
> --
>
> Key: MAPREDUCE-1816
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1816
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/raid
>Affects Versions: 0.20.1
>Reporter: Ramkumar Vadali
>Priority: Minor
> Attachments: MAPREDUCE-1816.patch, MAPREDUCE-1816.patch.1
>
>
> RAID parity files are merged into HAR archives periodically. This is required 
> to reduce the number of files that the NameNode has to track. The number of 
> files present in a HAR archive depends on the size of the HAR part files: the 
> higher the size, the lower the number of files.
> The size of HAR part files is configurable through the setting 
> har.partfile.size, but that is a global setting. This task introduces a new 
> RAID-specific setting, raid.har.partfile.size, that is in turn used to set 
> har.partfile.size.
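
A minimal sketch of how the RAID archiver could translate the new knob into the 
global one before launching the archive job (the two setting names are the ones 
quoted above; the default value below is illustrative only):

{code:java}
import org.apache.hadoop.conf.Configuration;

public class RaidHarPartfileSize {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Read the RAID-specific part-file size in bytes (4 GB default is a guess).
    long partSize = conf.getLong("raid.har.partfile.size", 4L * 1024 * 1024 * 1024);
    // Propagate it to the global HAR setting used by the archiving job only.
    conf.setLong("har.partfile.size", partSize);
    System.out.println("har.partfile.size=" + conf.getLong("har.partfile.size", -1));
  }
}
{code}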

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1816) HAR files used for RAID parity need to have configurable partfile size

2010-06-30 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-1816:
---

   Status: Patch Available  (was: Open)
Fix Version/s: 0.22.0

> HAR files used for RAID parity need to have configurable partfile size
> --
>
> Key: MAPREDUCE-1816
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1816
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/raid
>Affects Versions: 0.20.1
>Reporter: Ramkumar Vadali
>Priority: Minor
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1816.patch, MAPREDUCE-1816.patch.1
>
>
> RAID parity files are merged into HAR archives periodically. This is required 
> to reduce the number of files that the NameNode has to track. The number of 
> files present in a HAR archive depends on the size of the HAR part files: the 
> higher the size, the lower the number of files.
> The size of HAR part files is configurable through the setting 
> har.partfile.size, but that is a global setting. This task introduces a new 
> RAID-specific setting, raid.har.partfile.size, that is in turn used to set 
> har.partfile.size.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1816) HAR files used for RAID parity need to have configurable partfile size

2010-06-30 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-1816:
---

Status: Open  (was: Patch Available)

> HAR files used for RAID parity need to have configurable partfile size
> --
>
> Key: MAPREDUCE-1816
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1816
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/raid
>Affects Versions: 0.20.1
>Reporter: Ramkumar Vadali
>Priority: Minor
> Attachments: MAPREDUCE-1816.patch, MAPREDUCE-1816.patch.1
>
>
> RAID parity files are merged into HAR archives periodically. This is required 
> to reduce the number of files that the NameNode has to track. The number of 
> files present in a HAR archive depends on the size of the HAR part files: the 
> higher the size, the lower the number of files.
> The size of HAR part files is configurable through the setting 
> har.partfile.size, but that is a global setting. This task introduces a new 
> RAID-specific setting, raid.har.partfile.size, that is in turn used to set 
> har.partfile.size.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1838) DistRaid map tasks have large variance in running times

2010-06-30 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-1838:
---

   Status: Patch Available  (was: Open)
Fix Version/s: 0.22.0

Shuffle files to raid before submitting the raid job.

> DistRaid map tasks have large variance in running times
> ---
>
> Key: MAPREDUCE-1838
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1838
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/raid
>Affects Versions: 0.20.1
>Reporter: Ramkumar Vadali
>Priority: Minor
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1838.patch
>
>
> HDFS RAID uses map-reduce jobs to generate parity files for a set of source 
> files. Each map task gets a subset of files to operate on. The current code 
> assigns files by walking through the list of files given in the constructor 
> of DistRaid.
> The problem is that the list of files given to the constructor is in (pretty 
> much) directory-listing order. When a large number of files is added, files 
> that are adjacent in that order tend to have similar sizes. Thus one map task 
> can end up with only large files whereas another can end up with only small 
> files, increasing the variance in run times.
> We could do smarter assignment by using the file sizes.
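
A minimal sketch of one such smarter assignment (an assumption, not the 
attached patch): sort the files by size, largest first, and greedily give each 
file to the map task with the smallest total bytes so far.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class BalancedAssignment {
  /** Returns one list of file sizes per map task, balanced by total bytes. */
  static List<List<Long>> assign(Long[] fileSizes, int numMaps) {
    List<List<Long>> buckets = new ArrayList<List<Long>>();
    // Min-heap of {total bytes, bucket index}: always pick the lightest task.
    PriorityQueue<long[]> totals = new PriorityQueue<long[]>(numMaps,
        new Comparator<long[]>() {
          public int compare(long[] a, long[] b) {
            return a[0] < b[0] ? -1 : (a[0] > b[0] ? 1 : 0);
          }
        });
    for (int i = 0; i < numMaps; i++) {
      buckets.add(new ArrayList<Long>());
      totals.add(new long[] { 0L, i });
    }
    Arrays.sort(fileSizes, Collections.reverseOrder());  // largest first
    for (long size : fileSizes) {
      long[] lightest = totals.poll();
      buckets.get((int) lightest[1]).add(size);
      lightest[0] += size;
      totals.add(lightest);
    }
    return buckets;
  }

  public static void main(String[] args) {
    Long[] sizes = { 900L, 800L, 100L, 90L, 80L, 10L };
    System.out.println(assign(sizes, 2));  // prints a roughly size-balanced split
  }
}
{code}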

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1902) job jar file is not distributed via DistributedCache

2010-06-30 Thread Joydeep Sen Sarma (JIRA)
job jar file is not distributed via DistributedCache


 Key: MAPREDUCE-1902
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1902
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Joydeep Sen Sarma


The main jar file for a job is not distributed via the distributed cache. It 
would be more efficient if that were the case.

It would also allow us to comprehensively tackle the inefficiencies in 
distribution of jar files and such (see MAPREDUCE-1901).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1699) JobHistory shouldn't be disabled for any reason

2010-06-30 Thread Krishna Ramachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishna Ramachandran updated MAPREDUCE-1699:


Attachment: mapred-1699-7.patch

Revised (based on Arun's review):

If the logSubmitted writer fails, we remove the entry from the fileManager list 
(if not null).

Patch for an earlier release. Not for commit in trunk.


> JobHistory shouldn't be disabled for any reason
> ---
>
> Key: MAPREDUCE-1699
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1699
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.2
>Reporter: Arun C Murthy
>Assignee: Krishna Ramachandran
> Fix For: 0.20.3
>
> Attachments: mapred-1699-1.patch, mapred-1699-2.patch, 
> mapred-1699-3.patch, mapred-1699-5.patch, mapred-1699-7.patch, 
> mapred-1699.patch
>
>
> Recently we have had issues with JobTracker silently disabling job-history 
> and starting to keep all completed jobs in memory. This leads to OOM on the 
> JobTracker. We should never do this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1838) DistRaid map tasks have large variance in running times

2010-06-30 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-1838:
---

Attachment: MAPREDUCE-1838.patch

> DistRaid map tasks have large variance in running times
> ---
>
> Key: MAPREDUCE-1838
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1838
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/raid
>Affects Versions: 0.20.1
>Reporter: Ramkumar Vadali
>Priority: Minor
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1838.patch
>
>
> HDFS RAID uses map-reduce jobs to generate parity files for a set of source 
> files. Each map task gets a subset of files to operate on. The current code 
> assigns files by walking through the list of files given in the constructor 
> of DistRaid.
> The problem is that the list of files given to the constructor is in (pretty 
> much) directory-listing order. When a large number of files is added, files 
> that are adjacent in that order tend to have similar sizes. Thus one map task 
> can end up with only large files whereas another can end up with only small 
> files, increasing the variance in run times.
> We could do smarter assignment by using the file sizes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1902) job jar file is not distributed via DistributedCache

2010-06-30 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884081#action_12884081
 ] 

Joydeep Sen Sarma commented on MAPREDUCE-1902:
--

Perhaps there's some history to why things are this way - if anyone knows - 
please do share.

> job jar file is not distributed via DistributedCache
> 
>
> Key: MAPREDUCE-1902
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1902
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Joydeep Sen Sarma
>
> The main jar file for a job is not distributed via the distributed cache. It 
> would be more efficient if that were the case.
> It would also allow us to comprehensively tackle the inefficiencies in 
> distribution of jar files and such (see MAPREDUCE-1901).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1894) DistributedRaidFileSystem.readFully() does not return

2010-06-30 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-1894:
---

Attachment: MAPREDUCE-1894.patch

> DistributedRaidFileSystem.readFully() does not return
> -
>
> Key: MAPREDUCE-1894
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1894
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/raid
>Reporter: Ramkumar Vadali
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1894.patch
>
>
> DistributedRaidFileSystem.readFully() has a while(true) loop with no return. 
> The read(*) functions do not have this problem.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1894) DistributedRaidFileSystem.readFully() does not return

2010-06-30 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-1894:
---

   Status: Patch Available  (was: Open)
Fix Version/s: 0.22.0

Fix along with unit-tests.

> DistributedRaidFileSystem.readFully() does not return
> -
>
> Key: MAPREDUCE-1894
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1894
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/raid
>Reporter: Ramkumar Vadali
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1894.patch
>
>
> DistributedRaidFileSystem.readFully() has a while(true) loop with no return. 
> The read(*) functions do not have this problem.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1901) Jobs should not submit the same jar files over and over again

2010-06-30 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884056#action_12884056
 ] 

Joydeep Sen Sarma commented on MAPREDUCE-1901:
--

Yeah, we have an intern (Junjie Liang) working on this, and he is reusing the 
DistributedCache. He should be posting some code/design soon.

> Jobs should not submit the same jar files over and over again
> -
>
> Key: MAPREDUCE-1901
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1901
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Joydeep Sen Sarma
>
> Currently each Hadoop job uploads the required resources 
> (jars/files/archives) to a new location in HDFS. Map-reduce nodes involved in 
> executing this job would then download these resources into local disk.
> In an environment where most of the users are using a standard set of jars 
> and files (because they are using a framework like Hive/Pig) - the same jars 
> keep getting uploaded and downloaded repeatedly. The overhead of this 
> protocol (primarily in terms of end-user latency) is significant when:
> - the jobs are small (and, conversely, large in number)
> - Namenode is under load (meaning hdfs latencies are high and made worse, in 
> part, by this protocol)
> Hadoop should provide a way for jobs in a cooperative environment to not 
> submit the same files over and over again. Identifying and caching execution 
> resources by a content signature (md5/sha) would be a good alternative to 
> have available.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1901) Jobs should not submit the same jar files over and over again

2010-06-30 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884049#action_12884049
 ] 

Arun C Murthy commented on MAPREDUCE-1901:
--

+1

A straightforward way is to use the DistributedCache directly; an easy change 
is to have the JobSubmissionProtocol either use a custom jar (as today) or just 
refer to the jars in the DistributedCache.

> Jobs should not submit the same jar files over and over again
> -
>
> Key: MAPREDUCE-1901
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1901
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Joydeep Sen Sarma
>
> Currently each Hadoop job uploads the required resources 
> (jars/files/archives) to a new location in HDFS. Map-reduce nodes involved in 
> executing this job would then download these resources into local disk.
> In an environment where most of the users are using a standard set of jars 
> and files (because they are using a framework like Hive/Pig) - the same jars 
> keep getting uploaded and downloaded repeatedly. The overhead of this 
> protocol (primarily in terms of end-user latency) is significant when:
> - the jobs are small (and conversantly - large in number)
> - Namenode is under load (meaning hdfs latencies are high and made worse, in 
> part, by this protocol)
> Hadoop should provide a way for jobs in a cooperative environment to not 
> submit the same files over and again. Identifying and caching execution 
> resources by a content signature (md5/sha) would be a good alternative to 
> have available.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1901) Jobs should not submit the same jar files over and over again

2010-06-30 Thread Joydeep Sen Sarma (JIRA)
Jobs should not submit the same jar files over and over again
-

 Key: MAPREDUCE-1901
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1901
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Joydeep Sen Sarma


Currently each Hadoop job uploads the required resources (jars/files/archives) 
to a new location in HDFS. Map-reduce nodes involved in executing this job 
would then download these resources into local disk.

In an environment where most of the users are using a standard set of jars and 
files (because they are using a framework like Hive/Pig) - the same jars keep 
getting uploaded and downloaded repeatedly. The overhead of this protocol 
(primarily in terms of end-user latency) is significant when:
- the jobs are small (and, conversely, large in number)
- Namenode is under load (meaning hdfs latencies are high and made worse, in 
part, by this protocol)

Hadoop should provide a way for jobs in a cooperative environment to not submit 
the same files over and over again. Identifying and caching execution resources by a 
content signature (md5/sha) would be a good alternative to have available.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1896) [Herriot] New property for multi user list.

2010-06-30 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1896:
-

Attachment: MAPREDUCE-1896.patch

Changed the file name and attached the new patch. For HDFS, I am creating a 
separate JIRA ticket.

> [Herriot] New property for multi user list.
> ---
>
> Key: MAPREDUCE-1896
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1896
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: MAPREDUCE-1896.patch, MAPREDUCE-1896.patch, 
> MAPREDUCE-1896.patch
>
>
> Adding new property for multi user list.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1896) [Herriot] New property for multi user list.

2010-06-30 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884024#action_12884024
 ] 

Konstantin Boudnik commented on MAPREDUCE-1896:
---

+1 looks good. Please run this through usual patch validation.

> [Herriot] New property for multi user list.
> ---
>
> Key: MAPREDUCE-1896
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1896
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: MAPREDUCE-1896.patch, MAPREDUCE-1896.patch, 
> MAPREDUCE-1896.patch
>
>
> Adding new property for multi user list.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (MAPREDUCE-1846) Add option to run Slive tests in test jar driver.

2010-06-30 Thread Ravi Phulari (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Phulari resolved MAPREDUCE-1846.
-

Release Note: This will be fixed along with MAPREDUCE-1893.
  Resolution: Fixed

> Add option to run Slive tests in test jar driver. 
> --
>
> Key: MAPREDUCE-1846
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1846
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.22.0
>Reporter: Ravi Phulari
>Assignee: Ravi Phulari
> Fix For: 0.22.0
>
>
> Currently there is no way to run Slive tests through the test jar. We need to 
> add an option to run Slive tests from the test jar driver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1896) [Herriot] New property for multi user list.

2010-06-30 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884019#action_12884019
 ] 

Konstantin Boudnik commented on MAPREDUCE-1896:
---

Sorry for not making this suggestion earlier: let's call the file proxyusers, 
and then the patch seems good to go.
An HDFS version of this is still needed.

> [Herriot] New property for multi user list.
> ---
>
> Key: MAPREDUCE-1896
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1896
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: MAPREDUCE-1896.patch, MAPREDUCE-1896.patch
>
>
> Adding new property for multi user list.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1846) Add option to run Slive tests in test jar driver.

2010-06-30 Thread Ravi Phulari (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884017#action_12884017
 ] 

Ravi Phulari commented on MAPREDUCE-1846:
-

Slive was introduced in  HDFS-708.

> Add option to run Slive tests in test jar driver. 
> --
>
> Key: MAPREDUCE-1846
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1846
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.22.0
>Reporter: Ravi Phulari
>Assignee: Ravi Phulari
> Fix For: 0.22.0
>
>
> Currently there is no way to run Slive tests through the test jar. We need to 
> add an option to run Slive tests from the test jar driver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1898) [Herriot] Implement a functionality for getting the job summary information of a job.

2010-06-30 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1898:
-

Attachment: MAPREDUCE-1898.patch

Patch for trunk.

> [Herriot] Implement a functionality for getting the job summary information 
> of a job.
> -
>
> Key: MAPREDUCE-1898
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1898
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1898-ydist-security.patch, 1898-ydist-security.patch, 
> 1898-ydist-security.patch, MAPREDUCE-1898.patch
>
>
> Implement a method for getting the job summary details of a job. The job 
> summary should include:
> jobId, startTime, launchTime, finishTime, numMaps, numSlotsPerMap, 
> numReduces, numSlotsPerReduce, user, queue, status, mapSlotSeconds, 
> reduceSlotSeconds, clusterMapCapacity, clusterReduceCapacity.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1896) [Herriot] New property for multi user list.

2010-06-30 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1896:
-

Attachment: MAPREDUCE-1896.patch

Addressed the comments and attached the new patch.

> [Herriot] New property for multi user list.
> ---
>
> Key: MAPREDUCE-1896
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1896
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: MAPREDUCE-1896.patch, MAPREDUCE-1896.patch
>
>
> Adding new property for multi user list.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1898) [Herriot] Implement a functionality for getting the job summary information of a job.

2010-06-30 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1898:
-

Attachment: 1898-ydist-security.patch

Addressed the comments and uploaded the new patch.

> [Herriot] Implement a functionality for getting the job summary information 
> of a job.
> -
>
> Key: MAPREDUCE-1898
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1898
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1898-ydist-security.patch, 1898-ydist-security.patch, 
> 1898-ydist-security.patch
>
>
> Implement a method for getting the job summary details of a job. The job 
> summary should include:
> jobId, startTime, launchTime, finishTime, numMaps, numSlotsPerMap, 
> numReduces, numSlotsPerReduce, user, queue, status, mapSlotSeconds, 
> reduceSlotSeconds, clusterMapCapacity, clusterReduceCapacity.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1854) [herriot] Automate health script system test

2010-06-30 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883975#action_12883975
 ] 

Konstantin Boudnik commented on MAPREDUCE-1854:
---

- Please remove 
{noformat}
+LOG.error("Exit code in shell command exe "+exitCode+" 
"+errMsg.toString());
{noformat}
from Shell.java in the Common code.
- Also, this seems to be the patch for the y20 branch. Please provide one for 
trunk.


> [herriot] Automate health script system test
> 
>
> Key: MAPREDUCE-1854
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1854
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: test
> Environment: Herriot framework
>Reporter: Balaji Rajagopalan
>Assignee: Balaji Rajagopalan
> Attachments: health_script_5.txt, health_script_7.txt
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> There are three scenarios:
> 1. Induce an error from the health script and verify that the task tracker is 
> blacklisted. 
> 2. Make the health script time out and verify that the task tracker is 
> blacklisted. 
> 3. Make an error in the health script path and make sure the task tracker 
> stays healthy. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-577) Duplicate Mapper input when using StreamXmlRecordReader

2010-06-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883874#action_12883874
 ] 

Hadoop QA commented on MAPREDUCE-577:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12448380/577.v3.patch
  against trunk revision 959193.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 14 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/275/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/275/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/275/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/275/console

This message is automatically generated.

> Duplicate Mapper input when using StreamXmlRecordReader
> ---
>
> Key: MAPREDUCE-577
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-577
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
> Environment: HADOOP 0.17.0, Java 6.0
>Reporter: David Campbell
>Assignee: Ravi Gummadi
> Attachments: 0001-test-to-demonstrate-HADOOP-3484.patch, 
> 0002-patch-for-HADOOP-3484.patch, 577.20S.patch, 577.patch, 577.v1.patch, 
> 577.v2.patch, 577.v3.patch, HADOOP-3484.combined.patch, HADOOP-3484.try3.patch
>
>
> I have an XML file with 93626 rows.  A row is marked by 
> I've confirmed this with grep and the Grep example program included with HADOOP.
> Here is the grep example output:  93626
> I've set up my job configuration as follows:
> conf.set("stream.recordreader.class", "org.apache.hadoop.streaming.StreamXmlRecordReader");
> conf.set("stream.recordreader.begin", "");
> conf.set("stream.recordreader.end", "");
> conf.setInputFormat(StreamInputFormat.class);
> I have a fairly simple test Mapper.
> Here's the map method.
>   public void map(Text key, Text value, OutputCollector output, Reporter reporter) throws IOException {
>       try {
>           output.collect(totalWord, one);
>           if (key != null && key.toString().indexOf("01852") != -1) {
>               output.collect(new Text("01852"), one);
>           }
>       } catch (Exception ex) {
>           Logger.getLogger(TestMapper.class.getName()).log(Level.SEVERE, null, ex);
>           System.out.println(value);
>       }
>   }
> For totalWord ("TOTAL"), I get:
> TOTAL 140850
> and for 01852 I get.
> 01852 86
> There are 43 instances of 01852 in the file.
> I have the following setting in my config.  
>conf.setNumMapTasks(1);
> I have a total of six machines in my cluster.
> If I run without this, the result is 12x the actual value, not 2x.
> Here's some info from the cluster web page.
> Maps  Reduces  Total Submissions  Nodes  Map Task Capacity  Reduce Task Capacity  Avg. Tasks/Node
> 0     0        1                  6      12                 12                    4.00
> I've also noticed something really strange in the job's output.  It looks 
> like it's starting over or redoing things.
> This was run using all six nodes and no limitations on map or reduce tasks.  
> I haven't seen this behavior in any other case.
> 08/06/03 10:50:35 INFO mapred.FileInputFormat: Total input paths to process : 
> 1
> 08/06/03 10:50:36 INFO mapred.JobClient: Running job: job_200806030916_0018
> 08/06/03 10:50:37 INFO mapred.JobClient:  map 0% reduce 0%
> 08/06/03 10:50:42 INFO mapred.JobClient:  map 2% reduce 0%
> 08/06/03 10:50:45 INFO mapred.JobClient:  map 12% reduce 0%
> 08/06/03 10:50:47 INFO mapred.JobClient:  map 31% reduce 0%
> 08/06/03 10:50:48 INFO mapred.JobClient:  map 49% reduce 0%
> 08/06/03 10:50:49 INFO mapred.JobClient:  map 68% reduce 0%
> 08/06/03 10:50:50 INFO mapred.JobClient:  map 100% reduce 0%
> 08/06/03 10:50:54 INFO mapred.JobClient:  map 87% reduce 0%
> 08/06/03 10:50:55 INFO mapred.JobClient:  map 100% reduce 0%
> 08/06/03 10:50:56 INFO mapred.JobClient:  map 0% reduce 0%
> 08/06/03 10:51:00 INFO mapred.JobClient:  map 0% reduce 1%
> 0
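
For context, a minimal sketch of the StreamXmlRecordReader setup described above, using 
the old JobConf API. The <row>/</row> begin and end tags are only an assumption standing 
in for the reporter's actual record markers, which are not shown in the description.

{code}
// Sketch of the configuration described above (old mapred API); the record
// tags are assumed, not taken from the report.
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.streaming.StreamInputFormat;

public class StreamXmlConfigSketch {
  public static JobConf configure(JobConf conf) {
    conf.set("stream.recordreader.class",
             "org.apache.hadoop.streaming.StreamXmlRecordReader");
    conf.set("stream.recordreader.begin", "<row>");   // assumed begin tag
    conf.set("stream.recordreader.end", "</row>");    // assumed end tag
    conf.setInputFormat(StreamInputFormat.class);
    // With the duplication bug, the same logical record can be fed to more than
    // one map task, inflating the counts the way the report above describes.
    return conf;
  }
}
{code}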

[jira] Commented: (MAPREDUCE-1888) Streaming overrides user given output key and value types.

2010-06-30 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883833#action_12883833
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1888:


Ravi, can you add testcases for all combinations of the following?
1) Mapper is a command or a Java mapper.
2) Reducer is a command, a Java reducer, or NONE.
You can also rename the testcase from TestStreamingJavaTasks to 
TestStreamingOutputKeyValueTypes. A sketch of the combinations is below.
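
A rough sketch of the requested combinations. The runStreamJob helper is hypothetical and 
only stands in for the streaming test harness; the real tests would live in the renamed 
TestStreamingOutputKeyValueTypes.

{code}
// Sketch only: enumerates the mapper x reducer combinations requested above.
// runStreamJob(...) is a hypothetical placeholder, not an existing API.
public class OutputKeyValueTypeCombinations {
  static final String[] MAPPERS  = { "/bin/cat", "org.apache.hadoop.mapred.lib.IdentityMapper" };
  static final String[] REDUCERS = { "/bin/cat", "org.apache.hadoop.mapred.lib.IdentityReducer", "NONE" };

  public static void main(String[] args) {
    for (String mapper : MAPPERS) {
      for (String reducer : REDUCERS) {
        // Each run should assert that the user-supplied output key/value
        // classes are not overridden by StreamJob.
        runStreamJob(mapper, reducer);
      }
    }
  }

  static void runStreamJob(String mapper, String reducer) {
    // Hypothetical: build -mapper/-reducer arguments and submit via StreamJob.
    System.out.println("would run streaming job: -mapper " + mapper + " -reducer " + reducer);
  }
}
{code}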


> Streaming overrides user given output key and value types.
> --
>
> Key: MAPREDUCE-1888
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1888
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Affects Versions: 0.21.0
>Reporter: Amareshwari Sriramadasu
>Assignee: Ravi Gummadi
> Fix For: 0.22.0
>
> Attachments: 1888.patch, 1888.v1.patch, 1888.v2.patch, 1888.v3.patch
>
>
> The following code in StreamJob.java overrides user given output key and 
> value types.
> {code}
> idResolver.resolve(conf.get(StreamJobConfig.MAP_OUTPUT, IdentifierResolver.TEXT_ID));
> conf.setClass(StreamJobConfig.MAP_OUTPUT_READER_CLASS,
>   idResolver.getOutputReaderClass(), OutputReader.class);
> job.setMapOutputKeyClass(idResolver.getOutputKeyClass());
> job.setMapOutputValueClass(idResolver.getOutputValueClass());
> 
> idResolver.resolve(conf.get(StreamJobConfig.REDUCE_OUTPUT, IdentifierResolver.TEXT_ID));
> conf.setClass(StreamJobConfig.REDUCE_OUTPUT_READER_CLASS,
>   idResolver.getOutputReaderClass(), OutputReader.class);
> job.setOutputKeyClass(idResolver.getOutputKeyClass());
> job.setOutputValueClass(idResolver.getOutputValueClass());
> {code}
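
To make the override concrete, a small hedged illustration using the old JobConf API: the 
user-chosen output key class is later clobbered when the code above applies the resolver's 
types (Text for the default TEXT_ID). The surrounding job setup is assumed.

{code}
// Illustration only (assumed surrounding setup): the user picks LongWritable
// keys, but the quoted StreamJob code later calls
// job.setOutputKeyClass(idResolver.getOutputKeyClass()), which for the default
// TEXT_ID identifier resolves to Text and silently discards the user's choice.
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;

public class OutputTypeOverrideIllustration {
  public static void illustrate(JobConf job) {
    job.setOutputKeyClass(LongWritable.class); // what the user asked for
    // ... later, during StreamJob setup, the quoted code effectively does:
    job.setOutputKeyClass(Text.class);         // the user's setting is overridden
  }
}
{code}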

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Reopened: (MAPREDUCE-1888) Streaming overrides user given output key and value types.

2010-06-30 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu reopened MAPREDUCE-1888:



I just reverted the commit to fix the corner case.

> Streaming overrides user given output key and value types.
> --
>
> Key: MAPREDUCE-1888
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1888
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Affects Versions: 0.21.0
>Reporter: Amareshwari Sriramadasu
>Assignee: Ravi Gummadi
> Fix For: 0.22.0
>
> Attachments: 1888.patch, 1888.v1.patch, 1888.v2.patch, 1888.v3.patch
>
>
> The following code in StreamJob.java overrides user given output key and 
> value types.
> {code}
> idResolver.resolve(conf.get(StreamJobConfig.MAP_OUTPUT, IdentifierResolver.TEXT_ID));
> conf.setClass(StreamJobConfig.MAP_OUTPUT_READER_CLASS,
>   idResolver.getOutputReaderClass(), OutputReader.class);
> job.setMapOutputKeyClass(idResolver.getOutputKeyClass());
> job.setMapOutputValueClass(idResolver.getOutputValueClass());
> 
> idResolver.resolve(conf.get(StreamJobConfig.REDUCE_OUTPUT, IdentifierResolver.TEXT_ID));
> conf.setClass(StreamJobConfig.REDUCE_OUTPUT_READER_CLASS,
>   idResolver.getOutputReaderClass(), OutputReader.class);
> job.setOutputKeyClass(idResolver.getOutputKeyClass());
> job.setOutputValueClass(idResolver.getOutputValueClass());
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1888) Streaming overrides user given output key and value types.

2010-06-30 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883826#action_12883826
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1888:


While working on MAPREDUCE-1122, I realized that one more corner case was 
missed here: for reducer=NONE, if the mapper is a command, then the job output 
key and value types should be set from the IO identifier. A rough sketch of the 
intended behavior is below.
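
The following fragment is only a sketch of that corner case, meant to be read next to the 
quoted StreamJob snippet in the issue description; it is not the actual patch. The 
isReducerNone() and isMapperACommand() helpers, and the placement of this logic, are 
assumptions.

{code}
// Sketch of the corner-case handling described above; not the actual patch.
// Names follow the quoted StreamJob snippet; the two boolean helpers are assumed.
void maybeSetOutputTypesFromIdentifier(JobConf job, Configuration conf,
                                       IdentifierResolver idResolver) {
  if (isReducerNone() && isMapperACommand()) {
    // With reducer=NONE the map output is the job output, so the output key
    // and value types must come from the IO identifier (e.g. TEXT_ID -> Text).
    idResolver.resolve(conf.get(StreamJobConfig.MAP_OUTPUT,
                                IdentifierResolver.TEXT_ID));
    job.setOutputKeyClass(idResolver.getOutputKeyClass());
    job.setOutputValueClass(idResolver.getOutputValueClass());
  }
  // Otherwise, leave the user-configured output types untouched.
}
{code}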

> Streaming overrides user given output key and value types.
> --
>
> Key: MAPREDUCE-1888
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1888
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Affects Versions: 0.21.0
>Reporter: Amareshwari Sriramadasu
>Assignee: Ravi Gummadi
> Fix For: 0.22.0
>
> Attachments: 1888.patch, 1888.v1.patch, 1888.v2.patch, 1888.v3.patch
>
>
> The following code in StreamJob.java overrides user given output key and 
> value types.
> {code}
> idResolver.resolve(conf.get(StreamJobConfig.MAP_OUTPUT, IdentifierResolver.TEXT_ID));
> conf.setClass(StreamJobConfig.MAP_OUTPUT_READER_CLASS,
>   idResolver.getOutputReaderClass(), OutputReader.class);
> job.setMapOutputKeyClass(idResolver.getOutputKeyClass());
> job.setMapOutputValueClass(idResolver.getOutputValueClass());
> 
> idResolver.resolve(conf.get(StreamJobConfig.REDUCE_OUTPUT, IdentifierResolver.TEXT_ID));
> conf.setClass(StreamJobConfig.REDUCE_OUTPUT_READER_CLASS,
>   idResolver.getOutputReaderClass(), OutputReader.class);
> job.setOutputKeyClass(idResolver.getOutputKeyClass());
> job.setOutputValueClass(idResolver.getOutputValueClass());
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.