[jira] Updated: (MAPREDUCE-966) Rumen interface improvement

2009-09-10 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-966:


Status: Patch Available  (was: Open)

> Rumen interface improvement
> ---
>
> Key: MAPREDUCE-966
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-966
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.21.0
>Reporter: Hong Tang
>Assignee: Hong Tang
>     Attachments: mapreduce-966-20090910-3.patch, 
> mapreduce-966-20090910-4.patch
>
>
> Rumen could expose a cleaner interface to simplify the integration with other 
> tools.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-966) Rumen interface improvement

2009-09-10 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-966:


Attachment: mapreduce-966-20090910-4.patch

Patch addresses the findbugs warnings.

> Rumen interface improvement
> ---
>
> Key: MAPREDUCE-966
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-966
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.21.0
>Reporter: Hong Tang
>Assignee: Hong Tang
>     Attachments: mapreduce-966-20090910-3.patch, 
> mapreduce-966-20090910-4.patch
>
>
> Rumen could expose a cleaner interface to simplify the integration with other 
> tools.




[jira] Updated: (MAPREDUCE-966) Rumen interface improvement

2009-09-10 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-966:


Status: Open  (was: Patch Available)

> Rumen interface improvement
> ---
>
> Key: MAPREDUCE-966
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-966
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.21.0
>Reporter: Hong Tang
>Assignee: Hong Tang
>     Attachments: mapreduce-966-20090910-3.patch, 
> mapreduce-966-20090910-4.patch
>
>
> Rumen could expose a cleaner interface to simplify the integration with other 
> tools.




[jira] Commented: (MAPREDUCE-972) distcp can timeout during rename operation to s3

2009-09-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754025#action_12754025
 ] 

Hadoop QA commented on MAPREDUCE-972:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12419256/MAPREDUCE-972.2.patch
  against trunk revision 813660.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 1 new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/27/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/27/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/27/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/27/console

This message is automatically generated.

> distcp can timeout during rename operation to s3
> 
>
> Key: MAPREDUCE-972
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-972
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 0.20.1
>Reporter: Aaron Kimball
>Assignee: Aaron Kimball
> Attachments: MAPREDUCE-972.2.patch, MAPREDUCE-972.patch
>
>
> rename() in S3 is implemented as copy + delete. The S3 copy operation can 
> perform very slowly, which may cause task timeout.
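
As a rough illustration of the timeout problem described above (and not necessarily the approach taken in the attached patches, which are not shown here), a task can keep signalling liveness from a background thread while a slow blocking call, such as a copy-based rename, runs. The class and method names below are hypothetical, and reportProgress stands in for whatever progress callback the framework provides.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ProgressWhileBlocking {

    /** Runs slowOperation on the calling thread while reportProgress fires periodically. */
    static void runWithHeartbeat(Runnable slowOperation, Runnable reportProgress,
                                 long intervalMillis) {
        ScheduledExecutorService ticker = Executors.newSingleThreadScheduledExecutor();
        // First heartbeat is scheduled immediately, then every intervalMillis.
        ticker.scheduleAtFixedRate(reportProgress, 0, intervalMillis, TimeUnit.MILLISECONDS);
        try {
            slowOperation.run();
        } finally {
            // Stop the heartbeat whether the operation succeeded or threw.
            ticker.shutdownNow();
        }
    }

    /** Stand-in for a slow rename (hypothetical; a real task would call the filesystem here). */
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AtomicInteger heartbeats = new AtomicInteger();
        runWithHeartbeat(() -> sleepQuietly(200), heartbeats::incrementAndGet, 20);
        System.out.println("heartbeats sent: " + heartbeats.get());
    }
}
```

The heartbeat only tells the framework that the task is alive; it does not make the rename itself any faster.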




[jira] Commented: (MAPREDUCE-892) command line tool to list all tasktrackers and their status

2009-09-10 Thread Amar Kamat (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754014#action_12754014
 ] 

Amar Kamat commented on MAPREDUCE-892:
--

A few comments:
1) There is no need for a separate class ClusterStatusFull. We can use the 
ClusterStatus class itself. I think it's OK to add more info to the *detailed* 
mode of cluster status.
2) Instead of '-list-trackers-info' you can use '-list-trackers --detailed' or 
something like that, for example
'./bin/hadoop job -list-tracker [--summary|--detailed]'

> command line tool to list all tasktrackers and their status
> ---
>
> Key: MAPREDUCE-892
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-892
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.21.0
>Reporter: dhruba borthakur
>Assignee: Dmytro Molkov
> Attachments: MAPREDUCE-892.patch, MAPREDUCE-892.patch, 
> MAPREDUCE-892.patch.1
>
>
> The "hadoop mradmin -report" could list all the tasktrackers that the 
> JobTracker knows about. It will also list a brief status summary for each 
> TaskTracker. (This is similar to the hadoop dfsadmin -report command that 
> lists all Datanodes.)




[jira] Created: (MAPREDUCE-974) CLI command for viewing tasktrackers should not be under "job"

2009-09-10 Thread Amar Kamat (JIRA)
CLI command for viewing tasktrackers should not be under "job"
--

 Key: MAPREDUCE-974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-974
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Reporter: Amar Kamat


For viewing the tasktrackers in a mr cluster the command available is 
"./bin/hadoop job -list-tracker". But the tracker info is cluster level info 
and not job level. 




[jira] Commented: (MAPREDUCE-181) Secure job submission

2009-09-10 Thread Amar Kamat (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754012#action_12754012
 ] 

Amar Kamat commented on MAPREDUCE-181:
--

Had a chat with Owen and here is the job submission process with a few extra 
add-ons:
# jobclient requests the jobtracker for a jobid [say $jobid]
# jobclient uploads job.xml, job.jar, job.split, job.splitmetainfo, version, 
libs, archives etc. to the staging area, i.e. ~/.staging/$jobid
# jobclient now constructs a job-submission-token which contains 
  ## job staging area location (for job start and restart)
  ## job-submission version (for client-master compatibility)
  ## some checksum info (will expand on this later)
  ## user-credentials (for now username)
# jobclient passes job-submission-token over the rpc to jobtracker
# jobtracker persists this info in mapred.system.dir
# jobtracker uses the user-credentials in the job-meta-info to read the job.xml 
and job.splitmetainfo. 
# jobtracker checks for job staging checksum
# when the tasktracker asks for a task, a Task is passed which contains the 
location of job.split along with start-offset and length. 
# upon restart the jobtracker reads the job-meta info and re-submits the job 
(where the checksum check is done again)
# once the job is done, the staging area is deleted 

Checksum:
# job.xml md5 : this prevents jobtracker/tasktrackers from using a changed 
jobconf across job-submission and restarts.
# job-staging-area modification time : this prevents jobtracker and tasktracker 
from running jobs for which the staging area has changed.
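
The job.xml md5 check above can be sketched roughly as follows. This is a minimal illustration using only the JDK, not code from any of the attached patches; the class name JobConfChecksum and its methods are hypothetical, and a real implementation would read the bytes from the staging area rather than from in-memory strings.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class JobConfChecksum {

    /** Returns the MD5 digest of the given bytes as a lowercase hex string. */
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** True if the staged job.xml bytes still match the digest recorded at submission. */
    static boolean matchesRecorded(byte[] stagedJobXml, String recordedMd5) {
        return md5Hex(stagedJobXml).equals(recordedMd5);
    }

    public static void main(String[] args) {
        byte[] submitted = "<configuration/>".getBytes(StandardCharsets.UTF_8);
        // Digest recorded at submission time (hypothetically carried in the token).
        String recorded = md5Hex(submitted);

        byte[] tampered = "<configuration><!-- changed --></configuration>"
                .getBytes(StandardCharsets.UTF_8);

        System.out.println("unchanged conf accepted: " + matchesRecorded(submitted, recorded));
        System.out.println("changed conf accepted:   " + matchesRecorded(tampered, recorded));
    }
}
```

The same comparison would be repeated on restart, since the proposal re-runs the checksum check when the job is re-submitted.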


> Secure job submission 
> --
>
> Key: MAPREDUCE-181
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-181
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Amar Kamat
>Assignee: Amar Kamat
> Attachments: hadoop-3578-branch-20-example-2.patch, 
> hadoop-3578-branch-20-example.patch, HADOOP-3578-v2.6.patch, 
> HADOOP-3578-v2.7.patch, MAPRED-181-v3.8.patch
>
>
> Currently the jobclient accesses the {{mapred.system.dir}} to add job 
> details. Hence the {{mapred.system.dir}} has the permissions of 
> {{rwx-wx-wx}}. This could be a security loophole where the job files might 
> get overwritten/tampered after the job submission. 




[jira] Commented: (MAPREDUCE-181) Secure job submission

2009-09-10 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754006#action_12754006
 ] 

Owen O'Malley commented on MAPREDUCE-181:
-

bq. Why can't this be in the respective files as headers? Today we add the 
version info as the first line in the file.

It would have to be in all of the files. (job conf, raw split, split metadata) 
It seems easier to have a single version. In particular, at some point we will 
change the job conf from xml to binary. That isn't easy to do without a version 
on the directory.

bq. So you mean to say that we just persist jobid and job-staging location for 
restart/persistence?

Yes. The rest of the information would need to come from the staging 
directories. We should probably md5 the jobconf and verify it when it is 
downloaded by the task trackers and on restart.

I guess I should have listed two more disadvantages:
* the JobTracker needs to be the user to read the files from the staging area
* the user can mess with their jobs after they are submitted

Other than changing the job conf, I can't see any security problems with them 
changing any of the files.

> Secure job submission 
> --
>
> Key: MAPREDUCE-181
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-181
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Amar Kamat
>Assignee: Amar Kamat
> Attachments: hadoop-3578-branch-20-example-2.patch, 
> hadoop-3578-branch-20-example.patch, HADOOP-3578-v2.6.patch, 
> HADOOP-3578-v2.7.patch, MAPRED-181-v3.8.patch
>
>
> Currently the jobclient accesses the {{mapred.system.dir}} to add job 
> details. Hence the {{mapred.system.dir}} has the permissions of 
> {{rwx-wx-wx}}. This could be a security loophole where the job files might 
> get overwritten/tampered after the job submission. 




[jira] Commented: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars

2009-09-10 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754005#action_12754005
 ] 

Todd Lipcon commented on MAPREDUCE-967:
---

Sounds good. Thanks for the review, Vinod. I'll take care of this soon and 
upload an up-to-date patch. I should have results from testing on the cluster 
as well to see if this does indeed reduce disk utilization appreciably.

> TaskTracker does not need to fully unjar job jars
> -
>
> Key: MAPREDUCE-967
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-967
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tasktracker
>Affects Versions: 0.21.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: mapreduce-967-branch-0.20.txt
>
>
> In practice we have seen some users submitting job jars that consist of 
> 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning 
> up after them has a significant cost (both in wall clock and in unnecessary 
> heavy disk utilization). This cost can be easily avoided




[jira] Assigned: (MAPREDUCE-862) Modify UI to support a hierarchy of queues

2009-09-10 Thread V.V.Chaitanya Krishna (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

V.V.Chaitanya Krishna reassigned MAPREDUCE-862:
---

Assignee: V.V.Chaitanya Krishna

> Modify UI to support a hierarchy of queues
> --
>
> Key: MAPREDUCE-862
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-862
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Hemanth Yamijala
>Assignee: V.V.Chaitanya Krishna
> Attachments: clustersummarymodification.png, detailspage.png, 
> initialscreen.png, subqueue.png
>
>
> MAPREDUCE-853 proposes to introduce a hierarchy of queues into the Map/Reduce 
> framework. This JIRA is for defining changes to the UI related to queues. 
> This includes the hadoop queue CLI and the web UI on the JobTracker.




[jira] Commented: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars

2009-09-10 Thread Vinod K V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754003#action_12754003
 ] 

Vinod K V commented on MAPREDUCE-967:
-

bq. Now that there are changes to RunJar, for trunk, can you move RunJar to 
mapreduce from common
bq. Sure thing. I'll take care of that when I post the patch for trunk.
Thanks! Please also make sure the new class in mapreduce is in 
org.apache.hadoop.mapreduce.util, and create a deprecated class in 
org.apache.hadoop.util which uses functionality from 
org.apache.hadoop.mapreduce.util.RunJar.

bq. Do you see any use for filters here beyond a straight regex? We can express 
the old behaviour as /.*/ and the new behavior as /^(lib|classes)\//.
Yes, that should do, I think.

bq. Also, I'd prefer to make this an *undocumented* configuration parameter, 
since I think there is very little use for the old version and we don't want to 
encourage people to abuse this.
Agreed. Even now, it is undocumented, AFAIK. A more appropriate reasoning for 
making it a configuration is that some users may want directories other than 
lib or classes to be unjarred.

bq. Would you see this being used as a per-job option or a TaskTracker-scoped 
option?
Per-job. By the time we un-jar stuff on the TT, the job configuration is already 
localized, so it's easy to get this option just before un-jarring.


> TaskTracker does not need to fully unjar job jars
> -
>
> Key: MAPREDUCE-967
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-967
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tasktracker
>Affects Versions: 0.21.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: mapreduce-967-branch-0.20.txt
>
>
> In practice we have seen some users submitting job jars that consist of 
> 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning 
> up after them has a significant cost (both in wall clock and in unnecessary 
> heavy disk utilization). This cost can be easily avoided




[jira] Commented: (MAPREDUCE-181) Secure job submission

2009-09-10 Thread Amar Kamat (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754002#action_12754002
 ] 

Amar Kamat commented on MAPREDUCE-181:
--

bq. _version that contains the storage version (1.0 to start with)
Why can't this be in the respective files as headers? Today we add the version 
info as the first line in the file.

bq. The JobTracker doesn't need to do any writes to HDFS, just reads
So you mean to say that we just persist the jobid and job-staging location for 
restart/persistence? Also, the jobtracker will be forced to do all the checks for 
the job upon restart, as the job files can change anytime. This is also a change 
from the current model, where files once accepted cannot change. The user can 
now change the jobconf while the job is running. 

> Secure job submission 
> --
>
> Key: MAPREDUCE-181
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-181
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Amar Kamat
>Assignee: Amar Kamat
> Attachments: hadoop-3578-branch-20-example-2.patch, 
> hadoop-3578-branch-20-example.patch, HADOOP-3578-v2.6.patch, 
> HADOOP-3578-v2.7.patch, MAPRED-181-v3.8.patch
>
>
> Currently the jobclient accesses the {{mapred.system.dir}} to add job 
> details. Hence the {{mapred.system.dir}} has the permissions of 
> {{rwx-wx-wx}}. This could be a security loophole where the job files might 
> get overwritten/tampered after the job submission. 




[jira] Commented: (MAPREDUCE-830) Providing BZip2 splitting support for Text data

2009-09-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753999#action_12753999
 ] 

Hudson commented on MAPREDUCE-830:
--

Integrated in Hadoop-Hdfs-trunk-Commit #27 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/27/])
. Add support for splittable compression to TextInputFormats. Contributed 
by Abdul Qadeer


> Providing BZip2 splitting support for Text data
> ---
>
> Key: MAPREDUCE-830
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-830
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.21.0
>Reporter: Abdul Qadeer
>Assignee: Abdul Qadeer
> Fix For: 0.21.0
>
> Attachments: M830-2.patch, M830-3.patch, M830-4.patch, M830-4.patch, 
> MapReduce-830-version1.patch
>
>
> HADOOP-4012 (https://issues.apache.org/jira/browse/HADOOP-4012) is providing 
> support to handle BZip2 compressed data such that the input compressed file 
> is split at arbitrary points.  This JIRA uses that functionality in 
> LineRecordReader.  The benefit of this work is that, if user provides 
> compressed BZip2 Text data, it will be split by Hadoop and hence will be 
> processed by multiple mappers.  So BZip2 compressed data will be able to 
> fully utilize the cluster power.  Currently a BZip2 compressed Text file goes 
> to one mapper and is not split.  So the enhancement in this JIRA provides 
> splitting support and considerable performance gains.




[jira] Updated: (MAPREDUCE-973) Move "FailJob" from examples to test

2009-09-10 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-973:


Attachment: M973-1.patch

So right. Moved SleepJob to src/test.

> Move "FailJob" from examples to test
> 
>
> Key: MAPREDUCE-973
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-973
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: examples, test
>Affects Versions: 0.21.0
>Reporter: Chris Douglas
>Priority: Trivial
> Fix For: 0.21.0
>
> Attachments: M973-0.patch, M973-1.patch
>
>
> The FailJob class (MAPREDUCE-567) is more a test utility than an example. It 
> should either move to src/test, ideally with a unit test built around it, or 
> be removed.




[jira] Updated: (MAPREDUCE-973) Move test utilities from examples to test

2009-09-10 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-973:


Description: The FailJob class (MAPREDUCE-567) is more a test utility than 
an example. It should either move to src/test, ideally with a unit test built 
around it, or be removed. Similarly, SleepJob class is mostly used in unit 
tests.  (was: The FailJob class (MAPREDUCE-567) is more a test utility than an 
example. It should either move to src/test, ideally with a unit test built 
around it, or be removed.)
Summary: Move test utilities from examples to test  (was: Move 
"FailJob" from examples to test)

> Move test utilities from examples to test
> -
>
> Key: MAPREDUCE-973
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-973
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: examples, test
>Affects Versions: 0.21.0
>Reporter: Chris Douglas
>Priority: Trivial
> Fix For: 0.21.0
>
> Attachments: M973-0.patch, M973-1.patch
>
>
> The FailJob class (MAPREDUCE-567) is more a test utility than an example. It 
> should either move to src/test, ideally with a unit test built around it, or 
> be removed. Similarly, SleepJob class is mostly used in unit tests.




[jira] Commented: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars

2009-09-10 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753994#action_12753994
 ] 

Todd Lipcon commented on MAPREDUCE-967:
---

bq. Now that there are changes to RunJar, for trunk, can you move RunJar to 
mapreduce from common

Sure thing. I'll take care of that when I post the patch for trunk.

bq. Also, I think, it will be good to make the filter to specify the 
directories/files in job.jar to be un-jarred as configurable

Do you see any use for filters here beyond a straight regex? We can express the 
old behaviour as /.*/ and the new behavior as /^(lib|classes)\//. Also, I'd 
prefer to make this an *undocumented* configuration parameter, since I think 
there is very little use for the old version and we don't want to encourage 
people to abuse this.

Would you see this being used as a per-job option or a TaskTracker-scoped 
option?
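
To make the two regexes above concrete, here is a small sketch of how such a filter might select jar entries by name. It uses only java.util.regex and is not taken from the patch; the class UnjarFilter and the sample entry names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class UnjarFilter {

    /** Old behaviour: every entry in the job jar is unpacked. */
    static final Pattern UNPACK_ALL = Pattern.compile(".*");

    /** New behaviour: only entries under lib/ or classes/ are unpacked. */
    static final Pattern LIB_AND_CLASSES = Pattern.compile("^(lib|classes)/");

    /** Returns the entry names that the filter selects for unpacking. */
    static List<String> select(List<String> entryNames, Pattern filter) {
        List<String> selected = new ArrayList<>();
        for (String name : entryNames) {
            // find() rather than matches(): the pattern only needs to match a prefix.
            if (filter.matcher(name).find()) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList(
                "lib/dep.jar",
                "classes/Foo.class",
                "com/example/MyMapper.class",
                "META-INF/MANIFEST.MF");

        System.out.println("old: " + select(entries, UNPACK_ALL));
        System.out.println("new: " + select(entries, LIB_AND_CLASSES));
    }
}
```

With the narrower pattern, top-level class files stay inside the jar, where they can still be loaded from the classpath without being written to local disk.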

> TaskTracker does not need to fully unjar job jars
> -
>
> Key: MAPREDUCE-967
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-967
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tasktracker
>Affects Versions: 0.21.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: mapreduce-967-branch-0.20.txt
>
>
> In practice we have seen some users submitting job jars that consist of 
> 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning 
> up after them has a significant cost (both in wall clock and in unnecessary 
> heavy disk utilization). This cost can be easily avoided




[jira] Commented: (MAPREDUCE-181) Secure job submission

2009-09-10 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753990#action_12753990
 ] 

Owen O'Malley commented on MAPREDUCE-181:
-

Ok, Arun and I discussed this offline and came up with the following proposal.

We put everything about the job into the job's staging area (~/.staging/$jobid)
* job conf
* the serialized bytes of the input splits
* the meta data for the splits (offset of split serialization, number of bytes 
in split, list of locations for split) for each split
* job jar

One last file that we need, because this effectively becomes an interface, is:
* _version that contains the storage version (1.0 to start with)

The advantages are:
* The JobTracker doesn't need to do any writes to HDFS, just reads
* The space counts against the user's quota on their home directory
* Small RPC message
* The job definition isn't split in two different places

The disadvantages are:
* Need versioning (so that hadoop 1.0 clients will work with hadoop 1.1 
JobTrackers)
* The job tracker is reading xml written by user code (need to move to binary 
eventually)
* The user can accidentally kill all of their jobs.
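
The versioning point can be illustrated with a short sketch of how a reader might check the contents of the _version file before touching the rest of the staging area. The compatibility rule used here (same major version, and a minor version no newer than the reader's) is an assumption made for the example and is not stated in the proposal; the class StagingVersion is likewise hypothetical.

```java
public class StagingVersion {
    final int major;
    final int minor;

    StagingVersion(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    /** Parses text such as "1.0" (surrounding whitespace is ignored). */
    static StagingVersion parse(String text) {
        String[] parts = text.trim().split("\\.");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected major.minor, got: " + text);
        }
        return new StagingVersion(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
    }

    /**
     * Assumed rule: a reader understands data written with the same major
     * version and a minor version that is not newer than its own.
     */
    boolean readableBy(StagingVersion reader) {
        return major == reader.major && minor <= reader.minor;
    }

    public static void main(String[] args) {
        StagingVersion written = parse("1.0\n");      // contents of _version
        StagingVersion jobTracker = parse("1.1");     // version the reader supports
        System.out.println("readable: " + written.readableBy(jobTracker));
    }
}
```

Keeping one version for the whole directory, rather than one per file, means a single check guards the job conf, the raw splits and the split metadata together.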

> Secure job submission 
> --
>
> Key: MAPREDUCE-181
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-181
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Amar Kamat
>Assignee: Amar Kamat
> Attachments: hadoop-3578-branch-20-example-2.patch, 
> hadoop-3578-branch-20-example.patch, HADOOP-3578-v2.6.patch, 
> HADOOP-3578-v2.7.patch, MAPRED-181-v3.8.patch
>
>
> Currently the jobclient accesses the {{mapred.system.dir}} to add job 
> details. Hence the {{mapred.system.dir}} has the permissions of 
> {{rwx-wx-wx}}. This could be a security loophole where the job files might 
> get overwritten/tampered after the job submission. 




[jira] Commented: (MAPREDUCE-966) Rumen interface improvement

2009-09-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753991#action_12753991
 ] 

Hadoop QA commented on MAPREDUCE-966:
-

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12419252/mapreduce-966-20090910-3.patch
  against trunk revision 813585.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 15 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 2 new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/60/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/60/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/60/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/60/console

This message is automatically generated.

> Rumen interface improvement
> ---
>
> Key: MAPREDUCE-966
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-966
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.21.0
>Reporter: Hong Tang
>Assignee: Hong Tang
>     Attachments: mapreduce-966-20090910-3.patch
>
>
> Rumen could expose a cleaner interface to simplify the integration with other 
> tools.




[jira] Updated: (MAPREDUCE-861) Modify queue configuration format and parsing to support a hierarchy of queues.

2009-09-10 Thread rahul k singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rahul k singh updated MAPREDUCE-861:


Status: Open  (was: Patch Available)

> Modify queue configuration format and parsing to support a hierarchy of 
> queues.
> ---
>
> Key: MAPREDUCE-861
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-861
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Hemanth Yamijala
>Assignee: rahul k singh
> Attachments: MAPREDUCE-861-1.patch, MAPREDUCE-861-2.patch, 
> MAPREDUCE-861-3.patch, MAPREDUCE-861-4.patch, MAPREDUCE-861-5.patch, 
> MAPREDUCE-861-6.patch, MAPREDUCE-861-7.patch
>
>
> MAPREDUCE-853 proposes to introduce a hierarchy of queues into the Map/Reduce 
> framework. This JIRA is for defining changes to the configuration related to 
> queues. 
> The current format for defining a queue and its properties is as follows: 
> mapred.queue... For e.g. 
> mapred.queue..acl-submit-job. The reason for using this verbose 
> format was to be able to reuse the Configuration parser in Hadoop. However, 
> administrators currently using the queue configuration have already indicated 
> a very strong desire for a more manageable format. Since, this becomes more 
> unwieldy with hierarchical queues, the time may be good to introduce a new 
> format for representing queue configuration.




[jira] Updated: (MAPREDUCE-861) Modify queue configuration format and parsing to support a hierarchy of queues.

2009-09-10 Thread rahul k singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rahul k singh updated MAPREDUCE-861:


Status: Patch Available  (was: Open)

> Modify queue configuration format and parsing to support a hierarchy of 
> queues.
> ---
>
> Key: MAPREDUCE-861
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-861
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Hemanth Yamijala
>Assignee: rahul k singh
> Attachments: MAPREDUCE-861-1.patch, MAPREDUCE-861-2.patch, 
> MAPREDUCE-861-3.patch, MAPREDUCE-861-4.patch, MAPREDUCE-861-5.patch, 
> MAPREDUCE-861-6.patch, MAPREDUCE-861-7.patch
>
>
> MAPREDUCE-853 proposes to introduce a hierarchy of queues into the Map/Reduce 
> framework. This JIRA is for defining changes to the configuration related to 
> queues. 
> The current format for defining a queue and its properties is as follows: 
> mapred.queue... For e.g. 
> mapred.queue..acl-submit-job. The reason for using this verbose 
> format was to be able to reuse the Configuration parser in Hadoop. However, 
> administrators currently using the queue configuration have already indicated 
> a very strong desire for a more manageable format. Since, this becomes more 
> unwieldy with hierarchical queues, the time may be good to introduce a new 
> format for representing queue configuration.




[jira] Commented: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars

2009-09-10 Thread Vinod K V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753987#action_12753987
 ] 

Vinod K V commented on MAPREDUCE-967:
-

Now that there are changes to RunJar, for trunk, can you move RunJar to 
mapreduce from common as is generally desired 
(https://issues.apache.org/jira/browse/MAPREDUCE-727?focusedCommentId=12728372&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12728372)
 ?

Also, I think it would be good to make the filter that specifies which 
directories/files in job.jar are un-jarred _configurable_. This way we can 
also maintain backward compatibility with the current behavior, in which we 
un-jar everything. The configuration could be, for example, a comma-separated 
list of files/dirs. You may also need changes to the JarEntryFilter to accept 
wildcard entries. Thoughts?
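
A minimal sketch of the configurable-filter idea above, assuming the setting is a comma-separated list of glob patterns. The class name UnjarFilter and the pattern syntax are hypothetical; this is not the JarEntryFilter from the attached patch, and nothing here is an existing RunJar API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical filter deciding which job.jar entries get unpacked. */
final class UnjarFilter {
  private final List<Pattern> patterns = new ArrayList<>();

  /**
   * @param spec comma-separated glob patterns such as "lib/*,classes/*";
   *             an empty spec keeps the current behavior of unpacking everything
   */
  UnjarFilter(String spec) {
    for (String glob : spec.split(",")) {
      String trimmed = glob.trim();
      if (!trimmed.isEmpty()) {
        patterns.add(Pattern.compile(globToRegex(trimmed)));
      }
    }
  }

  /** Translates a glob into a regex; '*' matches any run of characters, '/' included. */
  private static String globToRegex(String glob) {
    StringBuilder regex = new StringBuilder();
    for (char c : glob.toCharArray()) {
      if (c == '*') {
        regex.append(".*");
      } else {
        regex.append(Pattern.quote(String.valueOf(c)));
      }
    }
    return regex.toString();
  }

  /** Returns true if the named jar entry should be unpacked. */
  boolean accept(String entryName) {
    if (patterns.isEmpty()) {
      return true; // no filter configured: unjar everything (backward compatible)
    }
    for (Pattern p : patterns) {
      if (p.matcher(entryName).matches()) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    UnjarFilter filter = new UnjarFilter("lib/*, classes/*");
    System.out.println(filter.accept("lib/dep.jar"));
    System.out.println(filter.accept("com/example/Foo.class"));
  }
}
```

Treating an empty setting as "accept everything" is what would keep existing deployments unchanged unless an administrator opts in.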

> TaskTracker does not need to fully unjar job jars
> -
>
> Key: MAPREDUCE-967
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-967
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tasktracker
>Affects Versions: 0.21.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: mapreduce-967-branch-0.20.txt
>
>
> In practice we have seen some users submitting job jars that consist of 
> 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning 
> up after them has a significant cost (both in wall clock and in unnecessary 
> heavy disk utilization). This cost can be easily avoided

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-157) Job History log file format is not friendly for external tools.

2009-09-10 Thread Amar Kamat (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753989#action_12753989
 ] 

Amar Kamat commented on MAPREDUCE-157:
--

Note : This jira should take care of MAPREDUCE-926 and MAPREDUCE-881.

> Job History log file format is not friendly for external tools.
> ---
>
> Key: MAPREDUCE-157
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-157
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Affects Versions: 0.20.1
>Reporter: Owen O'Malley
>Assignee: Jothi Padmanabhan
> Fix For: 0.21.0
>
> Attachments: mapred-157-10Sep.patch, mapred-157-4Sep.patch, 
> mapred-157-7Sep-v1.patch, mapred-157-7Sep.patch, mapred-157-prelim.patch, 
> MAPREDUCE-157-avro.patch
>
>
> Currently, parsing the job history logs with external tools is very difficult 
> because of the format. The most critical problem is that newlines aren't 
> escaped in the strings. That makes using tools like grep, sed, and awk very 
> tricky.
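
To illustrate the newline problem described above, here is a minimal sketch of escaping values so that each history record stays on one physical line. The class HistoryEscaper and its escaping rules are assumptions for illustration only; they are not the format used by the attached patches.

```java
/** Hypothetical helper: keeps a record value on a single line for grep/sed/awk. */
final class HistoryEscaper {
  /** Escapes backslashes first, then line breaks, so the result is reversible. */
  static String escape(String value) {
    StringBuilder sb = new StringBuilder(value.length());
    for (int i = 0; i < value.length(); i++) {
      char c = value.charAt(i);
      switch (c) {
        case '\\':
          sb.append("\\\\");
          break;
        case '\n':
          sb.append("\\n");
          break;
        case '\r':
          sb.append("\\r");
          break;
        default:
          sb.append(c);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // A two-line error message becomes a single line of output.
    System.out.println(escape("Task failed\nCaused by: timeout"));
  }
}
```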

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (MAPREDUCE-240) Improve the shuffle phase by using the "connection: keep-alive" and doing batch transfers of files

2009-09-10 Thread Jothi Padmanabhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jothi Padmanabhan resolved MAPREDUCE-240.
-

Resolution: Duplicate

MAPREDUCE-318 incorporated this

> Improve the shuffle phase by using the "connection: keep-alive" and doing 
> batch transfers of files
> --
>
> Key: MAPREDUCE-240
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-240
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Devaraj Das
>Assignee: Jothi Padmanabhan
> Attachments: hadoop-1338-v1.patch, hadoop-1338-v2.patch
>
>
> We should do transfers of map outputs at the granularity of  
> *total-bytes-transferred* rather than the current way of transferring a 
> single file and then closing the connection to the server. A single 
> TaskTracker might have a couple of map output files for a given reduce, and 
> we should transfer several of them (up to a certain total size) in a single 
> connection to the TaskTracker. Using HTTP-1.1's keep-alive connection would 
> help since it would keep the connection open for more than one file transfer. 
> We should limit the transfers to a certain size so that we don't hold up a 
> jetty thread indefinitely (and cause timeouts for other clients).
> Overall, this should give us improved performance.
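
The batching idea in the description can be sketched as follows: group a TaskTracker's map outputs into batches whose total size stays under a cap, so that each batch could be fetched over one kept-alive connection. ShuffleBatcher and its byte cap are hypothetical names; this is not the code that MAPREDUCE-318 incorporated.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical planner: splits map output sizes into size-capped batches. */
final class ShuffleBatcher {
  /**
   * Groups sizes in order. A batch is closed as soon as adding the next output
   * would exceed maxBytes; an output larger than maxBytes gets a batch of its own.
   */
  static List<List<Long>> batch(List<Long> sizes, long maxBytes) {
    List<List<Long>> batches = new ArrayList<>();
    List<Long> current = new ArrayList<>();
    long total = 0;
    for (long size : sizes) {
      if (!current.isEmpty() && total + size > maxBytes) {
        batches.add(current);
        current = new ArrayList<>();
        total = 0;
      }
      current.add(size);
      total += size;
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }

  public static void main(String[] args) {
    List<Long> sizes = new ArrayList<>();
    sizes.add(40L);
    sizes.add(40L);
    sizes.add(40L);
    System.out.println(batch(sizes, 100L));
  }
}
```

Capping each batch bounds how long one connection occupies a server thread, which is the concern the description raises about jetty threads.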

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-861) Modify queue configuration format and parsing to support a hierarchy of queues.

2009-09-10 Thread rahul k singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rahul k singh updated MAPREDUCE-861:


Attachment: MAPREDUCE-861-7.patch

Ran the test cases locally with the previous patch, MAPREDUCE-861-6.patch. 
Two test cases failed: TestRecoveryManager and TestJobQueueInformation.

The new patch fixes those failures. 
Ran ant test on the new patch; all tests passed.
test-patch output is below:
 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 46 new or modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.
 [exec] 
 [exec] ==
 [exec] Finished build.
 [exec] ==


> Modify queue configuration format and parsing to support a hierarchy of 
> queues.
> ---
>
> Key: MAPREDUCE-861
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-861
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Hemanth Yamijala
>Assignee: rahul k singh
> Attachments: MAPREDUCE-861-1.patch, MAPREDUCE-861-2.patch, 
> MAPREDUCE-861-3.patch, MAPREDUCE-861-4.patch, MAPREDUCE-861-5.patch, 
> MAPREDUCE-861-6.patch, MAPREDUCE-861-7.patch
>
>
> MAPREDUCE-853 proposes to introduce a hierarchy of queues into the Map/Reduce 
> framework. This JIRA is for defining changes to the configuration related to 
> queues. 
> The current format for defining a queue and its properties is as follows: 
> mapred.queue... For e.g. 
> mapred.queue..acl-submit-job. The reason for using this verbose 
> format was to be able to reuse the Configuration parser in Hadoop. However, 
> administrators currently using the queue configuration have already indicated 
> a very strong desire for a more manageable format. Since this becomes more 
> unwieldy with hierarchical queues, the time may be good to introduce a new 
> format for representing queue configuration.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (MAPREDUCE-881) Jobtracker continues even if History initialization fails

2009-09-10 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat resolved MAPREDUCE-881.
--

Resolution: Duplicate

MAPREDUCE-157

> Jobtracker continues even if History initialization fails
> -
>
> Key: MAPREDUCE-881
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-881
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Reporter: Sharad Agarwal
>
> If there is some problem in the configuration, Job history initialization 
> fails. JobHistory#init catches the exception and disables the history. This 
> leads to job history not working as expected. However, administrators won't 
> notice that a config problem caused history to be disabled unless they check 
> the logs. A better approach would be to not catch the exception and let the 
> Jobtracker fail to come up if there is an error in initialization.
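
The two behaviors contrasted in the description can be sketched as below. HistoryInit and its validation rule are hypothetical stand-ins, not the real JobHistory#init; the point is only the difference between swallowing the exception and letting it propagate.

```java
/** Hypothetical stand-in for a component whose initialization can fail. */
final class HistoryInit {
  /** Throws if the (assumed) history directory setting is missing. */
  static void init(String historyDir) {
    if (historyDir == null || historyDir.trim().isEmpty()) {
      throw new IllegalStateException("history directory is not configured");
    }
  }

  /** Lenient style: the failure is logged and history is silently disabled. */
  static boolean initLenient(String historyDir) {
    try {
      init(historyDir);
      return true;
    } catch (RuntimeException e) {
      System.err.println("History disabled: " + e.getMessage());
      return false;
    }
  }

  /** Strict style: the exception propagates, so startup fails visibly. */
  static void initStrict(String historyDir) {
    init(historyDir);
  }

  public static void main(String[] args) {
    System.out.println("lenient result: " + initLenient(""));
    try {
      initStrict("");
    } catch (IllegalStateException e) {
      System.out.println("strict startup aborted: " + e.getMessage());
    }
  }
}
```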

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-727) Move the bin/hadoop jar command over to bin/mapred

2009-09-10 Thread Vinod K V (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod K V updated MAPREDUCE-727:


Fix Version/s: 0.21.0

> Move the bin/hadoop jar command over to bin/mapred
> --
>
> Key: MAPREDUCE-727
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-727
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Arun C Murthy
> Fix For: 0.21.0
>
>
> Currently 'bin/hadoop jar' is used to submit jobs; we should move it over to 
> bin/mapred.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-372) Change org.apache.hadoop.mapred.lib.ChainMapper/Reducer to use new api.

2009-09-10 Thread Jothi Padmanabhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jothi Padmanabhan updated MAPREDUCE-372:


Attachment: mapred-372.patch

Sorry,  forgot to do svn add before. Here is the full patch

> Change org.apache.hadoop.mapred.lib.ChainMapper/Reducer to use new api.
> ---
>
> Key: MAPREDUCE-372
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-372
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.21.0
>
> Attachments: mapred-372.patch, mapred-372.patch, patch-372-1.txt, 
> patch-372.txt
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-830) Providing BZip2 splitting support for Text data

2009-09-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753979#action_12753979
 ] 

Hudson commented on MAPREDUCE-830:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #30 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/30/])
. Add support for splittable compression to TextInputFormats. Contributed 
by Abdul Qadeer


> Providing BZip2 splitting support for Text data
> ---
>
> Key: MAPREDUCE-830
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-830
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.21.0
>Reporter: Abdul Qadeer
>Assignee: Abdul Qadeer
> Fix For: 0.21.0
>
> Attachments: M830-2.patch, M830-3.patch, M830-4.patch, M830-4.patch, 
> MapReduce-830-version1.patch
>
>
> HADOOP-4012 (https://issues.apache.org/jira/browse/HADOOP-4012) is providing 
> support to handle BZip2 compressed data such that the input compressed file 
> is split at arbitrary points.  This JIRA uses that functionality in 
> LineRecordReader.  The benefit of this work is that, if user provides 
> compressed BZip2 Text data, it will be split by Hadoop and hence will be 
> processed by multiple mappers.  So BZip2 compressed data will be able to 
> fully utilize the cluster power.  Currently BZip2 compressed Text file goes 
> to one mapper and is not split.  So the enhancement in this JIRA provides 
> splitting support and considerable performance gains.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-973) Move "FailJob" from examples to test

2009-09-10 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753978#action_12753978
 ] 

Owen O'Malley commented on MAPREDUCE-973:
-

Go ahead and just move sleep job too. I can't see anyone minding. The examples 
are certainly public evolving, anyway.

> Move "FailJob" from examples to test
> 
>
> Key: MAPREDUCE-973
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-973
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: examples, test
>Affects Versions: 0.21.0
>Reporter: Chris Douglas
>Priority: Trivial
> Fix For: 0.21.0
>
> Attachments: M973-0.patch
>
>
> The FailJob class (MAPREDUCE-567) is more a test utility than an example. It 
> should either move to src/test, ideally with a unit test built around it, or 
> be removed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-973) Move "FailJob" from examples to test

2009-09-10 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-973:


Priority: Trivial  (was: Major)

> Move "FailJob" from examples to test
> 
>
> Key: MAPREDUCE-973
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-973
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: examples, test
>Affects Versions: 0.21.0
>Reporter: Chris Douglas
>Priority: Trivial
> Fix For: 0.21.0
>
> Attachments: M973-0.patch
>
>
> The FailJob class (MAPREDUCE-567) is more a test utility than an example. It 
> should either move to src/test, ideally with a unit test built around it, or 
> be removed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-973) Move "FailJob" from examples to test

2009-09-10 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-973:


Status: Patch Available  (was: Open)

> Move "FailJob" from examples to test
> 
>
> Key: MAPREDUCE-973
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-973
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: examples, test
>Affects Versions: 0.21.0
>Reporter: Chris Douglas
>Priority: Trivial
> Fix For: 0.21.0
>
> Attachments: M973-0.patch
>
>
> The FailJob class (MAPREDUCE-567) is more a test utility than an example. It 
> should either move to src/test, ideally with a unit test built around it, or 
> be removed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-973) Move "FailJob" from examples to test

2009-09-10 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-973:


Attachment: M973-0.patch

bq. If you haven't done this already, I'm happy to move it to test. Should 
SleepJob move too?

SleepJob was added in 2007 so we'd have to deprecate, etc. to do that. Since 
FailJob was added in 0.21, it's easy to move.

Attached a trivial move to src/test

> Move "FailJob" from examples to test
> 
>
> Key: MAPREDUCE-973
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-973
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: examples, test
>Affects Versions: 0.21.0
>Reporter: Chris Douglas
> Fix For: 0.21.0
>
> Attachments: M973-0.patch
>
>
> The FailJob class (MAPREDUCE-567) is more a test utility than an example. It 
> should either move to src/test, ideally with a unit test built around it, or 
> be removed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-973) Move "FailJob" from examples to test

2009-09-10 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753976#action_12753976
 ] 

Philip Zeyliger commented on MAPREDUCE-973:
---

If you haven't done this already, I'm happy to move it to test.  Should 
SleepJob move too?

> Move "FailJob" from examples to test
> 
>
> Key: MAPREDUCE-973
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-973
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: examples, test
>Affects Versions: 0.21.0
>Reporter: Chris Douglas
> Fix For: 0.21.0
>
>
> The FailJob class (MAPREDUCE-567) is more a test utility than an example. It 
> should either move to src/test, ideally with a unit test built around it, or 
> be removed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-969) NullPointerException during reduce freezes job

2009-09-10 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753974#action_12753974
 ] 

Todd Lipcon commented on MAPREDUCE-969:
---

Unfortunately not - by the time I realized that useful info was in there it had 
been rotated and deleted.

> NullPointerException during reduce freezes job
> --
>
> Key: MAPREDUCE-969
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-969
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker, task, tasktracker
>Affects Versions: 0.20.2
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: bad_job_events, bad_job_jt_logs, reduce_task_logs
>
>
> We experienced several jobs stuck in Reduce on a cluster. All of the stuck 
> reduce tasks were stuck at "Need another 2 map output(s) where 
> 0 is already in progress" despite all of the mappers having completed, and 0 
> scheduled. The stuck reducers had experienced the following exception early 
> in the shuffle:
> java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
>   at 
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2747)
>   at 
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2670)
> Will attach more information and logs momentarily.
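
The top frame of the trace is consistent with a null key reaching ConcurrentHashMap.get, since that class does not permit null keys. Whether that is what happened in this job is an assumption. The sketch below only demonstrates the library behavior; NullKeyLookup is a hypothetical name.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Demonstrates that ConcurrentHashMap rejects null keys on lookup. */
final class NullKeyLookup {
  /** Returns true if looking up the given key throws NullPointerException. */
  static boolean lookupThrowsNpe(ConcurrentHashMap<String, Integer> map, String key) {
    try {
      map.get(key);
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    ConcurrentHashMap<String, Integer> ports = new ConcurrentHashMap<>();
    ports.put("host-a", 50060);
    System.out.println("null key throws: " + lookupThrowsNpe(ports, null));
    System.out.println("missing key throws: " + lookupThrowsNpe(ports, "host-b"));
  }
}
```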

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-971) distcp does not always remove distcp.tmp.dir

2009-09-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753973#action_12753973
 ] 

Hadoop QA commented on MAPREDUCE-971:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12419248/MAPREDUCE-971.patch
  against trunk revision 813585.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/26/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/26/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/26/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/26/console

This message is automatically generated.

> distcp does not always remove distcp.tmp.dir
> 
>
> Key: MAPREDUCE-971
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-971
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: distcp
>Reporter: Aaron Kimball
>Assignee: Aaron Kimball
> Attachments: MAPREDUCE-971.patch
>
>
> Sometimes distcp leaves behind its tmpdir when the target filesystem is s3n.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-969) NullPointerException during reduce freezes job

2009-09-10 Thread Jothi Padmanabhan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753971#action_12753971
 ] 

Jothi Padmanabhan commented on MAPREDUCE-969:
-

Do you have the TT log, that is, the log of the TT whose port was returned as 
-1 (or null) and where the map in question completed? We added a log statement 
there to print the port number.

> NullPointerException during reduce freezes job
> --
>
> Key: MAPREDUCE-969
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-969
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker, task, tasktracker
>Affects Versions: 0.20.2
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: bad_job_events, bad_job_jt_logs, reduce_task_logs
>
>
> We experienced several jobs stuck in Reduce on a cluster. All of the stuck 
> reduce tasks were stuck at "Need another 2 map output(s) where 
> 0 is already in progress" despite all of the mappers having completed, and 0 
> scheduled. The stuck reducers had experienced the following exception early 
> in the shuffle:
> java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
>   at 
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2747)
>   at 
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2670)
> Will attach more information and logs momentarily.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-973) Move "FailJob" from examples to test

2009-09-10 Thread Chris Douglas (JIRA)
Move "FailJob" from examples to test


 Key: MAPREDUCE-973
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-973
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples, test
Affects Versions: 0.21.0
Reporter: Chris Douglas
 Fix For: 0.21.0


The FailJob class (MAPREDUCE-567) is more a test utility than an example. It 
should either move to src/test, ideally with a unit test built around it, or be 
removed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-969) NullPointerException during reduce freezes job

2009-09-10 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753968#action_12753968
 ] 

Todd Lipcon commented on MAPREDUCE-969:
---

Yea, I looked at HADOOP-4744 as well as a couple other JIRAs but wasn't able to 
figure it out. If it keeps popping up, we will instrument the code with some 
debug logging and see if we can track it down.

> NullPointerException during reduce freezes job
> --
>
> Key: MAPREDUCE-969
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-969
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker, task, tasktracker
>Affects Versions: 0.20.2
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: bad_job_events, bad_job_jt_logs, reduce_task_logs
>
>
> We experienced several jobs stuck in Reduce on a cluster. All of the stuck 
> reduce tasks were stuck at "Need another 2 map output(s) where 
> 0 is already in progress" despite all of the mappers having completed, and 0 
> scheduled. The stuck reducers had experienced the following exception early 
> in the shuffle:
> java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
>   at 
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2747)
>   at 
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2670)
> Will attach more information and logs momentarily.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-830) Providing BZip2 splitting support for Text data

2009-09-10 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-830:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

+1

I committed this. Thanks, Abdul!

> Providing BZip2 splitting support for Text data
> ---
>
> Key: MAPREDUCE-830
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-830
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.21.0
>Reporter: Abdul Qadeer
>Assignee: Abdul Qadeer
> Fix For: 0.21.0
>
> Attachments: M830-2.patch, M830-3.patch, M830-4.patch, M830-4.patch, 
> MapReduce-830-version1.patch
>
>
> HADOOP-4012 (https://issues.apache.org/jira/browse/HADOOP-4012) is providing 
> support to handle BZip2 compressed data such that the input compressed file 
> is split at arbitrary points.  This JIRA uses that functionality in 
> LineRecordReader.  The benefit of this work is that, if user provides 
> compressed BZip2 Text data, it will be split by Hadoop and hence will be 
> processed by multiple mappers.  So BZip2 compressed data will be able to 
> fully utilize the cluster power.  Currently BZip2 compressed Text file goes 
> to one mapper and is not split.  So the enhancement in this JIRA provides 
> splitting support and considerable performance gains.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-969) NullPointerException during reduce freezes job

2009-09-10 Thread Jothi Padmanabhan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753964#action_12753964
 ] 

Jothi Padmanabhan commented on MAPREDUCE-969:
-

OK, from our earlier investigations, this was primarily caused by HADOOP-4744. 
We were never really able to reproduce this consistently, and evidently the 
workarounds in 4744 have not helped...

GetMapEventsThread ignoring exceptions -- you are right. We probably should 
catch and bail out. We did this change for MAPREDUCE-318. We probably should 
port it to 20 as well.
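
A rough sketch of the "catch and bail out" pattern mentioned above, assuming a polling thread that should stop and surface its first failure rather than keep running. EventPoller and its Callable are hypothetical; this is not the actual MAPREDUCE-318 change or the real GetMapEventsThread.

```java
import java.util.concurrent.Callable;

/** Hypothetical polling loop that records its first failure and exits. */
final class EventPoller implements Runnable {
  private final Callable<Integer> fetch; // stand-in for fetching map completion events
  private volatile Throwable failure;

  EventPoller(Callable<Integer> fetch) {
    this.fetch = fetch;
  }

  @Override
  public void run() {
    try {
      while (!Thread.currentThread().isInterrupted()) {
        int fetched = fetch.call();
        if (fetched < 0) {
          return; // negative count is this sketch's "no more events" signal
        }
      }
    } catch (Throwable t) {
      // Bail out: remember the error so the owning task can fail instead of hanging.
      failure = t;
    }
  }

  /** Returns the error that stopped the loop, or null if it ended normally. */
  Throwable getFailure() {
    return failure;
  }

  public static void main(String[] args) {
    EventPoller poller = new EventPoller(new Callable<Integer>() {
      @Override
      public Integer call() {
        throw new NullPointerException("simulated lookup failure");
      }
    });
    poller.run();
    System.out.println("poller stopped with: " + poller.getFailure());
  }
}
```

The owner of such a thread would check getFailure() and fail the task, which avoids the stuck-reducer symptom where the loop dies or spins without anyone noticing.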

> NullPointerException during reduce freezes job
> --
>
> Key: MAPREDUCE-969
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-969
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker, task, tasktracker
>Affects Versions: 0.20.2
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: bad_job_events, bad_job_jt_logs, reduce_task_logs
>
>
> We experienced several jobs stuck in Reduce on a cluster. All of the stuck 
> reduce tasks were stuck at "Need another 2 map output(s) where 
> 0 is already in progress" despite all of the mappers having completed, and 0 
> scheduled. The stuck reducers had experienced the following exception early 
> in the shuffle:
> java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
>   at 
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2747)
>   at 
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2670)
> Will attach more information and logs momentarily.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-861) Modify queue configuration format and parsing to support a hierarchy of queues.

2009-09-10 Thread rahul k singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rahul k singh updated MAPREDUCE-861:


Status: Open  (was: Patch Available)

> Modify queue configuration format and parsing to support a hierarchy of 
> queues.
> ---
>
> Key: MAPREDUCE-861
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-861
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Hemanth Yamijala
>Assignee: rahul k singh
> Attachments: MAPREDUCE-861-1.patch, MAPREDUCE-861-2.patch, 
> MAPREDUCE-861-3.patch, MAPREDUCE-861-4.patch, MAPREDUCE-861-5.patch, 
> MAPREDUCE-861-6.patch
>
>
> MAPREDUCE-853 proposes to introduce a hierarchy of queues into the Map/Reduce 
> framework. This JIRA is for defining changes to the configuration related to 
> queues. 
> The current format for defining a queue and its properties is as follows: 
> mapred.queue... For e.g. 
> mapred.queue..acl-submit-job. The reason for using this verbose 
> format was to be able to reuse the Configuration parser in Hadoop. However, 
> administrators currently using the queue configuration have already indicated 
> a very strong desire for a more manageable format. Since this becomes more 
> unwieldy with hierarchical queues, the time may be good to introduce a new 
> format for representing queue configuration.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-970) task-controller/configuration.c:get_values is broken

2009-09-10 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-970:


Attachment: MAPREDUCE-970.patch

Straight-forward fix.

> task-controller/configuration.c:get_values is broken
> 
>
> Key: MAPREDUCE-970
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-970
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Affects Versions: 0.21.0
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-970.patch
>
>
> task-controller/configuration.c:get_values is supposed to return a char** 
> with the last element set to NULL.
> It doesn't correctly handle empty config-values, or a number of values that 
> is an exact multiple of MAX_SIZE.




[jira] Updated: (MAPREDUCE-972) distcp can timeout during rename operation to s3

2009-09-10 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-972:


Status: Patch Available  (was: Open)

> distcp can timeout during rename operation to s3
> 
>
> Key: MAPREDUCE-972
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-972
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 0.20.1
>Reporter: Aaron Kimball
>Assignee: Aaron Kimball
> Attachments: MAPREDUCE-972.2.patch, MAPREDUCE-972.patch
>
>
> rename() in S3 is implemented as copy + delete. The S3 copy operation can 
> perform very slowly, which may cause task timeout.




[jira] Updated: (MAPREDUCE-972) distcp can timeout during rename operation to s3

2009-09-10 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-972:


Attachment: MAPREDUCE-972.2.patch

New patch to address your comments.

> distcp can timeout during rename operation to s3
> 
>
> Key: MAPREDUCE-972
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-972
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 0.20.1
>Reporter: Aaron Kimball
>Assignee: Aaron Kimball
> Attachments: MAPREDUCE-972.2.patch, MAPREDUCE-972.patch
>
>
> rename() in S3 is implemented as copy + delete. The S3 copy operation can 
> perform very slowly, which may cause task timeout.




[jira] Updated: (MAPREDUCE-972) distcp can timeout during rename operation to s3

2009-09-10 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-972:


Status: Open  (was: Patch Available)

> distcp can timeout during rename operation to s3
> 
>
> Key: MAPREDUCE-972
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-972
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 0.20.1
>Reporter: Aaron Kimball
>Assignee: Aaron Kimball
> Attachments: MAPREDUCE-972.patch
>
>
> rename() in S3 is implemented as copy + delete. The S3 copy operation can 
> perform very slowly, which may cause task timeout.




[jira] Commented: (MAPREDUCE-972) distcp can timeout during rename operation to s3

2009-09-10 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753946#action_12753946
 ] 

Todd Lipcon commented on MAPREDUCE-972:
---

A couple of bits of feedback:

- ProgressThread could do with some short javadoc
- isComplete should be marked volatile or made into an AtomicBoolean - then you 
don't have to worry about synchronization on it or the odd copying into 
myComplete
- can ProgressThread be a static class? I think so.

Other than that, lgtm.
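
The suggestions above can be sketched roughly as follows. This is not the code from the attached patch: RenameKeepAlive, markComplete() and runWithProgress() are hypothetical names used only for illustration, and the progress callback stands in for whatever reporter the mapper actually uses. The sketch shows a static nested ProgressThread with short javadoc whose completion flag is an AtomicBoolean, so no synchronized block or local copy of the flag is needed.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch only: every name in this class is hypothetical, not taken from the patch. */
class RenameKeepAlive {

  /**
   * Invokes a progress callback at a fixed interval until markComplete() is
   * called. Declared static so it holds no implicit reference to an outer
   * instance.
   */
  static class ProgressThread extends Thread {
    private final AtomicBoolean complete = new AtomicBoolean(false);
    private final Runnable reportProgress;
    private final long intervalMillis;

    ProgressThread(Runnable reportProgress, long intervalMillis) {
      this.reportProgress = reportProgress;
      this.intervalMillis = intervalMillis;
      setDaemon(true);
    }

    /** Safe to call from any thread; the atomic flag needs no extra locking. */
    void markComplete() {
      complete.set(true);
      interrupt(); // wake the thread if it is currently sleeping
    }

    @Override
    public void run() {
      while (!complete.get()) {
        reportProgress.run();
        try {
          Thread.sleep(intervalMillis);
        } catch (InterruptedException e) {
          // Fall through: the loop condition re-checks the flag.
        }
      }
    }
  }

  /** Runs slowOperation while a ProgressThread pings; returns the ping count. */
  static int runWithProgress(Runnable slowOperation, long intervalMillis) {
    AtomicInteger pings = new AtomicInteger();
    ProgressThread t = new ProgressThread(pings::incrementAndGet, intervalMillis);
    t.start();
    try {
      slowOperation.run();
    } finally {
      t.markComplete();
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the caller's interrupt status
      }
    }
    return pings.get();
  }
}
```

A plain volatile boolean would work equally well here, since the flag is only ever set once and read in a loop; AtomicBoolean is used simply because it makes the intent explicit.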

> distcp can timeout during rename operation to s3
> 
>
> Key: MAPREDUCE-972
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-972
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 0.20.1
>Reporter: Aaron Kimball
>Assignee: Aaron Kimball
> Attachments: MAPREDUCE-972.patch
>
>
> rename() in S3 is implemented as copy + delete. The S3 copy operation can 
> perform very slowly, which may cause task timeout.




[jira] Updated: (MAPREDUCE-972) distcp can timeout during rename operation to s3

2009-09-10 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-972:


Status: Patch Available  (was: Open)

> distcp can timeout during rename operation to s3
> 
>
> Key: MAPREDUCE-972
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-972
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 0.20.1
>Reporter: Aaron Kimball
>Assignee: Aaron Kimball
> Attachments: MAPREDUCE-972.patch
>
>
> rename() in S3 is implemented as copy + delete. The S3 copy operation can 
> perform very slowly, which may cause task timeout.




[jira] Commented: (MAPREDUCE-966) Rumen interface improvement

2009-09-10 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753943#action_12753943
 ] 

Hong Tang commented on MAPREDUCE-966:
-

The attached patch implements the changes proposed above. During the process, 
we also fixed a few minor issues:
- Changed the usage of the java.io API (File etc.) to Hadoop Path, FileSystem, 
and Configuration.
- Upgraded the tests to JUnit 4.
- Used the newly added JsonObjectMapperParser to replace customized JSON 
parsing in TestRumenJobTraces.
- Replaced the usage of Vector with List in unit tests.
- Fixed an NPE bug in HadoopLogAnalyzer.
- Added an API getOutcome() in JobStory.
- Fixed a bug in ParsedHost where it failed to parse rack names containing 
non-digit characters.
- Fixed a bug where ZombieJob.getTaskInfo() returned uninitialized TaskInfo 
objects.
- Adapted ZombieJob to use the newly added ClusterStory.

> Rumen interface improvement
> ---
>
> Key: MAPREDUCE-966
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-966
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.21.0
>Reporter: Hong Tang
>Assignee: Hong Tang
>     Attachments: mapreduce-966-20090910-3.patch
>
>
> Rumen could expose a cleaner interface to simplify the integration with other 
> tools.




[jira] Updated: (MAPREDUCE-966) Rumen interface improvement

2009-09-10 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-966:


Status: Patch Available  (was: Open)

> Rumen interface improvement
> ---
>
> Key: MAPREDUCE-966
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-966
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.21.0
>Reporter: Hong Tang
>Assignee: Hong Tang
>     Attachments: mapreduce-966-20090910-3.patch
>
>
> Rumen could expose a cleaner interface to simplify the integration with other 
> tools.




[jira] Updated: (MAPREDUCE-972) distcp can timeout during rename operation to s3

2009-09-10 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-972:


Attachment: MAPREDUCE-972.patch

Attaching a patch which starts a background thread to increment mapper progress 
when the rename operation is running.

We benchmarked S3 copy performance at ~4 MB/sec, which means that files in the 
3-5 GB size range may cause task timeouts during their renames into their 
final locations. This patch will fix this issue.

This patch was tested manually by running distcp to upload data to s3n and 
verifying that renames still worked as expected, and that log messages 
confirmed creation and destruction of the background progress thread.
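
As a rough check of the numbers above, the arithmetic can be written out as below. The class and method names are hypothetical, binary units (1 MB = 1024 * 1024 bytes) are assumed, and the 600-second figure is an assumption based on the usual default of mapred.task.timeout (600000 ms); a given cluster may be configured differently.

```java
/** Sketch only: back-of-the-envelope rename time at a fixed copy rate. */
class RenameTimeEstimate {
  static final long MB = 1024L * 1024L;
  static final long GB = 1024L * MB;

  /** Assumed task timeout in seconds (mapred.task.timeout default of 600000 ms). */
  static final long ASSUMED_TIMEOUT_SECONDS = 600L;

  /** Seconds needed to copy sizeBytes at bytesPerSecond, rounded up. */
  static long copySeconds(long sizeBytes, long bytesPerSecond) {
    return (sizeBytes + bytesPerSecond - 1) / bytesPerSecond;
  }

  /** True if the copy alone would outlast the assumed task timeout. */
  static boolean exceedsTimeout(long sizeBytes, long bytesPerSecond) {
    return copySeconds(sizeBytes, bytesPerSecond) > ASSUMED_TIMEOUT_SECONDS;
  }
}
```

Under these assumptions a 3 GB file takes 768 seconds (about 12.8 minutes) and a 5 GB file takes 1280 seconds (about 21.3 minutes), both longer than a 10-minute timeout if the task reports no progress in the meantime.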

> distcp can timeout during rename operation to s3
> 
>
> Key: MAPREDUCE-972
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-972
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 0.20.1
>Reporter: Aaron Kimball
>Assignee: Aaron Kimball
> Attachments: MAPREDUCE-972.patch
>
>
> rename() in S3 is implemented as copy + delete. The S3 copy operation can 
> perform very slowly, which may cause task timeout.




[jira] Created: (MAPREDUCE-972) distcp can timeout during rename operation to s3

2009-09-10 Thread Aaron Kimball (JIRA)
distcp can timeout during rename operation to s3


 Key: MAPREDUCE-972
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-972
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distcp
Affects Versions: 0.20.1
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Attachments: MAPREDUCE-972.patch

rename() in S3 is implemented as copy + delete. The S3 copy operation can 
perform very slowly, which may cause task timeout.




[jira] Updated: (MAPREDUCE-966) Rumen interface improvement

2009-09-10 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-966:


Attachment: mapreduce-966-20090910-3.patch

> Rumen interface improvement
> ---
>
> Key: MAPREDUCE-966
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-966
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.21.0
>Reporter: Hong Tang
>Assignee: Hong Tang
>     Attachments: mapreduce-966-20090910-3.patch
>
>
> Rumen could expose a cleaner interface to simplify the integration with other 
> tools.




[jira] Commented: (MAPREDUCE-830) Providing BZip2 splitting support for Text data

2009-09-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753930#action_12753930
 ] 

Hadoop QA commented on MAPREDUCE-830:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12419222/M830-4.patch
  against trunk revision 813585.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/59/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/59/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/59/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/59/console

This message is automatically generated.

> Providing BZip2 splitting support for Text data
> ---
>
> Key: MAPREDUCE-830
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-830
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.21.0
>Reporter: Abdul Qadeer
>Assignee: Abdul Qadeer
> Fix For: 0.21.0
>
> Attachments: M830-2.patch, M830-3.patch, M830-4.patch, M830-4.patch, 
> MapReduce-830-version1.patch
>
>
> HADOOP-4012 (https://issues.apache.org/jira/browse/HADOOP-4012) is providing 
> support to handle BZip2 compressed data such that the input compressed file 
> is split at arbitrary points.  This JIRA uses that functionality in 
> LineRecordReader.  The benefit of this work is that, if a user provides 
> BZip2-compressed Text data, it will be split by Hadoop and hence processed 
> by multiple mappers, so BZip2-compressed data can fully utilize the cluster.  
> Currently a BZip2-compressed Text file goes to one mapper and is not split.  
> The enhancement in this JIRA provides splitting support and considerable 
> performance gains.




[jira] Commented: (MAPREDUCE-968) NPE in distcp encountered when placing _logs directory on S3FileSystem

2009-09-10 Thread Aaron Kimball (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753927#action_12753927
 ] 

Aaron Kimball commented on MAPREDUCE-968:
-

Test failure is unrelated.

> NPE in distcp encountered when placing _logs directory on S3FileSystem
> --
>
> Key: MAPREDUCE-968
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-968
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 0.20.1
>Reporter: Aaron Kimball
>Assignee: Aaron Kimball
> Attachments: MAPREDUCE-968.patch
>
>
> If distcp is pointed to an empty S3 bucket as the destination for an s3:// 
> filesystem transfer, it will fail with the following exception
> Copy failed: java.lang.NullPointerException
> at org.apache.hadoop.fs.s3.S3FileSystem.makeAbsolute(S3FileSystem.java:121)
> at org.apache.hadoop.fs.s3.S3FileSystem.getFileStatus(S3FileSystem.java:332)
> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:633)
> at org.apache.hadoop.tools.DistCp.setup(DistCp.java:1005)
> at org.apache.hadoop.tools.DistCp.copy(DistCp.java:650)
> at org.apache.hadoop.tools.DistCp.run(DistCp.java:857)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.hadoop.tools.DistCp.main(DistCp.java:884) 




[jira] Updated: (MAPREDUCE-971) distcp does not always remove distcp.tmp.dir

2009-09-10 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-971:


Status: Patch Available  (was: Open)

> distcp does not always remove distcp.tmp.dir
> 
>
> Key: MAPREDUCE-971
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-971
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: distcp
>Reporter: Aaron Kimball
>Assignee: Aaron Kimball
> Attachments: MAPREDUCE-971.patch
>
>
> Sometimes distcp leaves behind its tmpdir when the target filesystem is s3n.




[jira] Updated: (MAPREDUCE-971) distcp does not always remove distcp.tmp.dir

2009-09-10 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-971:


Attachment: MAPREDUCE-971.patch

This patch fixes the problem by explicitly creating the temp directory. File 
open operations in, e.g., hdfs, will auto-create the tmpdir. But in s3n, which 
expects an object with the name {{_somename_$folder$}}, this won't happen. As a 
result, the {{fullyDelete()}} call fails (silently) because the folder doesn't 
exist, even though there are objects with the tmpdir prefix in their object 
names.

I tested this patch manually by verifying temp dir creation during a distcp to 
s3n, and verifying that the temp dir object was removed at the end of the 
transfer.

> distcp does not always remove distcp.tmp.dir
> 
>
> Key: MAPREDUCE-971
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-971
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: distcp
>Reporter: Aaron Kimball
>Assignee: Aaron Kimball
> Attachments: MAPREDUCE-971.patch
>
>
> Sometimes distcp leaves behind its tmpdir when the target filesystem is s3n.




[jira] Commented: (MAPREDUCE-968) NPE in distcp encountered when placing _logs directory on S3FileSystem

2009-09-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753914#action_12753914
 ] 

Hadoop QA commented on MAPREDUCE-968:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12419220/MAPREDUCE-968.patch
  against trunk revision 813585.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/25/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/25/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/25/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/25/console

This message is automatically generated.

> NPE in distcp encountered when placing _logs directory on S3FileSystem
> --
>
> Key: MAPREDUCE-968
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-968
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 0.20.1
>Reporter: Aaron Kimball
>Assignee: Aaron Kimball
> Attachments: MAPREDUCE-968.patch
>
>
> If distcp is pointed to an empty S3 bucket as the destination for an s3:// 
> filesystem transfer, it will fail with the following exception
> Copy failed: java.lang.NullPointerException
> at org.apache.hadoop.fs.s3.S3FileSystem.makeAbsolute(S3FileSystem.java:121)
> at org.apache.hadoop.fs.s3.S3FileSystem.getFileStatus(S3FileSystem.java:332)
> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:633)
> at org.apache.hadoop.tools.DistCp.setup(DistCp.java:1005)
> at org.apache.hadoop.tools.DistCp.copy(DistCp.java:650)
> at org.apache.hadoop.tools.DistCp.run(DistCp.java:857)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.hadoop.tools.DistCp.main(DistCp.java:884) 




[jira] Created: (MAPREDUCE-971) distcp does not always remove distcp.tmp.dir

2009-09-10 Thread Aaron Kimball (JIRA)
distcp does not always remove distcp.tmp.dir


 Key: MAPREDUCE-971
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-971
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distcp
Reporter: Aaron Kimball
Assignee: Aaron Kimball


Sometimes distcp leaves behind its tmpdir when the target filesystem is s3n.




[jira] Updated: (MAPREDUCE-969) NullPointerException during reduce freezes job

2009-09-10 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-969:
--

Attachment: reduce_task_logs
bad_job_jt_logs
bad_job_events

Attaching sanitized logs from this incident.

The event that seems to be a red flag is the lost task tracker xx05.

The null pointer exception is caused by u.getHost() being null - this URI is 
the taskTrackerHttpAddress in a TaskTrackerStatus. The job event output doesn't 
show any with a malformed URL, so I suspect some kind of race.

Aside from this issue, I find it odd that GetMapEventsThread ignores 
exceptions. In cases like this it will cause the ReduceTask to spin forever 
while still reporting progress until the user intervenes.
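
The mechanism described above can be reproduced in isolation with the sketch below. It is not the ReduceTask code: the class, the map, and the host names are hypothetical, and the underscore in the host name is just one convenient way to make java.net.URI return a null host, not a claim about what happened in this incident. It shows that ConcurrentHashMap.get() rejects a null key, plus a defensive lookup that treats a missing host as "no entry".

```java
import java.net.URI;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch only: isolates the null-host lookup; names are hypothetical. */
class NullHostLookup {

  /** Returns true if looking up the URI's host in a ConcurrentHashMap throws an NPE. */
  static boolean lookupThrowsNpe(URI u) {
    ConcurrentHashMap<String, Integer> byHost = new ConcurrentHashMap<>();
    try {
      byHost.get(u.getHost()); // ConcurrentHashMap does not permit null keys
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  /** Defensive variant: a URI without a parsable host simply has no entry. */
  static Integer safeLookup(ConcurrentHashMap<String, Integer> byHost, URI u) {
    String host = u.getHost();
    return host == null ? null : byHost.get(host);
  }
}
```

With these definitions, a URI such as http://tracker_05:50060 has a null host and triggers the exception in the first method, while the second method returns null for it. Guarding the lookup only hides the symptom, though; the separate point about the event-fetching thread swallowing exceptions would still need its own fix.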

> NullPointerException during reduce freezes job
> --
>
> Key: MAPREDUCE-969
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-969
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker, task, tasktracker
>Affects Versions: 0.20.2
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: bad_job_events, bad_job_jt_logs, reduce_task_logs
>
>
> We experienced several jobs stuck in Reduce on a cluster. All of the stuck 
> reduce tasks were stuck at "Need another 2 map output(s) where 
> 0 is already in progress" despite all of the mappers having completed, and 0 
> scheduled. The stuck reducers had experienced the following exception early 
> in the shuffle:
> java.lang.NullPointerException
>   at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2747)
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2670)
> Will attach more information and logs momentarily.




[jira] Created: (MAPREDUCE-970) task-controller/configuration.c:get_values is broken

2009-09-10 Thread Arun C Murthy (JIRA)
task-controller/configuration.c:get_values is broken


 Key: MAPREDUCE-970
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-970
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
Affects Versions: 0.21.0
Reporter: Arun C Murthy
Assignee: Arun C Murthy
Priority: Blocker
 Fix For: 0.21.0


task-controller/configuration.c:get_values is supposed to return a char** with 
the last element set to NULL.

It doesn't correctly handle empty config-values, or a number of values that is 
an exact multiple of MAX_SIZE.




[jira] Created: (MAPREDUCE-969) NullPointerException during reduce freezes job

2009-09-10 Thread Todd Lipcon (JIRA)
NullPointerException during reduce freezes job
--

 Key: MAPREDUCE-969
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-969
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker, task, tasktracker
Affects Versions: 0.20.2
Reporter: Todd Lipcon
Assignee: Todd Lipcon


We experienced several jobs stuck in Reduce on a cluster. All of the stuck 
reduce tasks were stuck at "Need another 2 map output(s) where 0 
is already in progress" despite all of the mappers having completed, and 0 
scheduled. The stuck reducers had experienced the following exception early in 
the shuffle:

java.lang.NullPointerException
at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2747)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2670)

Will attach more information and logs momentarily.




[jira] Commented: (MAPREDUCE-954) The new interface's Context objects should be interfaces

2009-09-10 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753886#action_12753886
 ] 

Doug Cutting commented on MAPREDUCE-954:


+1 This sounds like a good plan to me.

> The new interface's Context objects should be interfaces
> 
>
> Key: MAPREDUCE-954
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-954
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Reporter: Owen O'Malley
>Assignee: Arun C Murthy
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-954.patch
>
>
> When I was doing HADOOP-1230, I was persuaded to make the Context objects 
> classes. I think that was a serious mistake. It caused a lot of information 
> leakage into the public classes.




[jira] Commented: (MAPREDUCE-954) The new interface's Context objects should be interfaces

2009-09-10 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753867#action_12753867
 ] 

Owen O'Malley commented on MAPREDUCE-954:
-

Only application frameworks like ChainMapper will implement Contexts. 
Applications should only use Contexts.

The MapperImpl is private to map/reduce and will only consist of a method to 
create a ContextImpl. No other method on it will ever be called. ContextImpl 
will have the method bodies to implement the context and will be given to the 
user in the places where a Mapper.Context is required.

Since the Context objects are public, we will need to allow bodies on methods 
to support backwards compatibility of frameworks as we add methods to them, so 
they will start as pure abstract classes and slowly gain default bodies for 
added methods.
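
The compatibility argument in the last paragraph can be illustrated with a small sketch. None of these types are the real Hadoop classes: SketchContext, RecordingContext and getStatus() are hypothetical stand-ins. The idea shown is that a public context type starts out fully abstract and later gains a method with a default body, so a framework-side implementation written against the earlier version keeps compiling and running. (Interfaces could not carry method bodies in the Java versions of that era, which is why an abstract class is used here.)

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: a hypothetical public context that starts out purely abstract. */
abstract class SketchContext {
  abstract String getCurrentKey();

  abstract void write(String key, String value);

  /**
   * Added in a hypothetical later release. It ships with a default body so
   * that subclasses written against the first version are not broken.
   */
  String getStatus() {
    return "";
  }
}

/** A framework-side context written against the first version of SketchContext. */
class RecordingContext extends SketchContext {
  final List<String> written = new ArrayList<>();
  private final String key;

  RecordingContext(String key) {
    this.key = key;
  }

  @Override
  String getCurrentKey() {
    return key;
  }

  @Override
  void write(String k, String v) {
    written.add(k + "=" + v);
  }
  // No getStatus() override: the inherited default body is used.
}
```

Applications would only call methods on SketchContext; only frameworks that wrap or chain contexts would subclass it, which keeps the number of implementations affected by a new method small.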

> The new interface's Context objects should be interfaces
> 
>
> Key: MAPREDUCE-954
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-954
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Reporter: Owen O'Malley
>Assignee: Arun C Murthy
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-954.patch
>
>
> When I was doing HADOOP-1230, I was persuaded to make the Context objects 
> classes. I think that was a serious mistake. It caused a lot of information 
> leakage into the public classes.




[jira] Updated: (MAPREDUCE-946) Fix regression in LineRecordReader to comply with line length parameters

2009-09-10 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-946:


 Priority: Blocker  (was: Major)
Fix Version/s: 0.21.0

> Fix regression in LineRecordReader to comply with line length parameters
> 
>
> Key: MAPREDUCE-946
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-946
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Chris Douglas
>Priority: Blocker
> Fix For: 0.21.0
>
>
> MAPREDUCE-773 accidentally changed code introduced in HADOOP-3144 controlling 
> max line lengths. The behavior should be restored.




[jira] Updated: (MAPREDUCE-830) Providing BZip2 splitting support for Text data

2009-09-10 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-830:


Status: Open  (was: Patch Available)

> Providing BZip2 splitting support for Text data
> ---
>
> Key: MAPREDUCE-830
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-830
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.21.0
>Reporter: Abdul Qadeer
>Assignee: Abdul Qadeer
> Fix For: 0.21.0
>
> Attachments: M830-2.patch, M830-3.patch, M830-4.patch, M830-4.patch, 
> MapReduce-830-version1.patch
>
>
> HADOOP-4012 (https://issues.apache.org/jira/browse/HADOOP-4012) is providing 
> support to handle BZip2 compressed data such that the input compressed file 
> is split at arbitrary points.  This JIRA uses that functionality in 
> LineRecordReader.  The benefit of this work is that, if a user provides 
> BZip2-compressed Text data, it will be split by Hadoop and hence processed 
> by multiple mappers, so BZip2-compressed data can fully utilize the cluster.  
> Currently a BZip2-compressed Text file goes to one mapper and is not split.  
> The enhancement in this JIRA provides splitting support and considerable 
> performance gains.




[jira] Updated: (MAPREDUCE-830) Providing BZip2 splitting support for Text data

2009-09-10 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-830:


Attachment: M830-4.patch

*grumble* --no-prefix *grumble*

> Providing BZip2 splitting support for Text data
> ---
>
> Key: MAPREDUCE-830
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-830
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.21.0
>Reporter: Abdul Qadeer
>Assignee: Abdul Qadeer
> Fix For: 0.21.0
>
> Attachments: M830-2.patch, M830-3.patch, M830-4.patch, M830-4.patch, 
> MapReduce-830-version1.patch
>
>
> HADOOP-4012 (https://issues.apache.org/jira/browse/HADOOP-4012) is providing 
> support to handle BZip2 compressed data such that the input compressed file 
> is split at arbitrary points.  This JIRA uses that functionality in 
> LineRecordReader.  The benefit of this work is that, if a user provides 
> BZip2-compressed Text data, it will be split by Hadoop and hence processed 
> by multiple mappers, so BZip2-compressed data can fully utilize the cluster.  
> Currently a BZip2-compressed Text file goes to one mapper and is not split.  
> The enhancement in this JIRA provides splitting support and considerable 
> performance gains.




[jira] Updated: (MAPREDUCE-830) Providing BZip2 splitting support for Text data

2009-09-10 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-830:


Status: Patch Available  (was: Open)

> Providing BZip2 splitting support for Text data
> ---
>
> Key: MAPREDUCE-830
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-830
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.21.0
>Reporter: Abdul Qadeer
>Assignee: Abdul Qadeer
> Fix For: 0.21.0
>
> Attachments: M830-2.patch, M830-3.patch, M830-4.patch, M830-4.patch, 
> MapReduce-830-version1.patch
>
>
> HADOOP-4012 (https://issues.apache.org/jira/browse/HADOOP-4012) is providing 
> support to handle BZip2 compressed data such that the input compressed file 
> is split at arbitrary points.  This JIRA uses that functionality in 
> LineRecordReader.  The benefit of this work is that, if user provides 
> compressed BZip2 Text data, it will be split by Hadoop and hence will be 
> processed by multiple mappers.  So BZip2 compressed data will be able to 
> fully utilize the cluster power.  Currently BZip2 compressed Text file goes 
> to one mapper and is not split.  So the enhancement in this JIRA provides 
> splitting support and considerable performance gains.




[jira] Commented: (MAPREDUCE-830) Providing BZip2 splitting support for Text data

2009-09-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753845#action_12753845
 ] 

Hadoop QA commented on MAPREDUCE-830:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12419221/M830-4.patch
  against trunk revision 813585.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/58/console

This message is automatically generated.

> Providing BZip2 splitting support for Text data
> ---
>
> Key: MAPREDUCE-830
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-830
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.21.0
>Reporter: Abdul Qadeer
>Assignee: Abdul Qadeer
> Fix For: 0.21.0
>
> Attachments: M830-2.patch, M830-3.patch, M830-4.patch, 
> MapReduce-830-version1.patch
>
>
> HADOOP-4012 (https://issues.apache.org/jira/browse/HADOOP-4012) is providing 
> support to handle BZip2 compressed data such that the input compressed file 
> is split at arbitrary points.  This JIRA uses that functionality in 
> LineRecordReader.  The benefit of this work is that, if user provides 
> compressed BZip2 Text data, it will be split by Hadoop and hence will be 
> processed by multiple mappers.  So BZip2 compressed data will be able to 
> fully utilize the cluster power.  Currently BZip2 compressed Text file goes 
> to one mapper and is not split.  So the enhancement in this JIRA provides 
> splitting support and considerable performance gains.




[jira] Commented: (MAPREDUCE-477) Support for reading bzip2 compressed file created using concatenation of multiple .bz2 files

2009-09-10 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753841#action_12753841
 ] 

Chris Douglas commented on MAPREDUCE-477:
-

If this is solved by HADOOP-4012, it would be helpful to have a unit test. I'll 
leave this open to track that.

> Support for reading bzip2 compressed file created using concatenation of 
> multiple .bz2 files 
> -
>
> Key: MAPREDUCE-477
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-477
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Suhas Gogate
>Priority: Minor
>
> Bzip2Codec, supported in Hadoop 0.19/0.20, should support reading bzip2 
> compressed files created by concatenating multiple .bz2 files.




[jira] Commented: (MAPREDUCE-839) unit test TestMiniMRChildTask fails on mac os-x

2009-09-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753836#action_12753836
 ] 

Hadoop QA commented on MAPREDUCE-839:
-

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12419206/mapreduce-839-20090910-2.patch
  against trunk revision 813308.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 7 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/57/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/57/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/57/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/57/console

This message is automatically generated.

> unit test TestMiniMRChildTask fails on mac os-x
> ---
>
> Key: MAPREDUCE-839
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-839
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.21.0
>Reporter: Hong Tang
>Assignee: Hong Tang
>Priority: Minor
> Fix For: 0.21.0
>
> Attachments: mapreduce-839-20090828.patch, 
> mapreduce-839-20090910-2.patch, mapreduce-839-20090910.patch
>
>
> The unit test TestMiniMRChildTask fails on Mac OS-X (10.5.8)




[jira] Updated: (MAPREDUCE-830) Providing BZip2 splitting support for Text data

2009-09-10 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-830:


Status: Patch Available  (was: Open)

> Providing BZip2 splitting support for Text data
> ---
>
> Key: MAPREDUCE-830
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-830
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.21.0
>Reporter: Abdul Qadeer
>Assignee: Abdul Qadeer
> Fix For: 0.21.0
>
> Attachments: M830-2.patch, M830-3.patch, M830-4.patch, 
> MapReduce-830-version1.patch
>
>
> HADOOP-4012 (https://issues.apache.org/jira/browse/HADOOP-4012) is providing 
> support to handle BZip2 compressed data such that the input compressed file 
> is split at arbitrary points.  This JIRA uses that functionality in 
> LineRecordReader.  The benefit of this work is that, if user provides 
> compressed BZip2 Text data, it will be split by Hadoop and hence will be 
> processed by multiple mappers.  So BZip2 compressed data will be able to 
> fully utilize the cluster power.  Currently BZip2 compressed Text file goes 
> to one mapper and is not split.  So the enhancement in this JIRA provides 
> splitting support and considerable performance gains.




[jira] Updated: (MAPREDUCE-830) Providing BZip2 splitting support for Text data

2009-09-10 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-830:


Status: Open  (was: Patch Available)

> Providing BZip2 splitting support for Text data
> ---
>
> Key: MAPREDUCE-830
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-830
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.21.0
>Reporter: Abdul Qadeer
>Assignee: Abdul Qadeer
> Fix For: 0.21.0
>
> Attachments: M830-2.patch, M830-3.patch, M830-4.patch, 
> MapReduce-830-version1.patch
>
>
> HADOOP-4012 (https://issues.apache.org/jira/browse/HADOOP-4012) is providing 
> support to handle BZip2 compressed data such that the input compressed file 
> is split at arbitrary points.  This JIRA uses that functionality in 
> LineRecordReader.  The benefit of this work is that, if user provides 
> compressed BZip2 Text data, it will be split by Hadoop and hence will be 
> processed by multiple mappers.  So BZip2 compressed data will be able to 
> fully utilize the cluster power.  Currently BZip2 compressed Text file goes 
> to one mapper and is not split.  So the enhancement in this JIRA provides 
> splitting support and considerable performance gains.




[jira] Updated: (MAPREDUCE-830) Providing BZip2 splitting support for Text data

2009-09-10 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-830:


Attachment: M830-4.patch

Fixed copy/paste bug

> Providing BZip2 splitting support for Text data
> ---
>
> Key: MAPREDUCE-830
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-830
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.21.0
>Reporter: Abdul Qadeer
>Assignee: Abdul Qadeer
> Fix For: 0.21.0
>
> Attachments: M830-2.patch, M830-3.patch, M830-4.patch, 
> MapReduce-830-version1.patch
>
>
> HADOOP-4012 (https://issues.apache.org/jira/browse/HADOOP-4012) is providing 
> support to handle BZip2 compressed data such that the input compressed file 
> is split at arbitrary points.  This JIRA uses that functionality in 
> LineRecordReader.  The benefit of this work is that, if user provides 
> compressed BZip2 Text data, it will be split by Hadoop and hence will be 
> processed by multiple mappers.  So BZip2 compressed data will be able to 
> fully utilize the cluster power.  Currently BZip2 compressed Text file goes 
> to one mapper and is not split.  So the enhancement in this JIRA provides 
> splitting support and considerable performance gains.




[jira] Commented: (MAPREDUCE-830) Providing BZip2 splitting support for Text data

2009-09-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753832#action_12753832
 ] 

Hadoop QA commented on MAPREDUCE-830:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12418869/M830-3.patch
  against trunk revision 813585.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The patch appears to cause tar ant target to fail.

-1 findbugs.  The patch appears to cause Findbugs to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/24/testReport/
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/24/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/24/console

This message is automatically generated.

> Providing BZip2 splitting support for Text data
> ---
>
> Key: MAPREDUCE-830
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-830
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.21.0
>Reporter: Abdul Qadeer
>Assignee: Abdul Qadeer
> Fix For: 0.21.0
>
> Attachments: M830-2.patch, M830-3.patch, MapReduce-830-version1.patch
>
>
> HADOOP-4012 (https://issues.apache.org/jira/browse/HADOOP-4012) is providing 
> support to handle BZip2 compressed data such that the input compressed file 
> is split at arbitrary points.  This JIRA uses that functionality in 
> LineRecordReader.  The benefit of this work is that, if user provides 
> compressed BZip2 Text data, it will be split by Hadoop and hence will be 
> processed by multiple mappers.  So BZip2 compressed data will be able to 
> fully utilize the cluster power.  Currently BZip2 compressed Text file goes 
> to one mapper and is not split.  So the enhancement in this JIRA provides 
> splitting support and considerable performance gains.




[jira] Updated: (MAPREDUCE-968) NPE in distcp encountered when placing _logs directory on S3FileSystem

2009-09-10 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-968:


Attachment: MAPREDUCE-968.patch

This patch fixes the issue. If the destination directory is '/' and doesn't 
exist, it will fall into a case where Path.getParent() is used to compute the 
{{_logs}} target directory name. This returns null if the Path is '/'. In this 
special case, the '/' directory needs to be created by distcp too.

No unit test because this requires creating S3 buckets. I manually tested this 
by creating an empty S3 bucket and running:

{code}
bin/hadoop distcp some-hdfs-dir s3://:@my-new-bucket/
{code}

This failed with the NPE. After the patch, this succeeded. Confirmed that file 
uploads worked via

{code}
bin/hadoop fs -ls s3://:@my-new-bucket/
{code}
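
A minimal sketch of the null-parent case described above, not the attached patch. It uses java.nio.file.Path as a stand-in for Hadoop's org.apache.hadoop.fs.Path; on a Unix-style filesystem both return null from getParent() for '/'. The class name LogsDirSketch, the method logsDirFor, and the destExists flag are hypothetical and chosen only for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class LogsDirSketch {
    private LogsDirSketch() {}

    /**
     * Picks the directory that will hold the _logs output.
     * If the destination exists, logs go inside it; otherwise they go
     * under its parent. When there is no parent (as for the root '/'),
     * fall back to the destination itself instead of dereferencing null.
     */
    static Path logsDirFor(Path dest, boolean destExists) {
        Path base = destExists ? dest : dest.getParent();
        if (base == null) {
            // getParent() returned null: dest has no parent, e.g. it is '/'.
            base = dest;
        }
        return base.resolve("_logs");
    }

    public static void main(String[] args) {
        // Destination is the root and does not exist yet (the failing case).
        System.out.println(logsDirFor(Paths.get("/"), false));
        // Destination is an existing directory.
        System.out.println(logsDirFor(Paths.get("/out"), true));
    }
}
```

Without the null check, the first call would pass a null base to resolve() and fail with a NullPointerException, which mirrors the failure reported in this issue.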



> NPE in distcp encountered when placing _logs directory on S3FileSystem
> --
>
> Key: MAPREDUCE-968
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-968
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 0.20.1
>Reporter: Aaron Kimball
>Assignee: Aaron Kimball
> Attachments: MAPREDUCE-968.patch
>
>
> If distcp is pointed to an empty S3 bucket as the destination for an s3:// 
> filesystem transfer, it will fail with the following exception
> Copy failed: java.lang.NullPointerException
> at org.apache.hadoop.fs.s3.S3FileSystem.makeAbsolute(S3FileSystem.java:121)
> at org.apache.hadoop.fs.s3.S3FileSystem.getFileStatus(S3FileSystem.java:332)
> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:633)
> at org.apache.hadoop.tools.DistCp.setup(DistCp.java:1005)
> at org.apache.hadoop.tools.DistCp.copy(DistCp.java:650)
> at org.apache.hadoop.tools.DistCp.run(DistCp.java:857)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.hadoop.tools.DistCp.main(DistCp.java:884) 




[jira] Updated: (MAPREDUCE-968) NPE in distcp encountered when placing _logs directory on S3FileSystem

2009-09-10 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-968:


Status: Patch Available  (was: Open)

> NPE in distcp encountered when placing _logs directory on S3FileSystem
> --
>
> Key: MAPREDUCE-968
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-968
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 0.20.1
>Reporter: Aaron Kimball
>Assignee: Aaron Kimball
> Attachments: MAPREDUCE-968.patch
>
>
> If distcp is pointed to an empty S3 bucket as the destination for an s3:// 
> filesystem transfer, it will fail with the following exception
> Copy failed: java.lang.NullPointerException
> at org.apache.hadoop.fs.s3.S3FileSystem.makeAbsolute(S3FileSystem.java:121)
> at org.apache.hadoop.fs.s3.S3FileSystem.getFileStatus(S3FileSystem.java:332)
> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:633)
> at org.apache.hadoop.tools.DistCp.setup(DistCp.java:1005)
> at org.apache.hadoop.tools.DistCp.copy(DistCp.java:650)
> at org.apache.hadoop.tools.DistCp.run(DistCp.java:857)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.hadoop.tools.DistCp.main(DistCp.java:884) 




[jira] Created: (MAPREDUCE-968) NPE in distcp encountered when placing _logs directory on S3FileSystem

2009-09-10 Thread Aaron Kimball (JIRA)
NPE in distcp encountered when placing _logs directory on S3FileSystem
--

 Key: MAPREDUCE-968
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-968
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distcp
Affects Versions: 0.20.1
Reporter: Aaron Kimball
Assignee: Aaron Kimball


If distcp is pointed to an empty S3 bucket as the destination for an s3:// 
filesystem transfer, it will fail with the following exception

Copy failed: java.lang.NullPointerException
at org.apache.hadoop.fs.s3.S3FileSystem.makeAbsolute(S3FileSystem.java:121)
at org.apache.hadoop.fs.s3.S3FileSystem.getFileStatus(S3FileSystem.java:332)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:633)
at org.apache.hadoop.tools.DistCp.setup(DistCp.java:1005)
at org.apache.hadoop.tools.DistCp.copy(DistCp.java:650)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:857)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:884) 




[jira] Updated: (MAPREDUCE-830) Providing BZip2 splitting support for Text data

2009-09-10 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-830:


Status: Patch Available  (was: Open)

> Providing BZip2 splitting support for Text data
> ---
>
> Key: MAPREDUCE-830
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-830
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.21.0
>Reporter: Abdul Qadeer
>Assignee: Abdul Qadeer
> Fix For: 0.21.0
>
> Attachments: M830-2.patch, M830-3.patch, MapReduce-830-version1.patch
>
>
> HADOOP-4012 (https://issues.apache.org/jira/browse/HADOOP-4012) is providing 
> support to handle BZip2 compressed data such that the input compressed file 
> is split at arbitrary points.  This JIRA uses that functionality in 
> LineRecordReader.  The benefit of this work is that, if user provides 
> compressed BZip2 Text data, it will be split by Hadoop and hence will be 
> processed by multiple mappers.  So BZip2 compressed data will be able to 
> fully utilize the cluster power.  Currently BZip2 compressed Text file goes 
> to one mapper and is not split.  So the enhancement in this JIRA provides 
> splitting support and considerable performance gains.




[jira] Updated: (MAPREDUCE-839) unit test TestMiniMRChildTask fails on mac os-x

2009-09-10 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-839:


Status: Patch Available  (was: Open)

test-patch and run-commit-test passed on my local machine.

 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 7 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.

> unit test TestMiniMRChildTask fails on mac os-x
> ---
>
> Key: MAPREDUCE-839
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-839
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.21.0
>Reporter: Hong Tang
>Assignee: Hong Tang
>Priority: Minor
> Fix For: 0.21.0
>
> Attachments: mapreduce-839-20090828.patch, 
> mapreduce-839-20090910-2.patch, mapreduce-839-20090910.patch
>
>
> The unit test TestMiniMRChildTask fails on Mac OS-X (10.5.8)




[jira] Updated: (MAPREDUCE-839) unit test TestMiniMRChildTask fails on mac os-x

2009-09-10 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-839:


Status: Open  (was: Patch Available)

> unit test TestMiniMRChildTask fails on mac os-x
> ---
>
> Key: MAPREDUCE-839
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-839
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.21.0
>Reporter: Hong Tang
>Assignee: Hong Tang
>Priority: Minor
> Fix For: 0.21.0
>
> Attachments: mapreduce-839-20090828.patch, 
> mapreduce-839-20090910-2.patch, mapreduce-839-20090910.patch
>
>
> The unit test TestMiniMRChildTask fails on Mac OS-X (10.5.8)




[jira] Updated: (MAPREDUCE-839) unit test TestMiniMRChildTask fails on mac os-x

2009-09-10 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-839:


Attachment: mapreduce-839-20090910-2.patch

Removed a few unused imports.

> unit test TestMiniMRChildTask fails on mac os-x
> ---
>
> Key: MAPREDUCE-839
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-839
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.21.0
>Reporter: Hong Tang
>Assignee: Hong Tang
>Priority: Minor
> Fix For: 0.21.0
>
> Attachments: mapreduce-839-20090828.patch, 
> mapreduce-839-20090910-2.patch, mapreduce-839-20090910.patch
>
>
> The unit test TestMiniMRChildTask fails on Mac OS-X (10.5.8)




[jira] Updated: (MAPREDUCE-839) unit test TestMiniMRChildTask fails on mac os-x

2009-09-10 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-839:


Attachment: mapreduce-839-20090910.patch

New patch that relies on properties instead of environment variables. The 
defaults of these properties are derived from system environment variables $TMP 
and $TEMP. Running test-patch locally now.
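
A rough sketch of the "property with an environment-derived default" idea, not the code in the attached patch. The class TmpDirDefaults, the method resolve, and the property name used in main are hypothetical; only java.util.Properties.getProperty(key, default) and System.getenv() are standard calls.

```java
import java.util.Map;
import java.util.Properties;

final class TmpDirDefaults {
    private TmpDirDefaults() {}

    /**
     * Returns the value of propName if it is set. Otherwise the default is
     * taken from the environment variable envName, and if that is unset or
     * empty, from the given fallback.
     */
    static String resolve(Properties props, Map<String, String> env,
                          String propName, String envName, String fallback) {
        String fromEnv = env.get(envName);
        String def = (fromEnv != null && !fromEnv.isEmpty()) ? fromEnv : fallback;
        return props.getProperty(propName, def);
    }

    public static void main(String[] args) {
        // "sketch.tmp.dir" is a made-up property name for this example.
        String tmp = resolve(System.getProperties(), System.getenv(),
                             "sketch.tmp.dir", "TMP", "/tmp");
        System.out.println(tmp);
    }
}
```

Passing the property and environment maps in as arguments keeps the lookup easy to exercise in a test without changing the real process environment.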

> unit test TestMiniMRChildTask fails on mac os-x
> ---
>
> Key: MAPREDUCE-839
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-839
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.21.0
>Reporter: Hong Tang
>Assignee: Hong Tang
>Priority: Minor
> Fix For: 0.21.0
>
> Attachments: mapreduce-839-20090828.patch, 
> mapreduce-839-20090910.patch
>
>
> The unit test TestMiniMRChildTask fails on Mac OS-X (10.5.8)




[jira] Commented: (MAPREDUCE-679) XML-based metrics as JSP servlet for JobTracker

2009-09-10 Thread Aaron Kimball (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753729#action_12753729
 ] 

Aaron Kimball commented on MAPREDUCE-679:
-

The new test failure is unrelated.

> XML-based metrics as JSP servlet for JobTracker
> ---
>
> Key: MAPREDUCE-679
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-679
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: jobtracker
>Reporter: Aaron Kimball
>Assignee: Aaron Kimball
> Attachments: example-jobtracker-completed-job.xml, 
> example-jobtracker-running-job.xml, MAPREDUCE-679.2.patch, 
> MAPREDUCE-679.3.patch, MAPREDUCE-679.4.patch, MAPREDUCE-679.5.patch, 
> MAPREDUCE-679.patch
>
>
> In HADOOP-4559, a general REST API for reporting metrics was proposed but 
> work seems to have stalled. In the interim, we have a simple XML translation 
> of the existing JobTracker status page which provides the same metrics 
> (including the tables of running/completed/failed jobs) as the human-readable 
> page. This is a relatively lightweight addition to provide some 
> machine-understandable metrics reporting.




[jira] Commented: (MAPREDUCE-954) The new interface's Context objects should be interfaces

2009-09-10 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753708#action_12753708
 ] 

Doug Cutting commented on MAPREDUCE-954:


> we should take this opportunity to at least make the Context Objects pure 
> abstract classes if not go the full hog and make them interfaces.

I'm okay with pure abstract classes but have concerns about the evolvability of 
interfaces.

I'm trying to understand Owen's proposal.  Here's my guess: Applications won't 
implement Mapper.Context.  Rather the framework will implement it, and 
applications will access it referencing the abstract API.  But for the 
framework to implement it, it must define it within a Mapper, since it cannot 
be a static, standalone class and still be generic.  The framework's Mapper 
implementation won't actually be used other than to create a 
Mapper.ContextImpl--other mapper methods will throw 
UnimplementedMethodException. Do I have this right?
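
One possible reading of the arrangement guessed at above, as a heavily simplified sketch rather than the actual proposal or the Hadoop API. It uses two type parameters instead of Hadoop's four; FrameworkMapper and newContext are hypothetical names, and the standard UnsupportedOperationException stands in for the "UnimplementedMethodException" mentioned above.

```java
// The abstract API that applications would code against.
abstract class Mapper<K, V> {
    // Inner (non-static) class, so it shares K and V with the enclosing Mapper.
    public abstract class Context {
        public abstract K getCurrentKey();
        public abstract V getCurrentValue();
    }

    public abstract void map(K key, V value, Context context);
}

// Framework-side Mapper that exists only so a Context can be created.
final class FrameworkMapper<K, V> extends Mapper<K, V> {
    // Builds a concrete Context; this requires an enclosing Mapper instance.
    Context newContext(final K key, final V value) {
        return new Context() {
            @Override public K getCurrentKey() { return key; }
            @Override public V getCurrentValue() { return value; }
        };
    }

    @Override
    public void map(K key, V value, Context context) {
        // Never meant to be called; only the Context factory is used.
        throw new UnsupportedOperationException("framework-only mapper");
    }
}

// An application mapper that only ever sees the abstract Context.
final class LengthMapper extends Mapper<String, String> {
    int lastLength = -1;

    @Override
    public void map(String key, String value, Context context) {
        lastLength = context.getCurrentValue().length();
    }
}
```

Because the Context type depends only on the type arguments of the enclosing Mapper, a Context created by FrameworkMapper<String, String> can be handed to LengthMapper.map.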


> The new interface's Context objects should be interfaces
> 
>
> Key: MAPREDUCE-954
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-954
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Reporter: Owen O'Malley
>Assignee: Arun C Murthy
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-954.patch
>
>
> When I was doing HADOOP-1230, I was persuaded to make the Context objects as 
> classes. I think that was a serious mistake. It caused a lot of information 
> leakage into the public classes.




[jira] Commented: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars

2009-09-10 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753693#action_12753693
 ] 

Doug Cutting commented on MAPREDUCE-967:


> I'm not sure I see the purpose of the "classes/" directory [ ... ]

This was done by analogy with .war files.

> TaskTracker does not need to fully unjar job jars
> -
>
> Key: MAPREDUCE-967
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-967
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tasktracker
>Affects Versions: 0.21.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: mapreduce-967-branch-0.20.txt
>
>
> In practice we have seen some users submitting job jars that consist of 
> 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning 
> up after them has a significant cost (both in wall clock and in unnecessary 
> heavy disk utilization). This cost can be easily avoided




[jira] Commented: (MAPREDUCE-881) Jobtracker continues even if History initialization fails

2009-09-10 Thread Jothi Padmanabhan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753685#action_12753685
 ] 

Jothi Padmanabhan commented on MAPREDUCE-881:
-

In MAPREDUCE-157, we do just that -- let JT fail if there is a problem with 
history initialization. We could close this after that gets in.

> Jobtracker continues even if History initialization fails
> -
>
> Key: MAPREDUCE-881
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-881
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Reporter: Sharad Agarwal
>
> If there is some problem in the configuration, Job history initialization 
> fails. JobHistory#init catches the exception and disables the history. This 
> leads to job history not working as expected. However, administrators won't 
> notice that a problem in the config caused history to be disabled 
> unless they check the logs. A better approach would be to not catch the 
> exception and let Jobtracker fail to come up if there is an error in 
> initialization.




[jira] Resolved: (MAPREDUCE-926) History viewer on web UI should filter by job-id also

2009-09-10 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat resolved MAPREDUCE-926.
--

Resolution: Duplicate

Will be incorporated in MAPREDUCE-157. Closing this for now.

> History viewer on web UI should filter by job-id also
> -
>
> Key: MAPREDUCE-926
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-926
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Reporter: Vinod K V
>Assignee: Jothi Padmanabhan
>
> Job-id is the most commonly used handle to a job and there should be easier 
> ways of hunting down a job from the history web viewer. Currently, filtering 
> is supported by job name and job owner's name. Job-id is a necessary 
> addition to the list of filters.




[jira] Commented: (MAPREDUCE-881) Jobtracker continues even if History initialization fails

2009-09-10 Thread Amar Kamat (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753680#action_12753680
 ] 

Amar Kamat commented on MAPREDUCE-881:
--

Sharad, is this still valid? 

> Jobtracker continues even if History initialization fails
> -
>
> Key: MAPREDUCE-881
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-881
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Reporter: Sharad Agarwal
>
> If there is a problem in the configuration, job history initialization 
> fails. JobHistory#init catches the exception and disables the history, so 
> job history does not work as expected. However, administrators won't 
> notice that a config problem caused history to be disabled unless they 
> check the logs. A better approach would be to not catch the 
> exception and let the JobTracker fail to come up if there is an error in 
> initialization.
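
The difference between the two behaviours described in the issue can be 
sketched roughly as follows. This is an illustration only: the class and 
method names are hypothetical stand-ins, not the real JobHistory or 
JobTracker code, and the failure condition is invented for the example.

```java
import java.io.IOException;

// Hypothetical stand-ins for illustration; these are not the real Hadoop classes.
class HistoryInitSketch {

    /** Simulates history initialization that fails on a bad configuration. */
    static void initHistory(String logDir) throws IOException {
        if (logDir == null || logDir.isEmpty()) {
            throw new IOException("history log directory is not configured");
        }
    }

    /** Behaviour described in the issue: swallow the error and disable history. */
    static boolean startSwallowing(String logDir) {
        try {
            initHistory(logDir);
            return true;   // history enabled
        } catch (IOException e) {
            // Only visible in the logs; the server keeps running without history.
            System.err.println("Failed to initialize job history: " + e.getMessage());
            return false;  // history silently disabled
        }
    }

    /** Proposed behaviour: let the exception propagate so startup fails visibly. */
    static void startFailFast(String logDir) throws IOException {
        initHistory(logDir);
    }
}
```

With the fail-fast variant, a misconfiguration stops startup immediately 
instead of surfacing later as missing history.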




[jira] Commented: (MAPREDUCE-112) Reduce Input Records and Reduce Output Records counters are not being set when using the new Mapreduce reducer API

2009-09-10 Thread Jothi Padmanabhan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753677#action_12753677
 ] 

Jothi Padmanabhan commented on MAPREDUCE-112:
-

run-test-mapred and contrib tests passed.
test-patch returned +1 for everything except javac warnings, where it 
complained that the patch produced 1000-odd more warnings than trunk; 
evidently the javac check in test-patch is broken.

> Reduce Input Records and Reduce Output Records counters are not being set 
> when using the new Mapreduce reducer API
> --
>
> Key: MAPREDUCE-112
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-112
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.20.1
>Reporter: Jothi Padmanabhan
>Assignee: Jothi Padmanabhan
>Priority: Blocker
> Fix For: 0.20.2
>
> Attachments: mapred-112-10Sep.patch
>
>
> After running the examples/wordcount (which uses the new API), the reduce 
> input and output record counters always show 0. This is because these 
> counters are not getting updated in the new API.
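
One common way to keep a record counter in step with what the reducer 
actually consumes is to wrap the iterator it reads from. The sketch below 
shows that general technique only; it is not taken from the attached 
patch, and the class name is hypothetical.

```java
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical wrapper for illustration; not the code from the attached patch.
class CountingIteratorSketch<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private final AtomicLong counter;

    CountingIteratorSketch(Iterator<T> delegate, AtomicLong counter) {
        this.delegate = delegate;
        this.counter = counter;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public T next() {
        T value = delegate.next();
        counter.incrementAndGet();  // one increment per record handed out
        return value;
    }
}
```

If nothing performs the increment on the new code path, the counter stays 
at 0, which matches the symptom reported above.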




[jira] Updated: (MAPREDUCE-112) Reduce Input Records and Reduce Output Records counters are not being set when using the new Mapreduce reducer API

2009-09-10 Thread Jothi Padmanabhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jothi Padmanabhan updated MAPREDUCE-112:


Attachment: mapred-112-10Sep.patch

Straightforward patch.

> Reduce Input Records and Reduce Output Records counters are not being set 
> when using the new Mapreduce reducer API
> --
>
> Key: MAPREDUCE-112
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-112
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.20.1
>Reporter: Jothi Padmanabhan
>Assignee: Jothi Padmanabhan
>Priority: Blocker
> Fix For: 0.20.2
>
> Attachments: mapred-112-10Sep.patch
>
>
> After running the examples/wordcount (which uses the new API), the reduce 
> input and output record counters always show 0. This is because these 
> counters are not getting updated in the new API.




[jira] Updated: (MAPREDUCE-112) Reduce Input Records and Reduce Output Records counters are not being set when using the new Mapreduce reducer API

2009-09-10 Thread Jothi Padmanabhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jothi Padmanabhan updated MAPREDUCE-112:


Status: Patch Available  (was: Open)

> Reduce Input Records and Reduce Output Records counters are not being set 
> when using the new Mapreduce reducer API
> --
>
> Key: MAPREDUCE-112
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-112
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.20.1
>Reporter: Jothi Padmanabhan
>Assignee: Jothi Padmanabhan
>Priority: Blocker
> Fix For: 0.20.2
>
> Attachments: mapred-112-10Sep.patch
>
>
> After running the examples/wordcount (which uses the new API), the reduce 
> input and output record counters always show 0. This is because these 
> counters are not getting updated in the new API.




[jira] Commented: (MAPREDUCE-372) Change org.apache.hadoop.mapred.lib.ChainMapper/Reducer to use new api.

2009-09-10 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753659#action_12753659
 ] 

Arun C Murthy commented on MAPREDUCE-372:
-

Jothi, the patch you attached seems to be missing files... can you please 
upload a more complete one?

> Change org.apache.hadoop.mapred.lib.ChainMapper/Reducer to use new api.
> ---
>
> Key: MAPREDUCE-372
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-372
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.21.0
>
> Attachments: mapred-372.patch, patch-372-1.txt, patch-372.txt
>
>





[jira] Commented: (MAPREDUCE-956) Shuffle should be broken down to only two phases (copy/reduce) instead of three (copy/sort/reduce)

2009-09-10 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753647#action_12753647
 ] 

Arun C Murthy commented on MAPREDUCE-956:
-

I can see the appeal of this, but we should remember that there are 
applications where merge is a significant part of the reduce runtime, e.g. 
petasort's merge was _huge_.

> Shuffle should be broken down to only two phases (copy/reduce) instead of 
> three (copy/sort/reduce)
> --
>
> Key: MAPREDUCE-956
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-956
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task
>Affects Versions: 0.21.0
>Reporter: Jothi Padmanabhan
>
> For progress calculation and display on the UI, shuffle, in its 
> current form, is decomposed into three phases (copy/sort/reduce). Actually, 
> the sort phase is no longer applicable. I think we should just reduce the 
> number of phases to two and assign 50% weightage to each of the copy and 
> reduce phases. Thoughts?
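
To make the arithmetic of the proposal concrete, here is a small sketch of 
equally weighted phase progress. It assumes the existing three phases are 
weighted equally, which the description implies but does not state; the 
class and method names are hypothetical.

```java
// Hypothetical helper for illustration; not the actual task progress code.
class PhaseProgressSketch {

    /**
     * Overall progress when every phase carries the same weight.
     * Each argument is the progress of one phase, in the range 0.0 to 1.0.
     */
    static double overall(double... phaseProgress) {
        double sum = 0.0;
        for (double p : phaseProgress) {
            sum += p;
        }
        return sum / phaseProgress.length;
    }
}
```

With three phases, a task that has finished copying reports one third 
overall, as in overall(1.0, 0.0, 0.0). With the proposed two phases, the 
same task reports one half, as in overall(1.0, 0.0).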




[jira] Commented: (MAPREDUCE-777) A method for finding and tracking jobs from the new API

2009-09-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753626#action_12753626
 ] 

Hadoop QA commented on MAPREDUCE-777:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12419169/patch-777-5.txt
  against trunk revision 813308.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 39 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 2292 javac compiler warnings (more 
than the trunk's current 2236 warnings).

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/55/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/55/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/55/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/55/console

This message is automatically generated.

> A method for finding and tracking jobs from the new API
> ---
>
> Key: MAPREDUCE-777
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-777
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: client
>Reporter: Owen O'Malley
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.21.0
>
> Attachments: m-777.patch, patch-777-1.txt, patch-777-2.txt, 
> patch-777-3.txt, patch-777-4.txt, patch-777-5.txt, patch-777.txt
>
>
> We need to create a replacement interface for the JobClient API in the new 
> interface. In particular, the user needs to be able to query and track jobs 
> that were launched by other processes.




[jira] Commented: (MAPREDUCE-956) Shuffle should be broken down to only two phases (copy/reduce) instead of three (copy/sort/reduce)

2009-09-10 Thread Tom White (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753614#action_12753614
 ] 

Tom White commented on MAPREDUCE-956:
-

It's true that the merge occurs on the map side too. So this change sounds 
reasonable to me.

> Shuffle should be broken down to only two phases (copy/reduce) instead of 
> three (copy/sort/reduce)
> --
>
> Key: MAPREDUCE-956
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-956
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task
>Affects Versions: 0.21.0
>Reporter: Jothi Padmanabhan
>
> For progress calculation and display on the UI, shuffle, in its 
> current form, is decomposed into three phases (copy/sort/reduce). Actually, 
> the sort phase is no longer applicable. I think we should just reduce the 
> number of phases to two and assign 50% weightage to each of the copy and 
> reduce phases. Thoughts?




[jira] Updated: (MAPREDUCE-277) Job history counters should be available on the UI.

2009-09-10 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-277:
-

Attachment: mapred-277-v1.4.patch

Attaching a patch that fixes the counter count. Result of test-patch:
 [exec] -1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] -1 tests included.  The patch doesn't appear to include any new or modified tests.
 [exec] Please justify why no new tests are needed for this patch.
 [exec] Also please list what manual steps were performed to verify this patch.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.


> Job history counters should be available on the UI.
> -
>
> Key: MAPREDUCE-277
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-277
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Amareshwari Sriramadasu
>Assignee: Amar Kamat
> Attachments: HADOOP-3200-20080915.1.txt, mapred-277-v1.2.patch, 
> mapred-277-v1.4.patch
>
>
> Job history logs counters, but they are not visible on the UI. 
> The job history parser and UI should be modified to display counters.




[jira] Updated: (MAPREDUCE-564) Provide a way for the client to get the number of currently running maps/reduces

2009-09-10 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-564:
---

Attachment: MR-564.v3.patch

Attaching a new patch with a test case added.

Please review and provide your comments.

> Provide a way for the client to get the number of currently running 
> maps/reduces
> 
>
> Key: MAPREDUCE-564
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-564
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker
>Affects Versions: 0.21.0
>Reporter: Ravi Gummadi
>Assignee: Ravi Gummadi
> Fix For: 0.21.0
>
> Attachments: MR-564.patch, MR-564.v1.patch, MR-564.v2.patch, 
> MR-564.v3.patch
>
>
> Add counters for Number of Succeeded Maps and Number of Succeeded Reduces so 
> that client can get this number without iterating through all the task 
> reports while the job is in progress.
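
The trade-off described above, scanning every task report versus reading a 
counter that is maintained as tasks finish, can be sketched as follows. 
All names are hypothetical and this is not the code in the attached patch.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical types for illustration; not the code from the attached patch.
class SucceededTaskCountSketch {

    enum State { RUNNING, SUCCEEDED, FAILED }

    /** What the issue wants clients to avoid: a scan over every task report. */
    static int countByScanning(List<State> taskReports) {
        int succeeded = 0;
        for (State s : taskReports) {
            if (s == State.SUCCEEDED) {
                succeeded++;
            }
        }
        return succeeded;
    }

    /** Counter approach: bumped once per completion, read in constant time. */
    private final AtomicInteger succeededMaps = new AtomicInteger();

    void onMapSucceeded() {
        succeededMaps.incrementAndGet();
    }

    int getSucceededMaps() {
        return succeededMaps.get();
    }
}
```

The scan costs time proportional to the number of tasks on every query, 
while the counter read is constant time, which matters for clients polling 
a large job in progress.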




[jira] Updated: (MAPREDUCE-777) A method for finding and tracking jobs from the new API

2009-09-10 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-777:
--

Attachment: patch-777-5.txt

Patch fixes a couple of bugs in LocalJobRunner and JobTracker.

> A method for finding and tracking jobs from the new API
> ---
>
> Key: MAPREDUCE-777
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-777
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: client
>Reporter: Owen O'Malley
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.21.0
>
> Attachments: m-777.patch, patch-777-1.txt, patch-777-2.txt, 
> patch-777-3.txt, patch-777-4.txt, patch-777-5.txt, patch-777.txt
>
>
> We need to create a replacement interface for the JobClient API in the new 
> interface. In particular, the user needs to be able to query and track jobs 
> that were launched by other processes.




[jira] Updated: (MAPREDUCE-777) A method for finding and tracking jobs from the new API

2009-09-10 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-777:
--

Status: Patch Available  (was: Open)

> A method for finding and tracking jobs from the new API
> ---
>
> Key: MAPREDUCE-777
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-777
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: client
>Reporter: Owen O'Malley
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.21.0
>
> Attachments: m-777.patch, patch-777-1.txt, patch-777-2.txt, 
> patch-777-3.txt, patch-777-4.txt, patch-777-5.txt, patch-777.txt
>
>
> We need to create a replacement interface for the JobClient API in the new 
> interface. In particular, the user needs to be able to query and track jobs 
> that were launched by other processes.



