[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-09-22 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758610#action_12758610
 ] 

Owen O'Malley commented on MAPREDUCE-1026:
--

1. Of course

2. I'm pretty agnostic about what the authentication mechanism is, as long as it 
doesn't add an extra round trip. I don't see any way of doing a hash without an extra 
round trip on the connection open. On the other hand, sending a password doesn't 
reveal anything that isn't already known: if the attacker can sniff the 
network, they already know the secret.

3. If there is a better key length, we can use it. 62^10 is big enough to be 
safe. 

4. Of course

5. The key is per job, of course, but there is no advantage to having the 
JobTracker pick it. Either way it will be framework code that picks it. Putting 
it in the job conf is easy, and secure (once MAPREDUCE-181 goes in). Given that 
the key will be at the JobTracker and all of the TaskTrackers, I don't see the 
submitting node as a problem.

> Shuffle should be secure
> 
>
> Key: MAPREDUCE-1026
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: security
>Reporter: Owen O'Malley
>Assignee: Devaraj Das
>
> Since the user's data is available via http from the TaskTrackers, we should 
> require a job-specific secret to access it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1027) jobtracker.jsp can have an html text block for announcements by admins.

2009-09-22 Thread Vinod K V (JIRA)
jobtracker.jsp can have an html text block for announcements by admins.
---

 Key: MAPREDUCE-1027
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1027
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Reporter: Vinod K V


jobtracker.jsp is the first page for users of Map/Reduce clusters and can be 
used for sending information across to all users. It would be useful to have a 
text block on this page where administrators can post the latest 
notices/announcements from time to time.




[jira] Updated: (MAPREDUCE-666) Job scheduling information on jobtracker.jsp makes it clunky

2009-09-22 Thread Vinod K V (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod K V updated MAPREDUCE-666:


Affects Version/s: 0.21.0
Fix Version/s: 0.21.0

I propose that we fix this in 0.21. On some of the clusters, the jobtracker.jsp 
gets really long when scheduling information per job is used.

> Job scheduling information on jobtracker.jsp makes it clunky
> 
>
> Key: MAPREDUCE-666
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-666
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.21.0
>Reporter: Vinod K V
> Fix For: 0.21.0
>
>
> Job scheduling information is displayed for each job on the jobtracker.jsp 
> along with many other details. Though it is empty by default, when it is in 
> use, e.g. with high-memory jobs in the capacity scheduler, the UI looks 
> clunky with long strings of job scheduling information.




[jira] Assigned: (MAPREDUCE-1000) JobHistory.initDone() should retain the try ... catch in the body

2009-09-22 Thread Jothi Padmanabhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jothi Padmanabhan reassigned MAPREDUCE-1000:


Assignee: Jothi Padmanabhan

> JobHistory.initDone() should retain the try ... catch in the body
> -
>
> Key: MAPREDUCE-1000
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1000
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Hong Tang
>Assignee: Jothi Padmanabhan
>





[jira] Updated: (MAPREDUCE-587) Stream test TestStreamingExitStatus fails with Out of Memory

2009-09-22 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-587:
-

Release Note: Reduced the io.sort.mb in TestStreamingExitStatus to prevent 
OOM.

> Stream test TestStreamingExitStatus fails with Out of Memory
> 
>
> Key: MAPREDUCE-587
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-587
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
> Environment: OS/X, 64-bit x86 imac, 4GB RAM.
>Reporter: Steve Loughran
>Assignee: Amar Kamat
>Priority: Minor
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-587-v1.0.patch
>
>
> contrib/streaming tests fail with an Out of Memory error on an 
> OS/X Mac; the same problem does not surface on Linux.




[jira] Updated: (MAPREDUCE-884) TestReduceFetchFromPartialMem fails sometimes

2009-09-22 Thread Jothi Padmanabhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jothi Padmanabhan updated MAPREDUCE-884:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Fixed as part of MAPREDUCE-318.

> TestReduceFetchFromPartialMem fails sometimes
> -
>
> Key: MAPREDUCE-884
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-884
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.20.1
>Reporter: Amar Kamat
>Assignee: Jothi Padmanabhan
> Fix For: 0.21.0
>
> Attachments: mapred-884.patch
>
>
> TestReduceFetchFromPartialMem failed with the following exception trace:
> {code}
> Expected some records not spilled during reduce40980)
> junit.framework.AssertionFailedError: Expected some records not spilled during reduce40980)
> at org.apache.hadoop.mapred.TestReduceFetchFromPartialMem.testReduceFromPartialMem(TestReduceFetchFromPartialMem.java:94)
> at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
> at junit.extensions.TestSetup$1.protect(TestSetup.java:23)
> at junit.extensions.TestSetup.run(TestSetup.java:27)
> {code}




[jira] Updated: (MAPREDUCE-975) Add an API in job client to get the history file url for a given job id

2009-09-22 Thread Sharad Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharad Agarwal updated MAPREDUCE-975:
-

Release Note: Adds an API Cluster#getJobHistoryUrl(JobID jobId) to get the 
history URL for a given job id. The API does not check the validity of the job 
id or the existence of the history file; it just constructs the URL from the 
history folder, the job id and the current user.
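To make the "constructs, does not validate" contract concrete, here is a hypothetical helper that mimics it. It is not the real Cluster#getJobHistoryUrl: the class name `HistoryUrlSketch` and the `<historyDir>/<jobId>_<user>` naming scheme are assumptions made purely for illustration.

```java
// Hypothetical sketch: the "<historyDir>/<jobId>_<user>" naming scheme is an
// assumption for illustration, not the real Hadoop history file-name format.
final class HistoryUrlSketch {
    private HistoryUrlSketch() {}

    /** Predicts where a job's history file would live; nothing is validated. */
    static String historyUrl(String historyDir, String jobId, String user) {
        // As the release note describes, there is no check that jobId is well
        // formed or that the file exists: this is pure string construction.
        String dir = historyDir.endsWith("/") ? historyDir : historyDir + "/";
        return dir + jobId + "_" + user;
    }

    public static void main(String[] args) {
        System.out.println(historyUrl("hdfs://namenode:8020/history/done",
                                      "job_200909220000_0001", "alice"));
    }
}
```

Because a bogus job id still yields a well-formed URL, a caller of an API with this contract would have to check that the file exists before trying to open it.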

> Add an API in job client to get the history file url for a given job id
> ---
>
> Key: MAPREDUCE-975
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-975
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: client, jobtracker
>Reporter: Sharad Agarwal
>Assignee: Sharad Agarwal
> Fix For: 0.21.0
>
> Attachments: 975_v1.patch, 975_v2.patch, 975_v3.patch
>
>
> MAPREDUCE-817 added an API to get history url in RunningJob. Similar API 
> should be added in job client to get the history file given a job id. 
> Something like:
> String getHistoryFile(JobId jobid);




[jira] Updated: (MAPREDUCE-760) TestNodeRefresh might not work as expected

2009-09-22 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-760:
-

Release Note: TestNodeRefresh waits for the newly added tracker to join 
before starting the testing.
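Waiting for a tracker to join before asserting is an instance of the usual "poll until a condition holds or a timeout expires" pattern. The sketch below is generic; the helper name `WaitFor` is hypothetical, and wiring it to a real JobTracker (e.g. "the cluster now reports N trackers") is left as an assumption rather than shown with any actual Hadoop API.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Generic "poll until a condition holds or a timeout expires" helper.
// Hooking it up to a JobTracker ("N trackers have joined") is hypothetical.
final class WaitFor {
    private WaitFor() {}

    /** Returns true once the condition holds, false if the timeout expires first. */
    static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false;                          // gave up: the test should fail here
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();    // preserve interrupt status
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Simulated tracker that "joins" roughly 200 ms after start.
        boolean joined = waitFor(() -> System.currentTimeMillis() - start >= 200, 5_000, 20);
        System.out.println("tracker joined: " + joined);
    }
}
```

Returning a boolean instead of throwing lets the test decide how to report a timeout, which keeps the failure message next to the assertion it guards.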

> TestNodeRefresh might not work as expected
> --
>
> Key: MAPREDUCE-760
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-760
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Amar Kamat
>Assignee: Amar Kamat
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-760-v1.0.patch
>
>
> MAPREDUCE-677 fixed one part of the problem. It is possible that the 
> tasktracker has not yet joined the jobtracker, in which case the asserts 
> fail.




[jira] Updated: (MAPREDUCE-848) TestCapacityScheduler is failing

2009-09-22 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-848:
-

Release Note: MAPREDUCE-805 changed the way the job was initialized. The 
Capacity Scheduler's test cases were not updated as part of MAPREDUCE-805. This 
patch updates them.

> TestCapacityScheduler is failing
> 
>
> Key: MAPREDUCE-848
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-848
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/capacity-sched
>Affects Versions: 0.21.0
>Reporter: Devaraj Das
>Assignee: Amar Kamat
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-848-v1.0.patch
>
>
> Looks like the commit of MAPREDUCE-805 broke the CapacityScheduler test case. 




[jira] Updated: (MAPREDUCE-943) TestNodeRefresh timesout occasionally

2009-09-22 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-943:
-

Release Note: TestNodeRefresh timed out because the node-refresh code it 
exercised had been removed. This patch removes the test case.

> TestNodeRefresh timesout occasionally
> -
>
> Key: MAPREDUCE-943
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-943
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: jobtracker
>Reporter: Amareshwari Sriramadasu
>Assignee: Amar Kamat
> Fix For: 0.21.0
>
> Attachments: MAPRED-943-v1.0.patch
>
>
> TestNodeRefresh times out occasionally.
> One of the Hudson patch builds that timed out: 
> http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/26/testReport/org.apache.hadoop.mapred/TestNodeRefresh/testMRExcludeHostsAcrossRestarts/




[jira] Updated: (MAPREDUCE-189) Change Map-Reduce framework to use JAAS instead of UGI

2009-09-22 Thread Vinod K V (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod K V updated MAPREDUCE-189:


Issue Type: Sub-task  (was: Bug)
Parent: MAPREDUCE-563

> Change Map-Reduce framework to use JAAS instead of UGI
> --
>
> Key: MAPREDUCE-189
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-189
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
>
> Hadoop embraced JAAS via HADOOP-4348.
> We need to fix Map-Reduce to use JAAS concepts such as Subject, Principal, 
> Permission etc. rather than UserGroupInformation for user identification, 
> queue-acls etc.




[jira] Updated: (MAPREDUCE-181) Secure job submission

2009-09-22 Thread Vinod K V (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod K V updated MAPREDUCE-181:


Issue Type: Sub-task  (was: Bug)
Parent: MAPREDUCE-563

> Secure job submission 
> --
>
> Key: MAPREDUCE-181
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-181
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Amar Kamat
>Assignee: Amar Kamat
> Attachments: hadoop-3578-branch-20-example-2.patch, 
> hadoop-3578-branch-20-example.patch, HADOOP-3578-v2.6.patch, 
> HADOOP-3578-v2.7.patch, MAPRED-181-v3.8.patch
>
>
> Currently the jobclient accesses the {{mapred.system.dir}} to add job 
> details. Hence the {{mapred.system.dir}} has the permissions of 
> {{rwx-wx-wx}}. This could be a security loophole where the job files might 
> get overwritten or tampered with after the job submission. 
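To see what `rwx-wx-wx` actually grants, the string can be decoded with `java.nio.file.attribute.PosixFilePermissions`. This uses local POSIX semantics from the JDK, not the HDFS `FsPermission` API, so treat it as an illustration of the bits rather than of HDFS behaviour; the class name `SystemDirPerms` is hypothetical. The result is octal 733: group and others may write into the directory (and traverse it) but not list it, so under POSIX rules anyone who can guess a file name could create, replace or delete that entry unless a sticky bit is set.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Decodes a symbolic permission string with java.nio (local POSIX semantics,
// not the HDFS FsPermission API) to show what "rwx-wx-wx" grants.
final class SystemDirPerms {
    private SystemDirPerms() {}

    /** Octal form, e.g. "rwxr-xr-x" -> "755". */
    static String toOctal(String symbolic) {
        Set<PosixFilePermission> perms = PosixFilePermissions.fromString(symbolic);
        PosixFilePermission[] order = PosixFilePermission.values(); // OWNER_READ .. OTHERS_EXECUTE
        int bits = 0;
        for (int i = 0; i < order.length; i++) {
            if (perms.contains(order[i])) {
                bits |= 1 << (order.length - 1 - i);            // owner read is the highest bit
            }
        }
        return Integer.toOctalString(bits);
    }

    public static void main(String[] args) {
        Set<PosixFilePermission> p = PosixFilePermissions.fromString("rwx-wx-wx");
        System.out.println("octal: " + toOctal("rwx-wx-wx"));                                   // 733
        System.out.println("others may write: " + p.contains(PosixFilePermission.OTHERS_WRITE)); // true
        System.out.println("others may list:  " + p.contains(PosixFilePermission.OTHERS_READ));  // false
    }
}
```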




[jira] Updated: (MAPREDUCE-1026) Shuffle should be secure

2009-09-22 Thread Vinod K V (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod K V updated MAPREDUCE-1026:
-

Issue Type: Sub-task  (was: Improvement)
Parent: MAPREDUCE-563

> Shuffle should be secure
> 
>
> Key: MAPREDUCE-1026
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: security
>Reporter: Owen O'Malley
>Assignee: Devaraj Das
>
> Since the user's data is available via http from the TaskTrackers, we should 
> require a job-specific secret to access it.




[jira] Updated: (MAPREDUCE-728) Mumak: Map-Reduce Simulator

2009-09-22 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-728:


Status: Open  (was: Patch Available)

> Mumak: Map-Reduce Simulator
> ---
>
> Key: MAPREDUCE-728
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-728
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 0.21.0
>Reporter: Arun C Murthy
>Assignee: Hong Tang
> Fix For: 0.21.0
>
> Attachments: 19-jobs.topology.json.gz, 19-jobs.trace.json.gz, 
> mapreduce-728-20090917-3.patch, mapreduce-728-20090917-4.patch, 
> mapreduce-728-20090917.patch, mapreduce-728-20090918-2.patch, 
> mapreduce-728-20090918-3.patch, mapreduce-728-20090918-5.patch, 
> mapreduce-728-20090918-6.patch, mapreduce-728-20090918.patch, mumak.png
>
>
> h3. Vision:
> We want to build a Simulator to simulate large-scale Hadoop clusters, 
> applications and workloads. This would be invaluable in furthering Hadoop by 
> providing a tool for researchers and developers to prototype features (e.g. 
> pluggable block-placement for HDFS, Map-Reduce schedulers etc.) and predict 
> their behaviour and performance with a reasonable amount of confidence, 
> thereby aiding rapid innovation.
> 
> h3. First Cut: Simulator for the Map-Reduce Scheduler
> The Map-Reduce Scheduler is a fertile area of interest with at least four 
> schedulers, each with their own set of features, currently in existence: 
> Default Scheduler, Capacity Scheduler, Fairshare Scheduler & Priority 
> Scheduler.
> Each scheduler's scheduling decisions are driven by many factors, such as 
> fairness, capacity guarantee, resource availability, data-locality etc.
> Given that, it is non-trivial to accurately choose a single scheduler or even 
> a set of desired features to predict the right scheduler (or features) for a 
> given workload. Hence a simulator which can predict how well a particular 
> scheduler works for some specific workload by quickly iterating over 
> schedulers and/or scheduler features would be quite useful.
> So, the first cut is to implement a simulator for the Map-Reduce scheduler 
> which takes as input a job trace derived from a production workload and a 
> cluster definition, and simulates the execution of the jobs as defined in 
> the trace in this virtual cluster. As output, the detailed job execution 
> trace (recorded in relation to virtual simulated time) could then be analyzed 
> to understand various traits of individual schedulers (individual job 
> turnaround time, throughput, fairness, capacity guarantee, etc). To support 
> this, we would need a simulator which could accurately model the conditions 
> of the actual system which would affect a scheduler's decisions. These include 
> very large-scale clusters (thousands of nodes), the detailed characteristics 
> of the workload thrown at the clusters, job or task failures, data locality, 
> and cluster hardware (cpu, memory, disk i/o, network i/o, network topology) 
> etc.
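The core idea of running jobs "in relation to virtual simulated time" can be shown with a toy: instead of sleeping, the clock jumps straight to the next moment a slot becomes free. This is not Mumak's design; `ToySlotSim` is a hypothetical name, the FIFO earliest-free-slot placement stands in for a real pluggable scheduler, and the input is a hand-made list of task durations rather than a production trace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Toy virtual-time simulation (not Mumak's design): tasks are placed FIFO on
// the earliest-free slot, and a trace of start/finish times is recorded.
final class ToySlotSim {
    private ToySlotSim() {}

    /** Returns the makespan; appends "task i: start -> finish" lines to trace. */
    static long simulate(long[] durations, int slots, List<String> trace) {
        if (slots < 1) {
            throw new IllegalArgumentException("need at least one slot");
        }
        PriorityQueue<Long> freeAt = new PriorityQueue<>(); // virtual times at which slots free up
        for (int s = 0; s < slots; s++) {
            freeAt.add(0L);
        }
        long makespan = 0;
        for (int i = 0; i < durations.length; i++) {
            long start = freeAt.poll();            // jump the virtual clock; no real waiting
            long finish = start + durations[i];
            freeAt.add(finish);
            trace.add("task " + i + ": " + start + " -> " + finish);
            makespan = Math.max(makespan, finish);
        }
        return makespan;
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        long makespan = simulate(new long[] {5, 3, 8, 2}, 2, trace);
        trace.forEach(System.out::println);
        System.out.println("makespan = " + makespan);   // 11
    }
}
```

Swapping the placement rule while keeping the trace format fixed is the kind of side-by-side comparison of schedulers the proposal describes, only at a much smaller scale.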




[jira] Updated: (MAPREDUCE-728) Mumak: Map-Reduce Simulator

2009-09-22 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-728:


Fix Version/s: 0.22.0
   Status: Patch Available  (was: Open)

test-patch passed on both trunk and hadoop-0.21 branch on my local machine.

> Mumak: Map-Reduce Simulator
> ---
>
> Key: MAPREDUCE-728
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-728
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 0.21.0
>Reporter: Arun C Murthy
>Assignee: Hong Tang
> Fix For: 0.21.0, 0.22.0
>
> Attachments: 19-jobs.topology.json.gz, 19-jobs.trace.json.gz, 
> mapreduce-728-20090917-3.patch, mapreduce-728-20090917-4.patch, 
> mapreduce-728-20090917.patch, mapreduce-728-20090918-2.patch, 
> mapreduce-728-20090918-3.patch, mapreduce-728-20090918-5.patch, 
> mapreduce-728-20090918-6.patch, mapreduce-728-20090918.patch, mumak.png
>
>




[jira] Commented: (MAPREDUCE-1022) Trunk tests fail because of test-failure in Vertica

2009-09-22 Thread Vinod K V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758562#action_12758562
 ] 

Vinod K V commented on MAPREDUCE-1022:
--

bq. Sorry if I'm missing some context here, but I'm wondering how this class 
got into a different package and why that hasn't been caught before? 
This was caused by MAPREDUCE-775. The contributor made it Patch Available and 
Hudson blessed it on Sep 13, but the patch was committed on Sep 19 without 
going through Hudson again. Meanwhile, MAPREDUCE-777 went in, breaking this 
patch. Ideally MAPREDUCE-775 should have been run through Hudson again, but in 
this case it wasn't (perhaps due to pressure from the Feature Freeze date).

> Trunk tests fail because of test-failure in Vertica
> ---
>
> Key: MAPREDUCE-1022
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1022
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.21.0, 0.22.0
>Reporter: Vinod K V
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-1022.20090922.txt
>
>
> ant test fails with
> {code}
> [javac] /home/vinodkv/Workspace/eclipse-workspace/hadoop-mapreduce/src/contrib/vertica/src/test/org/apache/hadoop/vertica/TestVertica.java:43: cannot find symbol
> [javac] symbol  : class JobContextImpl
> [javac] location: package org.apache.hadoop.mapreduce
> [javac] import org.apache.hadoop.mapreduce.JobContextImpl;
> [javac]   ^
> {code}




[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-09-22 Thread Kan Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758545#action_12758545
 ] 

Kan Zhang commented on MAPREDUCE-1026:
--

I had some rough ideas for this when I opened HADOOP-4991. Briefly:
1. The output of Map tasks of a job should be accessible only to Reduce tasks of 
the same job.
2. Since this access is currently done over HTTP, I suggest we use the HTTP DIGEST 
authentication mechanism as defined in RFC 2617. This is better than HTTP BASIC 
authentication since with HTTP DIGEST the secret key is never sent 
to the server in the clear, and it allows for mutual authentication.
3. We should use whatever key length is recommended by the standard and 
the JCE implementation.
4. The key is per-job and should be chosen by the JobTracker at job submission 
and persisted in the job conf in such a way that only tasks of that job + TT/JT 
can access it. I favor having the JT choose it over the JobClient, for two reasons:
- The key is considered an internal detail of the M/R framework and should be 
transparent to anyone outside the M/R cluster, including the JobClient.
- You don't need to worry about the key being accidentally disclosed at the 
client site before/after being submitted to the JT.
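The heart of HTTP DIGEST (RFC 2617, qop="auth", algorithm MD5) is that both sides hash the shared secret together with a server nonce, so only the hash crosses the wire. The sketch below computes that "response" value; `DigestAuthSketch` is a hypothetical name, and using a job id as the username and the per-job key as the password is an assumption about how it might map onto the shuffle. The inputs in `main` are those of the RFC's own worked example (section 3.5).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// RFC 2617 digest "response" computation for qop="auth" (algorithm=MD5).
// Mapping username/password onto job id / per-job secret is hypothetical.
final class DigestAuthSketch {
    private DigestAuthSketch() {}

    static String md5Hex(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : d) {
                sb.append(String.format("%02x", b));   // two lowercase hex digits per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    /** Client and server both compute this; only the result is sent. */
    static String response(String user, String realm, String secret, String method,
                           String uri, String nonce, String nc, String cnonce) {
        String ha1 = md5Hex(user + ":" + realm + ":" + secret);
        String ha2 = md5Hex(method + ":" + uri);
        return md5Hex(ha1 + ":" + nonce + ":" + nc + ":" + cnonce + ":auth:" + ha2);
    }

    public static void main(String[] args) {
        // Inputs from the worked example in RFC 2617, section 3.5.
        System.out.println(response("Mufasa", "testrealm@host.com", "Circle Of Life",
                "GET", "/dir/index.html", "dcd98b7102dd2f0e8b11d0f600bfb0c093",
                "00000001", "0a4f113b"));
    }
}
```

Note the design cost: the client needs the server's nonce before it can compute the response, which normally means a 401 challenge first, i.e. the extra round trip on connection open that is weighed against sending the key directly elsewhere in this thread.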

> Shuffle should be secure
> 
>
> Key: MAPREDUCE-1026
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: security
>Reporter: Owen O'Malley
>Assignee: Devaraj Das
>
> Since the user's data is available via http from the TaskTrackers, we should 
> require a job-specific secret to access it.




[jira] Commented: (MAPREDUCE-664) distcp with -delete option does not display number of files deleted from the target that were not present on source

2009-09-22 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758542#action_12758542
 ] 

Chris Douglas commented on MAPREDUCE-664:
-

bq. I assume you meant MAPREDUCE-1008. Not HADOOP-1008.

Yes; thanks for the correction

> distcp with -delete option does not display number of files deleted from the 
> target that were not present on source 
> 
>
> Key: MAPREDUCE-664
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-664
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp
>Affects Versions: 0.20.1
>Reporter: Suhas Gogate
>Assignee: Ravi Gummadi
> Fix For: 0.21.0
>
> Attachments: d_deletedPathsCount.patch, d_deletedPathsCount664.patch
>
>
> distcp with -delete option should provide information on total number of 
> files deleted from the target that were not present on the source. 




[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-09-22 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758493#action_12758493
 ] 

Owen O'Malley commented on MAPREDUCE-1026:
--

I just wanted to get a proposal out there. 62^10 is very big. It is roughly 
2^60.
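The arithmetic behind "roughly 2^60" is easy to check. Counting the alphabet [a-zA-Z0-9] gives 26 + 26 + 10 = 62 symbols, so a uniformly random 10-character key has 62^10 (about 8.4 x 10^17) possibilities, or 10 * log2(62), about 59.5 bits; with 66 symbols it would be about 60.4 bits. Either way it is far below the 128 bits of the keys suggested in this thread. `KeySpace` is a hypothetical helper name.

```java
import java.math.BigInteger;
import java.util.Locale;

// Key-space arithmetic for random keys of `length` symbols drawn uniformly
// from an alphabet of `alphabet` symbols.
final class KeySpace {
    private KeySpace() {}

    static BigInteger size(int alphabet, int length) {
        return BigInteger.valueOf(alphabet).pow(length);
    }

    /** Entropy in bits: length * log2(alphabet). */
    static double bits(int alphabet, int length) {
        return length * (Math.log(alphabet) / Math.log(2));
    }

    public static void main(String[] args) {
        int alphabet = 26 + 26 + 10;   // [a-zA-Z0-9]
        System.out.println(size(alphabet, 10));                                  // 839299365868340224
        System.out.println(String.format(Locale.ROOT, "%.2f bits", bits(alphabet, 10))); // 59.54 bits
        System.out.println(String.format(Locale.ROOT, "%.2f bits", bits(66, 10)));       // 60.44 bits
    }
}
```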

> Shuffle should be secure
> 
>
> Key: MAPREDUCE-1026
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: security
>Reporter: Owen O'Malley
>Assignee: Devaraj Das
>
> Since the user's data is available via http from the TaskTrackers, we should 
> require a job-specific secret to access it.




[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-09-22 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758487#action_12758487
 ] 

Owen O'Malley commented on MAPREDUCE-1026:
--

Avro RPC won't have bulk data or authentication for a while, I suspect.

But the answer is yes, once there is authentication on the rpc, we can use 
that. In particular, the rpc will be able to use token/secret keys for 
authentication and that would be appropriate for this context. (Clearly a key 
exchange involving the kdc would never be performant enough for the shuffle.)

> Shuffle should be secure
> 
>
> Key: MAPREDUCE-1026
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: security
>Reporter: Owen O'Malley
>Assignee: Devaraj Das
>
> Since the user's data is available via http from the TaskTrackers, we should 
> require a job-specific secret to access it.




[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-09-22 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758485#action_12758485
 ] 

Allen Wittenauer commented on MAPREDUCE-1026:
-

> 10 characters from [a-zA-Z0-9]

This seems like a fairly small key space, one that could be broken by running 
Hadoop on a small cluster. :)  Why not just use 128- or 256-bit keys, e.g. 
generated with MD5 or SHA?

> Shuffle should be secure
> 
>
> Key: MAPREDUCE-1026
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: security
>Reporter: Owen O'Malley
>Assignee: Devaraj Das
>
> Since the user's data is available via http from the TaskTrackers, we should 
> require a job-specific secret to access it.




[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-09-22 Thread Jeff Hammerbacher (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758477#action_12758477
 ] 

Jeff Hammerbacher commented on MAPREDUCE-1026:
--

Hey Owen (and probably Doug),

While we're here: how would this strategy change if map output was transferred 
to the reducers using Avro's RPC? Is there authentication in the handshake, and 
encryption (ssl?) for the data?

Just trying to educate myself for The Future (tm).

Thanks,
Jeff

> Shuffle should be secure
> 
>
> Key: MAPREDUCE-1026
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: security
>Reporter: Owen O'Malley
>Assignee: Devaraj Das
>
> Since the user's data is available via http from the TaskTrackers, we should 
> require a job-specific secret to access it.




[jira] Commented: (MAPREDUCE-1016) Make the format of the Job History be JSON instead of Avro binary

2009-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758457#action_12758457
 ] 

Hadoop QA commented on MAPREDUCE-1016:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12420284/MAPREDUCE-1016.patch
  against trunk revision 817740.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/123/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/123/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/123/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/123/console

This message is automatically generated.

> Make the format of the Job History be JSON instead of Avro binary
> -
>
> Key: MAPREDUCE-1016
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1016
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Doug Cutting
> Fix For: 0.21.0, 0.22.0
>
> Attachments: MAPREDUCE-1016.patch
>
>
> I forgot that one of the features that would be nice is to offload the job 
> history display from the JobTracker. That will be a lot easier if the job 
> history is stored in JSON. Therefore, I think we should change the storage 
> format now to prevent incompatibilities later.




[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-09-22 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758452#action_12758452
 ] 

Owen O'Malley commented on MAPREDUCE-1026:
--

The JobClient should create a random key of 10 characters from [a-zA-Z0-9] and 
put it in the job conf as secret.mapred.job.shuffle.key. I'd propose that we 
add all secret keys in a sub-tree of the config key space (secret.*) so that 
the web ui can hide them. The reducer can include the key in the url and the 
TaskTracker can check to make sure it is correct.
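A minimal sketch of the proposed flow, in Java. Only the key name 
secret.mapred.job.shuffle.key, the 10-character [a-zA-Z0-9] alphabet, and the 
secret.* prefix come from the proposal above. The class name ShuffleKeySketch, 
the plain Map standing in for the real job conf, and the choice of SecureRandom 
as the random source are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleKeySketch {
    // Key name and secret.* prefix taken from the proposal; the rest is illustrative.
    static final String SHUFFLE_KEY = "secret.mapred.job.shuffle.key";
    static final String SECRET_PREFIX = "secret.";
    static final String ALPHABET =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    static final SecureRandom RANDOM = new SecureRandom();

    // JobClient side: 10 characters drawn uniformly from [a-zA-Z0-9].
    static String newShuffleKey() {
        StringBuilder sb = new StringBuilder(10);
        for (int i = 0; i < 10; i++) {
            sb.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    // Web UI side: copy the conf, skipping everything under secret.*.
    static Map<String, String> visibleConf(Map<String, String> conf) {
        Map<String, String> shown = new TreeMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            if (!e.getKey().startsWith(SECRET_PREFIX)) {
                shown.put(e.getKey(), e.getValue());
            }
        }
        return shown;
    }

    // TaskTracker side: compare the key taken from the reducer's request URL with
    // the job's key. MessageDigest.isEqual is used instead of String.equals;
    // current JDKs implement it as a constant-time comparison.
    static boolean checkKey(Map<String, String> conf, String keyFromUrl) {
        String expected = conf.get(SHUFFLE_KEY);
        if (expected == null || keyFromUrl == null) {
            return false;
        }
        return MessageDigest.isEqual(
            expected.getBytes(StandardCharsets.UTF_8),
            keyFromUrl.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("mapred.job.name", "wordcount");
        conf.put(SHUFFLE_KEY, newShuffleKey());
        System.out.println(visibleConf(conf));                     // {mapred.job.name=wordcount}
        System.out.println(checkKey(conf, conf.get(SHUFFLE_KEY))); // true
        System.out.println(checkKey(conf, "wrong-key!"));          // false: '-' and '!' are outside the alphabet
    }
}
```

How the reducer embeds the key in the URL and how the TaskTracker parses it out 
are left open here; the sketch only shows generation, hiding, and checking.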

> Shuffle should be secure
> 
>
> Key: MAPREDUCE-1026
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: security
>Reporter: Owen O'Malley
>Assignee: Devaraj Das
>
> Since the user's data is available via http from the TaskTrackers, we should 
> require a job-specific secret to access it.




[jira] Created: (MAPREDUCE-1026) Shuffle should be secure

2009-09-22 Thread Owen O'Malley (JIRA)
Shuffle should be secure


 Key: MAPREDUCE-1026
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: security
Reporter: Owen O'Malley
Assignee: Devaraj Das


Since the user's data is available via http from the TaskTrackers, we should 
require a job-specific secret to access it.




[jira] Commented: (MAPREDUCE-664) distcp with -delete option does not display number of files deleted from the target that were not present on source

2009-09-22 Thread gary murry (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758405#action_12758405
 ] 

gary murry commented on MAPREDUCE-664:
--

I assume you meant MAPREDUCE-1008.  Not HADOOP-1008. :-)

> distcp with -delete option does not display number of files deleted from the 
> target that were not present on source 
> 
>
> Key: MAPREDUCE-664
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-664
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp
>Affects Versions: 0.20.1
>Reporter: Suhas Gogate
>Assignee: Ravi Gummadi
> Fix For: 0.21.0
>
> Attachments: d_deletedPathsCount.patch, d_deletedPathsCount664.patch
>
>
> distcp with -delete option should provide information on total number of 
> files deleted from the target that were not present on the source. 




[jira] Commented: (MAPREDUCE-971) distcp does not always remove distcp.tmp.dir

2009-09-22 Thread gary murry (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758404#action_12758404
 ] 

gary murry commented on MAPREDUCE-971:
--

Cool, thanks for the additional info.

> distcp does not always remove distcp.tmp.dir
> 
>
> Key: MAPREDUCE-971
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-971
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: distcp
>Reporter: Aaron Kimball
>Assignee: Aaron Kimball
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-971.patch
>
>
> Sometimes distcp leaves behind its tmpdir when the target filesystem is s3n.




[jira] Commented: (MAPREDUCE-645) When disctp is used to overwrite a file, it should return immediately with an error message

2009-09-22 Thread gary murry (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758401#action_12758401
 ] 

gary murry commented on MAPREDUCE-645:
--

Can we get a note about why no new unit tests were added?  Thanks

> When disctp is used to overwrite a file, it should return immediately with an 
> error message
> ---
>
> Key: MAPREDUCE-645
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-645
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: distcp
>Reporter: Ramya R
>Assignee: Ravi Gummadi
>Priority: Minor
> Fix For: 0.21.0
>
> Attachments: d_645.patch, d_645_v1.patch, distcp.txt
>
>
> When distcp is triggered to copy a directory to an already existing file, it 
> just reports "copy failed" after 4 attempts, without any useful error 
> message. This is extremely time-consuming on a large cluster, especially 
> when the directory being copied contains several sub-directories.
> Instead, it would be an improvement if distcp could return immediately, 
> displaying a useful error message, when a user attempts such an operation. 
> (This is an unlikely situation, but still a valid test case.)




[jira] Commented: (MAPREDUCE-971) distcp does not always remove distcp.tmp.dir

2009-09-22 Thread Aaron Kimball (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758400#action_12758400
 ] 

Aaron Kimball commented on MAPREDUCE-971:
-

An automated unit test for an S3-based system would require hardcoding S3 
access credentials and connecting to an S3 account (which is a for-pay 
resource).

> distcp does not always remove distcp.tmp.dir
> 
>
> Key: MAPREDUCE-971
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-971
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: distcp
>Reporter: Aaron Kimball
>Assignee: Aaron Kimball
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-971.patch
>
>
> Sometimes distcp leaves behind its tmpdir when the target filesystem is s3n.




[jira] Commented: (MAPREDUCE-649) distcp should validate the data copied

2009-09-22 Thread gary murry (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758398#action_12758398
 ] 

gary murry commented on MAPREDUCE-649:
--

Since there are no unit tests, how was this tested? What manual scenarios were 
run?

> distcp should validate the data copied
> --
>
> Key: MAPREDUCE-649
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-649
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp
>Reporter: Ravi Gummadi
>Assignee: Ravi Gummadi
> Fix For: 0.21.0
>
> Attachments: d_verify.patch, d_verify649.patch, d_verify649.v1.patch
>
>
> distcp should validate the files copied by checking the checksums, if the 
> filesystem supports checksums.




[jira] Commented: (MAPREDUCE-971) distcp does not always remove distcp.tmp.dir

2009-09-22 Thread gary murry (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758393#action_12758393
 ] 

gary murry commented on MAPREDUCE-971:
--

It is good that this was tested manually, and it is appreciated that the manual 
test was outlined here. But why was no unit test added so that the fix can be 
verified automatically on future builds?

> distcp does not always remove distcp.tmp.dir
> 
>
> Key: MAPREDUCE-971
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-971
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: distcp
>Reporter: Aaron Kimball
>Assignee: Aaron Kimball
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-971.patch
>
>
> Sometimes distcp leaves behind its tmpdir when the target filesystem is s3n.




[jira] Commented: (MAPREDUCE-318) Refactor reduce shuffle code

2009-09-22 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758371#action_12758371
 ] 

Scott Carey commented on MAPREDUCE-318:
---

In addition to a quick code review of the parts I was interested in (fetching 
map output fragments), I did a quick-and-dirty test of trunk on a tiny cluster 
to make sure that this change had the same effect as the one-line fix I apply 
to 0.19.2 in production for similar benefits. See my comment from June 10, 
2009. The old code was artificially throttling the shuffle to one output file 
per TT per ping cycle.

Quite simply, any fix that lets a reducer fetch all the completed map outputs 
it finds in one ping cycle helps jobs whose map output count is much greater 
than the node count, whether it is a one-line hack or a full refactor.

The impact really depends on the cluster config and job type. Ours is new 
hardware with plenty of RAM per node, which leads to using 11+ concurrent map 
tasks per node and a larger ratio of map shards per reduce to TaskTrackers. 
The bigger that ratio, the bigger the impact of optimized shuffle fetching.
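A back-of-the-envelope model of the throttling described above, assuming every 
map output is already complete and the only limit is how many outputs per 
TaskTracker a reducer fetches in one ping cycle. It deliberately ignores 
bandwidth, parallel-copier limits, and map completion times, and the class and 
method names are hypothetical.

```java
import java.util.Arrays;

public class ShuffleCyclesSketch {
    // Hypothetical model: all map outputs a reducer needs are already complete,
    // and the only limit is how many outputs it fetches from each TaskTracker
    // per ping cycle. Returns the number of ping cycles to fetch everything.
    static int cyclesNeeded(int[] outputsPerTracker, int perTrackerPerCycle) {
        int cycles = 0;
        for (int n : outputsPerTracker) {
            // ceil(n / perTrackerPerCycle), written to avoid int overflow
            int c = (n == 0) ? 0 : 1 + (n - 1) / perTrackerPerCycle;
            cycles = Math.max(cycles, c);
        }
        return cycles;
    }

    public static void main(String[] args) {
        // 10 TaskTrackers, each holding 11 finished outputs for this reducer.
        int[] outputs = new int[10];
        Arrays.fill(outputs, 11);
        System.out.println(cyclesNeeded(outputs, 1));                 // 11: one output per TT per cycle
        System.out.println(cyclesNeeded(outputs, Integer.MAX_VALUE)); // 1: fetch everything found
    }
}
```

Under this model the one-per-TT policy needs as many cycles as the most loaded 
TaskTracker has outputs, while fetching everything found needs one, so the gap 
grows with the ratio of map outputs per reduce to TaskTrackers.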

> Refactor reduce shuffle code
> 
>
> Key: MAPREDUCE-318
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-318
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 0.21.0
>
> Attachments: HADOOP-5233_api.patch, HADOOP-5233_part0.patch, 
> mapred-318-14Aug.patch, mapred-318-20Aug.patch, mapred-318-24Aug.patch, 
> mapred-318-3Sep-v1.patch, mapred-318-3Sep.patch, mapred-318-common.patch
>
>
> The reduce shuffle code has become very complex and entangled. I think we 
> should move it out of ReduceTask and into a separate package 
> (org.apache.hadoop.mapred.task.reduce). Details to follow.




[jira] Updated: (MAPREDUCE-1016) Make the format of the Job History be JSON instead of Avro binary

2009-09-22 Thread Doug Cutting (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Cutting updated MAPREDUCE-1016:


Status: Patch Available  (was: Open)

> Make the format of the Job History be JSON instead of Avro binary
> -
>
> Key: MAPREDUCE-1016
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1016
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Doug Cutting
> Fix For: 0.21.0, 0.22.0
>
> Attachments: MAPREDUCE-1016.patch
>
>
> I forgot that one of the features that would be nice is to offload the job 
> history display from the JobTracker. That will be a lot easier if the job 
> history is stored in JSON. Therefore, I think we should change the storage 
> now to prevent incompatibilities later.




[jira] Commented: (MAPREDUCE-1014) After the 0.21 branch, MapReduce trunk doesn't compile

2009-09-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758329#action_12758329
 ] 

Hudson commented on MAPREDUCE-1014:
---

Integrated in Hadoop-Mapreduce-trunk-Commit #60 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/60/])
. Fix the libraries for common and hdfs. (omalley)


> After the 0.21 branch, MapReduce trunk doesn't compile
> --
>
> Key: MAPREDUCE-1014
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1014
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Devaraj Das
>Assignee: Ravi Gummadi
>Priority: Blocker
> Fix For: 0.22.0
>
>
> When ant is run, the build fails with compilation problems. The first of these 
> is:
> compile-mapred-classes:
>   [taskdef] log4j:ERROR Could not instantiate class 
> [org.apache.hadoop.metrics.jvm.EventCounter].
>   [taskdef] java.lang.ClassNotFoundException: 
> org.apache.hadoop.metrics.jvm.EventCounter
>   [taskdef] at 
> org.apache.tools.ant.AntClassLoader.findClassInComponents(AntClassLoader.java:1383)
>   [taskdef] at 
> org.apache.tools.ant.AntClassLoader.findClass(AntClassLoader.java:1324)
>   [taskdef] at 
> org.apache.tools.ant.AntClassLoader.loadClass(AntClassLoader.java:1072)
>   [taskdef] at java.lang.ClassLoader.loadClass(ClassLoader.java:254)
>   [taskdef] at 
> java.lang.ClassLoader.loadClassInternal(ClassLoader.java:402)
>   [taskdef] at java.lang.Class.forName0(Native Method)
>   [taskdef] at java.lang.Class.forName(Class.java:169)
>   [taskdef] at org.apache.log4j.helpers.Loader.loadClass(Loader.java:179)
>   [taskdef] at 
> org.apache.log4j.helpers.OptionConverter.instantiateByClassName(OptionConverter.java:320)
>   [taskdef] at 
> org.apache.log4j.helpers.OptionConverter.instantiateByKey(OptionConverter.java:121)
>   [taskdef] at 
> org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:664)
>   [taskdef] at 
> org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:647)
>   [taskdef] at 
> org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:544)
>   [taskdef] at 
> org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:440)
>   [taskdef] at 
> org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:476)




[jira] Commented: (MAPREDUCE-318) Refactor reduce shuffle code

2009-09-22 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758320#action_12758320
 ] 

Todd Lipcon commented on MAPREDUCE-318:
---

Scott: are you running this on the 20 branch or using 21/trunk? If you have a 
patch rebased onto 20 that you've tested, I'd be interested in taking a 
look.

> Refactor reduce shuffle code
> 
>
> Key: MAPREDUCE-318
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-318
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 0.21.0
>
> Attachments: HADOOP-5233_api.patch, HADOOP-5233_part0.patch, 
> mapred-318-14Aug.patch, mapred-318-20Aug.patch, mapred-318-24Aug.patch, 
> mapred-318-3Sep-v1.patch, mapred-318-3Sep.patch, mapred-318-common.patch
>
>
> The reduce shuffle code has become very complex and entangled. I think we 
> should move it out of ReduceTask and into a separate package 
> (org.apache.hadoop.mapred.task.reduce). Details to follow.




[jira] Commented: (MAPREDUCE-318) Refactor reduce shuffle code

2009-09-22 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758317#action_12758317
 ] 

Arun C Murthy commented on MAPREDUCE-318:
-

Thanks for sharing that with us Scott... we aim to please! *smile*

> Refactor reduce shuffle code
> 
>
> Key: MAPREDUCE-318
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-318
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 0.21.0
>
> Attachments: HADOOP-5233_api.patch, HADOOP-5233_part0.patch, 
> mapred-318-14Aug.patch, mapred-318-20Aug.patch, mapred-318-24Aug.patch, 
> mapred-318-3Sep-v1.patch, mapred-318-3Sep.patch, mapred-318-common.patch
>
>
> The reduce shuffle code has become very complex and entangled. I think we 
> should move it out of ReduceTask and into a separate package 
> (org.apache.hadoop.mapred.task.reduce). Details to follow.




[jira] Commented: (MAPREDUCE-1022) Trunk tests fail because of test-failure in Vertica

2009-09-22 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758316#action_12758316
 ] 

Konstantin Boudnik commented on MAPREDUCE-1022:
---

Sorry if I'm missing some context here, but how did this class get into a 
different package, and why wasn't that caught before?

> Trunk tests fail because of test-failure in Vertica
> ---
>
> Key: MAPREDUCE-1022
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1022
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.21.0, 0.22.0
>Reporter: Vinod K V
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-1022.20090922.txt
>
>
> ant test fails with
> {code}
> [javac] 
> /home/vinodkv/Workspace/eclipse-workspace/hadoop-mapreduce/src/contrib/vertica/src/test/org/apache/hadoop/vertica/TestVertica.java:43:
>  cannot find symbol
> [javac] symbol  : class JobContextImpl
> [javac] location: package org.apache.hadoop.mapreduce
> [javac] import org.apache.hadoop.mapreduce.JobContextImpl;
> [javac]   ^
> {code}




[jira] Commented: (MAPREDUCE-318) Refactor reduce shuffle code

2009-09-22 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758314#action_12758314
 ] 

Scott Carey commented on MAPREDUCE-318:
---

You may also want to note that this change improves performance significantly 
in some cases, especially when there is a large number of small- to 
medium-sized map outputs (many more outputs to fetch per reduce than the number 
of TaskTrackers).
In some of my jobs, shuffle times have dropped from 60% of the job time to < 
5%.

For a given job, shuffle time is FAR less sensitive to the number of maps and 
reducers than it was before.

> Refactor reduce shuffle code
> 
>
> Key: MAPREDUCE-318
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-318
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 0.21.0
>
> Attachments: HADOOP-5233_api.patch, HADOOP-5233_part0.patch, 
> mapred-318-14Aug.patch, mapred-318-20Aug.patch, mapred-318-24Aug.patch, 
> mapred-318-3Sep-v1.patch, mapred-318-3Sep.patch, mapred-318-common.patch
>
>
> The reduce shuffle code has become very complex and entangled. I think we 
> should move it out of ReduceTask and into a separate package 
> (org.apache.hadoop.mapred.task.reduce). Details to follow.




[jira] Resolved: (MAPREDUCE-1014) After the 0.21 branch, MapReduce trunk doesn't compile

2009-09-22 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved MAPREDUCE-1014.
--

Resolution: Fixed

I updated the common and hdfs jars with the current ones.

> After the 0.21 branch, MapReduce trunk doesn't compile
> --
>
> Key: MAPREDUCE-1014
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1014
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Devaraj Das
>Assignee: Ravi Gummadi
>Priority: Blocker
> Fix For: 0.22.0
>
>
> When ant is run, the build fails with compilation problems. The first of these 
> is:
> compile-mapred-classes:
>   [taskdef] log4j:ERROR Could not instantiate class 
> [org.apache.hadoop.metrics.jvm.EventCounter].
>   [taskdef] java.lang.ClassNotFoundException: 
> org.apache.hadoop.metrics.jvm.EventCounter
>   [taskdef] at 
> org.apache.tools.ant.AntClassLoader.findClassInComponents(AntClassLoader.java:1383)
>   [taskdef] at 
> org.apache.tools.ant.AntClassLoader.findClass(AntClassLoader.java:1324)
>   [taskdef] at 
> org.apache.tools.ant.AntClassLoader.loadClass(AntClassLoader.java:1072)
>   [taskdef] at java.lang.ClassLoader.loadClass(ClassLoader.java:254)
>   [taskdef] at 
> java.lang.ClassLoader.loadClassInternal(ClassLoader.java:402)
>   [taskdef] at java.lang.Class.forName0(Native Method)
>   [taskdef] at java.lang.Class.forName(Class.java:169)
>   [taskdef] at org.apache.log4j.helpers.Loader.loadClass(Loader.java:179)
>   [taskdef] at 
> org.apache.log4j.helpers.OptionConverter.instantiateByClassName(OptionConverter.java:320)
>   [taskdef] at 
> org.apache.log4j.helpers.OptionConverter.instantiateByKey(OptionConverter.java:121)
>   [taskdef] at 
> org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:664)
>   [taskdef] at 
> org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:647)
>   [taskdef] at 
> org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:544)
>   [taskdef] at 
> org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:440)
>   [taskdef] at 
> org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:476)




[jira] Commented: (MAPREDUCE-679) XML-based metrics as JSP servlet for JobTracker

2009-09-22 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758310#action_12758310
 ] 

Todd Lipcon commented on MAPREDUCE-679:
---

bq. I can think of ways in which we take the XML output as the standard and 
generate html from it (ala forrest)

The standard way to do this is to use XSLT. Unfortunately, most everyone I've 
talked to who has used XSLT for generating web pages has decided it's a giant 
pain and wished they hadn't :)
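For reference, a minimal sketch of the XSLT approach using the JDK's built-in 
javax.xml.transform API. The <jobs>/<job id=... name=...> input shape and the 
stylesheet are hypothetical; they are not the XML format produced by the JSPX 
page in this patch.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltSketch {
    // Hypothetical stylesheet: <jobs><job id=".." name=".."/></jobs> -> HTML table.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\" indent=\"no\"/>"
        + "<xsl:template match=\"/jobs\">"
        + "<table><xsl:for-each select=\"job\">"
        + "<tr><td><xsl:value-of select=\"@id\"/></td>"
        + "<td><xsl:value-of select=\"@name\"/></td></tr>"
        + "</xsl:for-each></table>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to an XML string and returns the serialized HTML.
    static String toHtml(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transform failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<jobs><job id=\"job_0001\" name=\"wordcount\"/></jobs>";
        System.out.println(toHtml(xml));
    }
}
```

Even in this toy form, every new field has to be threaded through the 
stylesheet by hand, which is part of why the approach tends to be unpopular.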

> XML-based metrics as JSP servlet for JobTracker
> ---
>
> Key: MAPREDUCE-679
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-679
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: jobtracker
>Reporter: Aaron Kimball
>Assignee: Aaron Kimball
> Fix For: 0.21.0
>
> Attachments: example-jobtracker-completed-job.xml, 
> example-jobtracker-running-job.xml, MAPREDUCE-679.2.patch, 
> MAPREDUCE-679.3.patch, MAPREDUCE-679.4.patch, MAPREDUCE-679.5.patch, 
> MAPREDUCE-679.6.patch, MAPREDUCE-679.7.patch, MAPREDUCE-679.patch
>
>
> In HADOOP-4559, a general REST API for reporting metrics was proposed but 
> work seems to have stalled. In the interim, we have a simple XML translation 
> of the existing JobTracker status page which provides the same metrics 
> (including the tables of running/completed/failed jobs) as the human-readable 
> page. This is a relatively lightweight addition to provide some 
> machine-understandable metrics reporting.




[jira] Commented: (MAPREDUCE-679) XML-based metrics as JSP servlet for JobTracker

2009-09-22 Thread Aaron Kimball (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758305#action_12758305
 ] 

Aaron Kimball commented on MAPREDUCE-679:
-

The "most correct" way to do this would be to integrate with some Java web 
framework's templating engine, which would, given a set of key/value data, 
either provide HTML or XML or JSON or a dozen other formats from the same 
source. But that is a big change and not something that's just going to come 
together overnight. Generating HTML from XML is a reasonable-sounding 
alternative.

For now, it's on us to add fields to the JSPX whenever we add them to the JSP, 
or vice versa. Suboptimal, for sure.
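Short of adopting a templating engine, one way to get a single source of truth 
is to build each row's key/value data once and render it through per-format 
writers. This is a hypothetical sketch (class, method, and field names are made 
up for illustration), not code from the patch; a JSON writer would be a third 
method over the same rows.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SingleSourceSketch {
    // Escapes characters that are special in both HTML and XML text/attributes.
    static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // The one place that decides which fields a job row has (hypothetical fields).
    static Map<String, String> jobRow(String id, String user, int mapsCompleted) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("id", id);
        row.put("user", user);
        row.put("mapsCompleted", Integer.toString(mapsCompleted));
        return row;
    }

    // HTML renderer: one <tr> per row, one <td> per field, in insertion order.
    static String toHtml(List<Map<String, String>> rows) {
        StringBuilder sb = new StringBuilder("<table>");
        for (Map<String, String> row : rows) {
            sb.append("<tr>");
            for (String value : row.values()) {
                sb.append("<td>").append(esc(value)).append("</td>");
            }
            sb.append("</tr>");
        }
        return sb.append("</table>").toString();
    }

    // XML renderer: field names become element names (assumed to be valid XML names).
    static String toXml(List<Map<String, String>> rows) {
        StringBuilder sb = new StringBuilder("<jobs>");
        for (Map<String, String> row : rows) {
            sb.append("<job>");
            for (Map.Entry<String, String> e : row.entrySet()) {
                sb.append('<').append(e.getKey()).append('>')
                  .append(esc(e.getValue()))
                  .append("</").append(e.getKey()).append('>');
            }
            sb.append("</job>");
        }
        return sb.append("</jobs>").toString();
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(jobRow("job_0001", "alice", 12));
        System.out.println(toHtml(rows));
        // <table><tr><td>job_0001</td><td>alice</td><td>12</td></tr></table>
        System.out.println(toXml(rows));
        // <jobs><job><id>job_0001</id><user>alice</user><mapsCompleted>12</mapsCompleted></job></jobs>
    }
}
```

Adding a field to jobRow() then shows up in both outputs without touching either 
renderer.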


> XML-based metrics as JSP servlet for JobTracker
> ---
>
> Key: MAPREDUCE-679
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-679
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: jobtracker
>Reporter: Aaron Kimball
>Assignee: Aaron Kimball
> Fix For: 0.21.0
>
> Attachments: example-jobtracker-completed-job.xml, 
> example-jobtracker-running-job.xml, MAPREDUCE-679.2.patch, 
> MAPREDUCE-679.3.patch, MAPREDUCE-679.4.patch, MAPREDUCE-679.5.patch, 
> MAPREDUCE-679.6.patch, MAPREDUCE-679.7.patch, MAPREDUCE-679.patch
>
>
> In HADOOP-4559, a general REST API for reporting metrics was proposed but 
> work seems to have stalled. In the interim, we have a simple XML translation 
> of the existing JobTracker status page which provides the same metrics 
> (including the tables of running/completed/failed jobs) as the human-readable 
> page. This is a relatively lightweight addition to provide some 
> machine-understandable metrics reporting.




[jira] Updated: (MAPREDUCE-1016) Make the format of the Job History be JSON instead of Avro binary

2009-09-22 Thread Doug Cutting (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Cutting updated MAPREDUCE-1016:


Attachment: MAPREDUCE-1016.patch

Here's a patch that implements this.


> Make the format of the Job History be JSON instead of Avro binary
> -
>
> Key: MAPREDUCE-1016
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1016
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Doug Cutting
> Fix For: 0.21.0, 0.22.0
>
> Attachments: MAPREDUCE-1016.patch
>
>
> I forgot that one of the features that would be nice is to offload the job 
> history display from the JobTracker. That will be a lot easier if the job 
> history is stored in JSON. Therefore, I think we should change the storage 
> now to prevent incompatibilities later.




[jira] Commented: (MAPREDUCE-1023) Newly introduced findBugs warnings should be suppressed

2009-09-22 Thread Vinod K V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758210#action_12758210
 ] 

Vinod K V commented on MAPREDUCE-1023:
--

I couldn't locate the project history because of the project split. At any 
rate, I guess the problem at that time was that people didn't realize what to 
do when an un-ignorable warning was flagged.

I think we should now bring the number back to zero, advertise the fact that 
src/test/findbugsExcludeFile.xml serves this purpose (perhaps via Hudson's 
reports), and be careful and persistent in making patches follow this. Opinions?

> Newly introduced findBugs warnings should be suppressed
> ---
>
> Key: MAPREDUCE-1023
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1023
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.21.0
>Reporter: Vinod K V
> Fix For: 0.21.0
>
>
> FindBugs warnings introduced by MAPREDUCE-711 and HADOOP-6230 should be 
> suppressed by modifying src/test/findbugsExcludeFile.xml.




[jira] Updated: (MAPREDUCE-893) Provide an ability to refresh queue configuration without restart.

2009-09-22 Thread Vinod K V (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod K V updated MAPREDUCE-893:


Release Note: Extended the framework's refresh-queue mechanism to support 
refreshing scheduler-specific queue properties, and implemented this refresh 
operation for some of the capacity scheduler's properties. With this feature, 
the following capacity-scheduler queue properties can be refreshed while the 
system is running, without restarting the JobTracker: queue capacities, 
per-queue user limits, max map/reduce capacity, and max jobs per user to 
initialize. Some changes, such as enabling or disabling priorities and adding 
or removing queues, are still not supported in the capacity scheduler.  (was: Extended 
the framework to refresh queue properties to support refresh of scheduler 
properties.
Implemented the refresh operation for capacity scheduler properties like queue 
capacities.)

> Provide an ability to refresh queue configuration without restart.
> --
>
> Key: MAPREDUCE-893
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-893
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker
>Reporter: Hemanth Yamijala
>Assignee: Vinod K V
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-893-20090915.1.txt, 
> MAPREDUCE-893-20090917.2.txt, MAPREDUCE-893-20090917.4.txt, 
> MAPREDUCE-893-20090918.2.txt, MAPREDUCE-893-20090918.over.849.patch, 
> MAPREDUCE-893-20090918.txt, MAPREDUCE-893-7.patch
>
>
> While administering a cluster using multiple queues, administrators feel a 
> need to refresh queue properties on the fly without needing to restart the 
> JobTracker. This is partially supported for some properties such as queue 
> ACLs (HADOOP-5396) and state (HADOOP-5913). The idea is to extend the 
> facility to refresh other queue properties as well, including scheduler 
> properties.




[jira] Commented: (MAPREDUCE-679) XML-based metrics as JSP servlet for JobTracker

2009-09-22 Thread Vinod K V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758199#action_12758199
 ] 

Vinod K V commented on MAPREDUCE-679:
-

I really like the fact that something real has finally been done with respect 
to testing the information presented via JSPs. Thanks Aaron!

But I still have one doubt: how do we keep the jsp and jspx pages in sync? I 
can think of ways in which we take the XML output as the standard and generate 
html from it (ala forrest). I can already see this kind of duplication across 
html/xml in this patch, e.g. generateRetiredJobTable() and 
generateRetiredJobXml() in JSPUtil.java. Thoughts about this?

> XML-based metrics as JSP servlet for JobTracker
> ---
>
> Key: MAPREDUCE-679
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-679
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: jobtracker
>Reporter: Aaron Kimball
>Assignee: Aaron Kimball
> Fix For: 0.21.0
>
> Attachments: example-jobtracker-completed-job.xml, 
> example-jobtracker-running-job.xml, MAPREDUCE-679.2.patch, 
> MAPREDUCE-679.3.patch, MAPREDUCE-679.4.patch, MAPREDUCE-679.5.patch, 
> MAPREDUCE-679.6.patch, MAPREDUCE-679.7.patch, MAPREDUCE-679.patch
>
>
> In HADOOP-4559, a general REST API for reporting metrics was proposed but 
> work seems to have stalled. In the interim, we have a simple XML translation 
> of the existing JobTracker status page which provides the same metrics 
> (including the tables of running/completed/failed jobs) as the human-readable 
> page. This is a relatively lightweight addition to provide some 
> machine-understandable metrics reporting.




[jira] Commented: (MAPREDUCE-1023) Newly introduced findBugs warnings should be suppressed

2009-09-22 Thread Jothi Padmanabhan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758196#action_12758196
 ] 

Jothi Padmanabhan commented on MAPREDUCE-1023:
--

bq. How about modifying the Hudson test-patch.sh script to scream when the 
warnings go above zero

Well, after findBugs warnings hit 0 on trunk, test-patch.sh would have screamed 
even when these new findBugs warnings got introduced, as the difference between 
patch and trunk would have been greater than zero. But then, people seem to have 
ignored the screams!!

> Newly introduced findBugs warnings should be suppressed
> ---
>
> Key: MAPREDUCE-1023
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1023
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.21.0
>Reporter: Vinod K V
> Fix For: 0.21.0
>
>
> FindBugs warnings introduced by MAPREDUCE-711 and HADOOP-6230 should be 
> suppressed by modifying src/test/findbugsExcludeFile.xml.




[jira] Created: (MAPREDUCE-1025) Subprocesses of tasks should be killed even if mapred.userlog.limit.kb is set to a positive value

2009-09-22 Thread Ravi Gummadi (JIRA)
Subprocesses of tasks should be killed even if mapred.userlog.limit.kb is set 
to a positive value
-

 Key: MAPREDUCE-1025
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1025
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
Affects Versions: 0.21.0, 0.22.0
Reporter: Ravi Gummadi
Assignee: Ravi Gummadi
 Fix For: 0.21.0, 0.22.0


setsid is to be used when launching tasks even when the deprecated 
mapred.userlog.limit.kb (or the new mapreduce.task.userlog.limit.kb) is set to a 
positive value, so that subprocesses of tasks get killed properly.
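The idea behind the fix can be sketched as a command builder that prepends setsid unconditionally, so the task and every subprocess it spawns share one session and process group that can be signalled as a unit. This is a minimal illustration under stated assumptions: the `TaskLaunchCommand` class is hypothetical, and modelling the log limit as a pipe through `tail -c` is an assumption made for the example, not Hadoop's actual task-launch code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds the OS command used to launch a task. */
class TaskLaunchCommand {
    static List<String> build(String taskCmd, long userlogLimitKb) {
        String shellCmd = taskCmd;
        if (userlogLimitKb > 0) {
            // Assumption for illustration: keep only the last N KB of output via tail.
            shellCmd = taskCmd + " | tail -c " + (userlogLimitKb * 1024);
        }
        List<String> cmd = new ArrayList<>();
        // setsid is added regardless of the log limit, so bash and all of its
        // children (including the pipeline) land in one new session/process group.
        cmd.add("setsid");
        cmd.add("bash");
        cmd.add("-c");
        cmd.add(shellCmd);
        return cmd;
    }
}
```

The resulting list could be handed to `java.lang.ProcessBuilder`; killing the process group later reaches the task's subprocesses too.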




[jira] Commented: (MAPREDUCE-1023) Newly introduced findBugs warnings should be suppressed

2009-09-22 Thread Vinod K V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758195#action_12758195
 ] 

Vinod K V commented on MAPREDUCE-1023:
--

In fact, why don't we just suppress everything that newly appeared after 
HADOOP-5661 went in?

Via HADOOP-5661, Jothi went through pains to make sure findBugs warnings become 
zero. All that effort would be wasted if patches keep ignoring these warnings. 
How about modifying the Hudson test-patch.sh script to scream when the warnings 
go above zero and making sure trunk is always at zero findBugs warnings? 
As noted by others, this would also speed up the test-patch.sh process, as we 
will just need to find the number of warnings introduced by the patch.

I am not sure whether this should be fixed as a bug/blocker for 0.21. 
Thoughts?

> Newly introduced findBugs warnings should be suppressed
> ---
>
> Key: MAPREDUCE-1023
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1023
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.21.0
>Reporter: Vinod K V
> Fix For: 0.21.0
>
>
> FindBugs warnings introduced by MAPREDUCE-711 and HADOOP-6230 should be 
> suppressed by modifying src/test/findbugsExcludeFile.xml.




[jira] Created: (MAPREDUCE-1024) Provide proper test cases to test the CLI of hierarchical queues

2009-09-22 Thread V.V.Chaitanya Krishna (JIRA)
Provide proper test cases to test the CLI of hierarchical queues


 Key: MAPREDUCE-1024
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1024
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: V.V.Chaitanya Krishna


After the implementation of HADOOP-6277, the MRCLI can be modified to provide 
more appropriate test cases to test the CLI for hierarchical queues.




[jira] Created: (MAPREDUCE-1023) Newly introduced findBugs warnings should be suppressed

2009-09-22 Thread Vinod K V (JIRA)
Newly introduced findBugs warnings should be suppressed
---

 Key: MAPREDUCE-1023
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1023
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build
Affects Versions: 0.21.0
Reporter: Vinod K V
 Fix For: 0.21.0


FindBugs warnings introduced by MAPREDUCE-711 and HADOOP-6230 should be 
suppressed by modifying src/test/findbugsExcludeFile.xml.




[jira] Commented: (MAPREDUCE-890) After HADOOP-4491, the user who started mapred system is not able to run job.

2009-09-22 Thread Vinod K V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758191#action_12758191
 ] 

Vinod K V commented on MAPREDUCE-890:
-

The same (latest) patch still works after the feature freeze, even with all the 
patches that went in.

> After HADOOP-4491, the user who started mapred system is not able to run job.
> -
>
> Key: MAPREDUCE-890
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-890
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Reporter: Karam Singh
>Assignee: Vinod K V
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-890-20090904.txt, MAPREDUCE-890-20090909.txt
>
>
> Even the setup and cleanup tasks of the job fail with an exception: it fails to 
> create the job and related directories under mapred.local.dir/taskTracker/jobcache.
> Directories are created as:
> [dr-xrws--- mapred   hadoop  ]  job_200908190916_0002
> mapred is not able to write under this directory. Even manually, I failed to touch a file.
> mapred is the user who started the MR cluster.




[jira] Commented: (MAPREDUCE-1022) Trunk tests fail because of test-failure in Vertica

2009-09-22 Thread Ravi Gummadi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758190#action_12758190
 ] 

Ravi Gummadi commented on MAPREDUCE-1022:
-

Why is JobContextImpl under the mapreduce/task/ directory? Should it be under 
mapreduce/?

> Trunk tests fail because of test-failure in Vertica
> ---
>
> Key: MAPREDUCE-1022
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1022
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.21.0, 0.22.0
>Reporter: Vinod K V
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-1022.20090922.txt
>
>
> ant test fails with
> {code}
> [javac] 
> /home/vinodkv/Workspace/eclipse-workspace/hadoop-mapreduce/src/contrib/vertica/src/test/org/apache/hadoop/vertica/TestVertica.java:43:
>  cannot find symbol
> [javac] symbol  : class JobContextImpl
> [javac] location: package org.apache.hadoop.mapreduce
> [javac] import org.apache.hadoop.mapreduce.JobContextImpl;
> [javac]   ^
> {code}




[jira] Updated: (MAPREDUCE-1022) Trunk tests fail because of test-failure in Vertica

2009-09-22 Thread Vinod K V (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod K V updated MAPREDUCE-1022:
-

Attachment: MAPREDUCE-1022.20090922.txt

As Ravi pointed out offline, this is due to the wrong package name for 
JobContextImpl used in TestVertica.java. Changing it to 
org.apache.hadoop.mapreduce.task.JobContextImpl does the trick. Attaching a 
patch.

> Trunk tests fail because of test-failure in Vertica
> ---
>
> Key: MAPREDUCE-1022
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1022
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.21.0, 0.22.0
>Reporter: Vinod K V
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-1022.20090922.txt
>
>
> ant test fails with
> {code}
> [javac] 
> /home/vinodkv/Workspace/eclipse-workspace/hadoop-mapreduce/src/contrib/vertica/src/test/org/apache/hadoop/vertica/TestVertica.java:43:
>  cannot find symbol
> [javac] symbol  : class JobContextImpl
> [javac] location: package org.apache.hadoop.mapreduce
> [javac] import org.apache.hadoop.mapreduce.JobContextImpl;
> [javac]   ^
> {code}




[jira] Created: (MAPREDUCE-1022) Trunk tests fail because of test-failure in Vertica

2009-09-22 Thread Vinod K V (JIRA)
Trunk tests fail because of test-failure in Vertica
---

 Key: MAPREDUCE-1022
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1022
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 0.21.0, 0.22.0
Reporter: Vinod K V
Priority: Blocker
 Fix For: 0.21.0


ant test fails with
{code}
[javac] 
/home/vinodkv/Workspace/eclipse-workspace/hadoop-mapreduce/src/contrib/vertica/src/test/org/apache/hadoop/vertica/TestVertica.java:43:
 cannot find symbol
[javac] symbol  : class JobContextImpl
[javac] location: package org.apache.hadoop.mapreduce
[javac] import org.apache.hadoop.mapreduce.JobContextImpl;
[javac]   ^
{code}




[jira] Commented: (MAPREDUCE-679) XML-based metrics as JSP servlet for JobTracker

2009-09-22 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758183#action_12758183
 ] 

Steve Loughran commented on MAPREDUCE-679:
--

Konstantin: no, it's not manual. 

A JUnit test case brings up a MiniMR cluster, constructs the URL 
{{"http://localhost:" + infoPort + "/jobtracker.jspx"}} and then does something 
very devious: it hands off the URL to the XML parser and says "parse this". If 
the JSPX isn't there or won't run, the HTTP errors get picked up and reported. If 
the page is there but isn't valid XML, the parser will catch and report that. 
Very nice indeed. No need for even an extra JAR like HttpUnit or HtmlUnit. 
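The check described above can be sketched with the JDK's DOM parser alone: `DocumentBuilder.parse(String uri)` opens the URI itself, so HTTP failures surface as an `IOException` and malformed markup as a `SAXException`. This is a minimal illustration, not the test from the patch; the `JspxCheck` class is hypothetical, and the MiniMR cluster setup is omitted (a real test would pass the `"http://localhost:" + infoPort + "/jobtracker.jspx"` URL).

```java
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

/** Hypothetical helper: fetch a URL and require that it parses as well-formed XML. */
class JspxCheck {
    /** Returns the root element name, or throws if the page cannot be fetched or parsed. */
    static String rootElementOf(String url) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // parse(String uri) fetches the resource: HTTP errors (404, 500) become
            // IOException, and markup that is not well-formed becomes SAXException.
            Document doc = builder.parse(url);
            return doc.getDocumentElement().getTagName();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not fetch or parse " + url, e);
        }
    }
}
```

A test would then simply call `JspxCheck.rootElementOf(url)` and let any exception fail it.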



> XML-based metrics as JSP servlet for JobTracker
> ---
>
> Key: MAPREDUCE-679
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-679
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: jobtracker
>Reporter: Aaron Kimball
>Assignee: Aaron Kimball
> Fix For: 0.21.0
>
> Attachments: example-jobtracker-completed-job.xml, 
> example-jobtracker-running-job.xml, MAPREDUCE-679.2.patch, 
> MAPREDUCE-679.3.patch, MAPREDUCE-679.4.patch, MAPREDUCE-679.5.patch, 
> MAPREDUCE-679.6.patch, MAPREDUCE-679.7.patch, MAPREDUCE-679.patch
>
>
> In HADOOP-4559, a general REST API for reporting metrics was proposed but 
> work seems to have stalled. In the interim, we have a simple XML translation 
> of the existing JobTracker status page which provides the same metrics 
> (including the tables of running/completed/failed jobs) as the human-readable 
> page. This is a relatively lightweight addition to provide some 
> machine-understandable metrics reporting.




[jira] Commented: (MAPREDUCE-964) Inaccurate values in jobSummary logs

2009-09-22 Thread Sreekanth Ramakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758171#action_12758171
 ] 

Sreekanth Ramakrishnan commented on MAPREDUCE-964:
--

The very high map slot seconds are caused by the following scenario:

We hand out a task to a tracker and the tracker gets lost, so we kill the task. 
Because the TT has not reported back during the lost period, the task's start 
time is never updated; according to the JT, the task has not been launched. We 
then fail the task, and the task ends up with a finish time but no start time. 
With a start time of 0, the computed difference is the whole finish timestamp 
(see "difference to add" in the log below).

Following is log from JT:

{noformat}
2009-09-18 13:28:20,990 INFO org.apache.hadoop.mapred.JobTracker: Adding task 
'attempt_200909180756_0009_m_03_2' to tip task_200909180756_0009_m_03, 
for tracker 'tracker_xxx:localhost/127.0.0.1:45507'
2009-09-18 13:39:51,482 INFO org.apache.hadoop.mapred.TaskInProgress: Error 
from attempt_200909180756_0009_m_03_2: Lost task tracker: 
tracker_xxx:localhost/127.0.0.1:45507
2009-09-18 13:39:56,997 INFO org.apache.hadoop.mapred.JobTracker: Adding task 
(cleanup)'attempt_200909180756_0009_m_03_2' to tip 
task_200909180756_0009_m_03, for tracker 
'tracker_xxx:localhost/127.0.0.1:60187'
2009-09-18 13:40:00,000 INFO org.apache.hadoop.mapred.JobInProgress: TaskDebug 
attemptId : attempt_200909180756_0009_m_03_2 slots : SLOTS_MILLIS_MAPS 
tip.numslots is: 1 difference to add : 1253281197003 status start : 0 status 
end time : 1253281197003
2009-09-18 13:40:00,000 INFO org.apache.hadoop.mapred.JobTracker: Removed 
completed task 'attempt_200909180756_0009_m_03_2' from 
'tracker_xxx:localhost/127.0.0.1:60187'
{noformat}
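The arithmetic in the log can be checked directly: with a start time of 0, finishTime - startTime = 1253281197003 - 0 = 1253281197003 ms, roughly 1.25e9 seconds, the same magnitude as the too-high mapSlotSeconds=1251949742 in the issue description. A minimal defensive sketch that skips attempts without a recorded start time follows; the `SlotTime` class and its guard are hypothetical and not necessarily how this issue was fixed.

```java
/** Hypothetical helper for per-attempt slot accounting. */
class SlotTime {
    /**
     * Slot-milliseconds contributed by one attempt. An attempt that never reported
     * a start time (startTime == 0) or has an inconsistent finish time contributes 0,
     * instead of its whole epoch finish timestamp or a negative value.
     */
    static long slotMillis(long startTime, long finishTime, int numSlots) {
        if (startTime <= 0 || finishTime < startTime) {
            return 0L;
        }
        return (finishTime - startTime) * numSlots;
    }
}
```

For a normal attempt the result is unchanged, e.g. `slotMillis(1000L, 5000L, 2)` is 8000.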

> Inaccurate values in jobSummary logs
> 
>
> Key: MAPREDUCE-964
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-964
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.20.1
>Reporter: Rajiv Chittajallu
>Assignee: Sreekanth Ramakrishnan
>Priority: Critical
>
> For some jobs the mapSlotSeconds is incorrect.
> negative value
> 09/09/01 18:31:44 INFO mapred.JobInProgress$JobSummary: 
> jobId=job_200908270718_4568,submitTime=1251823543976,launchTime=1251823554310,finishTime=1251829904565,
> 
> numMaps=7965,numSlotsPerMap=1,numReduces=40,numSlotsPerReduce=1,user=wile,queue=runner,status=SUCCEEDED,
>  
> mapSlotSeconds=-2503133523,reduceSlotsSeconds=186536,clusterMapCapacity=11262,clusterReduceCapacity=3754
> or too high
> 09/09/02 23:59:57 INFO mapred.JobInProgress$JobSummary: 
> jobId=job_200908270718_5861,submitTime=1251935672924,launchTime=1251935687698,finishTime=1251935997949,
> 
> numMaps=1026,numSlotsPerMap=1,numReduces=10,numSlotsPerReduce=1,user=dfsload,queue=gridops,status=SUCCEEDED,
>  
> mapSlotSeconds=1251949742,reduceSlotsSeconds=537,clusterMapCapacity=11262,clusterReduceCapacity=3754




[jira] Created: (MAPREDUCE-1020) Add more unit tests to test the queue refresh feature MAPREDUCE-893

2009-09-22 Thread Vinod K V (JIRA)
Add more unit tests to test the queue refresh feature MAPREDUCE-893
---

 Key: MAPREDUCE-1020
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1020
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 0.21.0
Reporter: Vinod K V
 Fix For: 0.21.0


MAPREDUCE-893 included unit tests verifying the sanity of the refresh feature: 
both the queue properties' refresh and the scheduler properties' refresh. The 
test suite can and should be expanded. This will help identify issues that would 
otherwise be caught only during manual testing. For example, during manual 
testing of MAPREDUCE-893, we identified an NPE in the scheduler iteration 
occurring during heartbeat, which could easily have been identified by unit 
tests.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1021) mapred-default.xml does not document all framework config parameters

2009-09-22 Thread Sharad Agarwal (JIRA)
mapred-default.xml does not document all framework config parameters


 Key: MAPREDUCE-1021
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1021
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: documentation
Reporter: Sharad Agarwal
Assignee: Amareshwari Sriramadasu
 Fix For: 0.21.0


MAPREDUCE-849 renamed and categorized configuration keys. All the configuration 
keys should be documented, with their defaults, in mapred-default.xml.




[jira] Commented: (MAPREDUCE-1014) After the 0.21 branch, MapReduce trunk doesn't compile

2009-09-22 Thread Ravi Gummadi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758159#action_12758159
 ] 

Ravi Gummadi commented on MAPREDUCE-1014:
-

hdfs-trunk seems fine (I don't see the issue Devaraj mentioned above). Could 
some other JIRA have fixed the HDFS build issue?

> After the 0.21 branch, MapReduce trunk doesn't compile
> --
>
> Key: MAPREDUCE-1014
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1014
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Devaraj Das
>Assignee: Ravi Gummadi
>Priority: Blocker
> Fix For: 0.22.0
>
>
> When ant is run, the build fails with compilation problems. The first of them 
> is:
> compile-mapred-classes:
>   [taskdef] log4j:ERROR Could not instantiate class 
> [org.apache.hadoop.metrics.jvm.EventCounter].
>   [taskdef] java.lang.ClassNotFoundException: 
> org.apache.hadoop.metrics.jvm.EventCounter
>   [taskdef] at 
> org.apache.tools.ant.AntClassLoader.findClassInComponents(AntClassLoader.java:1383)
>   [taskdef] at 
> org.apache.tools.ant.AntClassLoader.findClass(AntClassLoader.java:1324)
>   [taskdef] at 
> org.apache.tools.ant.AntClassLoader.loadClass(AntClassLoader.java:1072)
>   [taskdef] at java.lang.ClassLoader.loadClass(ClassLoader.java:254)
>   [taskdef] at 
> java.lang.ClassLoader.loadClassInternal(ClassLoader.java:402)
>   [taskdef] at java.lang.Class.forName0(Native Method)
>   [taskdef] at java.lang.Class.forName(Class.java:169)
>   [taskdef] at org.apache.log4j.helpers.Loader.loadClass(Loader.java:179)
>   [taskdef] at 
> org.apache.log4j.helpers.OptionConverter.instantiateByClassName(OptionConverter.java:320)
>   [taskdef] at 
> org.apache.log4j.helpers.OptionConverter.instantiateByKey(OptionConverter.java:121)
>   [taskdef] at 
> org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:664)
>   [taskdef] at 
> org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:647)
>   [taskdef] at 
> org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:544)
>   [taskdef] at 
> org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:440)
>   [taskdef] at 
> org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:476)




[jira] Updated: (MAPREDUCE-270) TaskTracker could send an out-of-band heartbeat when the last running map/reduce completes

2009-09-22 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-270:


Attachment: MAPREDUCE-270_yhadoop20.patch
MAPREDUCE-270.patch

Straightforward patches to trunk and the Yahoo! hadoop-0.20 distribution, 
re-introducing the out-of-band heartbeat on task completion (configurable via a 
secret 'mapreduce.tasktracker.oob.heartbeat' knob).
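The scheduling decision behind an out-of-band heartbeat can be sketched as a small predicate: heartbeat when the regular interval has elapsed, or immediately when a task has completed and the feature is switched on. This is a hypothetical illustration, not the TaskTracker code in the patches; the `HeartbeatPolicy` class and its parameters are invented for the example, and `oobEnabled` merely stands in for whatever the 'mapreduce.tasktracker.oob.heartbeat' knob controls.

```java
/** Hypothetical heartbeat policy: regular interval plus optional out-of-band trigger. */
class HeartbeatPolicy {
    static boolean shouldHeartbeatNow(long nowMs, long lastHeartbeatMs, long intervalMs,
                                      boolean oobEnabled, boolean taskCompletedSinceLast) {
        if (nowMs - lastHeartbeatMs >= intervalMs) {
            return true; // the regular heartbeat is due anyway
        }
        // Out-of-band: report a completion right away instead of idling until the
        // next interval, so freed slots can be reassigned sooner.
        return oobEnabled && taskCompletedSinceLast;
    }
}
```

With the knob off, behaviour reduces to strictly respecting the heartbeat interval, which is the current behaviour described in the issue.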

> TaskTracker could send an out-of-band heartbeat when the last running 
> map/reduce completes
> --
>
> Key: MAPREDUCE-270
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-270
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Attachments: MAPREDUCE-270.patch, MAPREDUCE-270_yhadoop20.patch
>
>
> Currently the TaskTracker strictly respects the heartbeat interval; this 
> causes utilization issues when all running tasks complete. We could send an 
> out-of-band heartbeat in that case.




[jira] Updated: (MAPREDUCE-856) Localized files from DistributedCache should have right access-control

2009-09-22 Thread Vinod K V (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod K V updated MAPREDUCE-856:


Release Note: 
Fixed TaskTracker and related classes so as to set correct and most restrictive 
access control for DistributedCache files/archives.
 - To do this, it changed the directory structure of per-job local files on a 
TaskTracker to the following:
$mapred.local.dir
   `-- taskTracker
       `-- $user
           |-- distcache
           `-- jobcache
 - Distributed cache files/archives are now user-owned by the job owner and 
group-owned by the special group owner of the task-controller binary. The 
files/archives are given the most private permissions possible, as early as 
possible: immediately after the files/dirs are first localized on the TT.
 - As depicted by the new directory structure, a directory corresponding to 
each user is created on each TT when that particular user's first task is 
assigned to that TT. These user directories remain on the TT forever and are 
not cleaned up when unused; this is targeted to be fixed via MAPREDUCE-1019.
 - The distributed cache files are now accessible _only_ by the user who first 
localized them. Sharing of these files across users is no longer possible, but 
is targeted for future versions via MAPREDUCE-744.
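The new per-user layout can be expressed as a few path helpers with java.nio.file. This is a minimal sketch of the directory structure in the release note only; the `TaskTrackerLayout` class and method names are hypothetical, and ownership and permission handling (done via the task-controller) is not modelled.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helpers for $mapred.local.dir/taskTracker/$user/{distcache,jobcache}. */
class TaskTrackerLayout {
    /** Per-user directory, created when the user's first task reaches this TT. */
    static Path userDir(String localDir, String user) {
        return Paths.get(localDir, "taskTracker", user);
    }

    /** Localized DistributedCache files/archives, private to this user. */
    static Path distCacheDir(String localDir, String user) {
        return userDir(localDir, user).resolve("distcache");
    }

    /** Per-job directories for this user's jobs. */
    static Path jobCacheDir(String localDir, String user) {
        return userDir(localDir, user).resolve("jobcache");
    }
}
```

Because both caches hang off the user directory, access can be restricted once at that level for each user.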

> Localized files from DistributedCache should have right access-control
> --
>
> Key: MAPREDUCE-856
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-856
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: tasktracker
>Reporter: Arun C Murthy
>Assignee: Vinod K V
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-856-20090820.txt, MAPREDUCE-856-20090821.txt, 
> MAPREDUCE-856-20090825.3.txt, MAPREDUCE-856-20090827.txt, 
> MAPREDUCE-856-20090903.txt, MAPREDUCE-856-20090904.1.txt, 
> MAPREDUCE-856-20090904.txt, MAPREDUCE-856-20090907.1.txt, 
> MAPREDUCE-856-20090907.txt, MAPREDUCE-856-20090908.txt
>
>





[jira] Updated: (MAPREDUCE-1019) Stale user directories left on TTs after MAPREDUCE-856

2009-09-22 Thread Vinod K V (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod K V updated MAPREDUCE-1019:
-

  Description: 
MAPREDUCE-856 changed the directory structure of job files on the TT. As part 
of this, it introduced user directories under taskTracker subdirectory which 
contains private files of a user. The user directories are created on the TT 
when the user's first task is assigned to this TT. But these user-directories 
are never cleaned up from the TT, even when no task of this user is running on 
this TT. This essentially leaves empty user directories hanging around on the 
TT, whose number may increase over time.

This was originally intended to be fixed in MAPREDUCE-856, but could not be 
done because increasing complexity already stretched that issue.

  was:
MAPREUCE-856 changed the directory structure of job files on the TT. As part of 
this, it introduced user directories under taskTracker subdirectory which 
contains private files of a user. The user directories are created on the TT 
when the user's first task is assigned to this TT. But these user-directories 
are never cleaned up from the TT, even when no task of this user is running on 
this TT. This essentially leaves empty user directories hanging around on the 
TT, whose number may increase over time.

This was originally intended to be fixed in MAPREDUCE-856, but could not be 
done because increasing complexity already stretched that issue.

Affects Version/s: 0.21.0

> Stale user directories left on TTs after MAPREDUCE-856
> --
>
> Key: MAPREDUCE-1019
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1019
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Affects Versions: 0.21.0
>Reporter: Vinod K V
> Fix For: 0.21.0
>
>
> MAPREDUCE-856 changed the directory structure of job files on the TT. As part 
> of this, it introduced user directories under taskTracker subdirectory which 
> contains private files of a user. The user directories are created on the TT 
> when the user's first task is assigned to this TT. But these user-directories 
> are never cleaned up from the TT, even when no task of this user is running 
> on this TT. This essentially leaves empty user directories hanging around on 
> the TT, whose number may increase over time.
> This was originally intended to be fixed in MAPREDUCE-856, but could not be 
> done because increasing complexity already stretched that issue.




[jira] Created: (MAPREDUCE-1019) Stale user directories left on TTs after MAPREDUCE-856

2009-09-22 Thread Vinod K V (JIRA)
Stale user directories left on TTs after MAPREDUCE-856
--

 Key: MAPREDUCE-1019
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1019
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
Reporter: Vinod K V
 Fix For: 0.21.0


MAPREDUCE-856 changed the directory structure of job files on the TT. As part of 
this, it introduced user directories under taskTracker subdirectory which 
contains private files of a user. The user directories are created on the TT 
when the user's first task is assigned to this TT. But these user-directories 
are never cleaned up from the TT, even when no task of this user is running on 
this TT. This essentially leaves empty user directories hanging around on the 
TT, whose number may increase over time.

This was originally intended to be fixed in MAPREDUCE-856, but could not be 
done because increasing complexity already stretched that issue.




[jira] Updated: (MAPREDUCE-1014) After the 0.21 branch, MapReduce trunk doesn't compile

2009-09-22 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-1014:


Affects Version/s: 0.22.0
Fix Version/s: 0.22.0

The issue is only in trunk, not in the 0.21 branch.

> After the 0.21 branch, MapReduce trunk doesn't compile
> --
>
> Key: MAPREDUCE-1014
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1014
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Devaraj Das
>Assignee: Ravi Gummadi
>Priority: Blocker
> Fix For: 0.22.0
>
>
> When ant is run, the build fails with compilation problems. The first of them 
> is:
> compile-mapred-classes:
>   [taskdef] log4j:ERROR Could not instantiate class 
> [org.apache.hadoop.metrics.jvm.EventCounter].
>   [taskdef] java.lang.ClassNotFoundException: 
> org.apache.hadoop.metrics.jvm.EventCounter
>   [taskdef] at 
> org.apache.tools.ant.AntClassLoader.findClassInComponents(AntClassLoader.java:1383)
>   [taskdef] at 
> org.apache.tools.ant.AntClassLoader.findClass(AntClassLoader.java:1324)
>   [taskdef] at 
> org.apache.tools.ant.AntClassLoader.loadClass(AntClassLoader.java:1072)
>   [taskdef] at java.lang.ClassLoader.loadClass(ClassLoader.java:254)
>   [taskdef] at 
> java.lang.ClassLoader.loadClassInternal(ClassLoader.java:402)
>   [taskdef] at java.lang.Class.forName0(Native Method)
>   [taskdef] at java.lang.Class.forName(Class.java:169)
>   [taskdef] at org.apache.log4j.helpers.Loader.loadClass(Loader.java:179)
>   [taskdef] at 
> org.apache.log4j.helpers.OptionConverter.instantiateByClassName(OptionConverter.java:320)
>   [taskdef] at 
> org.apache.log4j.helpers.OptionConverter.instantiateByKey(OptionConverter.java:121)
>   [taskdef] at 
> org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:664)
>   [taskdef] at 
> org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:647)
>   [taskdef] at 
> org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:544)
>   [taskdef] at 
> org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:440)
>   [taskdef] at 
> org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:476)




[jira] Commented: (MAPREDUCE-781) distcp overrides user-selected job name

2009-09-22 Thread Venkatesh S (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758152#action_12758152
 ] 

Venkatesh S commented on MAPREDUCE-781:
---

> No documentation update for this?
This was a defect fix: the job name was not carried through from the conf but 
was hard-coded as DistCp.
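The intended behaviour (honour a user-supplied job name, fall back to "distcp" only when none is set) can be sketched as follows. This is a hypothetical illustration, not the DistCp patch: `java.util.Properties` stands in for Hadoop's `Configuration`, and the key `mapred.job.name` is the pre-0.21 job-name key (0.21 renames it `mapreduce.job.name`).

```java
import java.util.Properties;

/** Hypothetical helper: choose the job name for a distcp run. */
class DistCpJobName {
    static String jobName(Properties conf) {
        // Use the user's job name if one was supplied in the configuration.
        String name = conf.getProperty("mapred.job.name");
        // Fall back to the old hard-coded default only when nothing was set.
        return (name == null || name.isEmpty()) ? "distcp" : name;
    }
}
```

This lets replication services label each copy job meaningfully in the JobTracker UI and history.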

> distcp overrides user-selected job name
> ---
>
> Key: MAPREDUCE-781
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-781
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp
>Affects Versions: 0.21.0
>Reporter: Rob Weltman
>Assignee: Venkatesh S
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-781.patch, MAPREDUCE-781.patch, 
> MAPREDUCE_781.patch
>
>
> distcp hard-codes the hadoop job name to "distcp" even if the user specifies 
> a job name. This is a problem in general, but especially for generalized 
> replication services since the Job Tracker UI and history can't be made to 
> indicate what is being copied in the job name.




[jira] Commented: (MAPREDUCE-879) TestTaskTrackerLocalization fails on MAC OS

2009-09-22 Thread Vinod K V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758150#action_12758150
 ] 

Vinod K V commented on MAPREDUCE-879:
-

TestTrackerDistributedCacheManagerWithLinuxTaskController added as part of 
MAPREDUCE-856 will also fail on Macs for the same reason.

> TestTaskTrackerLocalization fails on MAC OS
> ---
>
> Key: MAPREDUCE-879
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-879
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.21.0
> Environment: Mac OS X 10.5.7
>Reporter: Devaraj Das
>Assignee: Vinod K V
> Fix For: 0.21.0
>
> Attachments: 
> TEST-org.apache.hadoop.mapred.TestTaskTrackerLocalization.txt
>
>
> TestTaskTrackerLocalization failed on an 'ant test' run.




[jira] Updated: (MAPREDUCE-879) TestTaskTrackerLocalization fails on MAC OS

2009-09-22 Thread Vinod K V (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod K V updated MAPREDUCE-879:


Priority: Blocker  (was: Major)

> TestTaskTrackerLocalization fails on MAC OS
> ---
>
> Key: MAPREDUCE-879
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-879
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.21.0
> Environment: Mac OS X 10.5.7
>Reporter: Devaraj Das
>Assignee: Vinod K V
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: 
> TEST-org.apache.hadoop.mapred.TestTaskTrackerLocalization.txt
>
>
> TestTaskTrackerLocalization failed on an 'ant test' run.
