[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce

2010-02-09 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated MAPREDUCE-1433:
-

Attachment: mr-1433.patch

A preliminary patch

 Create a Delegation token for MapReduce
 ---

 Key: MAPREDUCE-1433
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: mr-1433.patch


 Occasionally, MapReduce jobs need to launch other MapReduce jobs. With 
 security enabled, the task needs to authenticate to the JobTracker as the 
 user with a token.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1470) Move Delegation token into Common so that we can use it for MapReduce also

2010-02-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831368#action_12831368
 ] 

Hudson commented on MAPREDUCE-1470:
---

Integrated in Hadoop-Mapreduce-trunk-Commit #231 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/231/])
MAPREDUCE-1470. Move delegation tokens from HDFS to Common so that 
MapReduce can use them too. (omalley)


 Move Delegation token into Common so that we can use it for MapReduce also
 --

 Key: MAPREDUCE-1470
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1470
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.22.0

 Attachments: mr-1470.patch


 We need to update one reference for map/reduce when we move the hdfs 
 delegation tokens.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1433) Create a Delegation token for MapReduce

2010-02-09 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831371#action_12831371
 ] 

Devaraj Das commented on MAPREDUCE-1433:


And, please define the config variables in mapred-default.xml

 Create a Delegation token for MapReduce
 ---

 Key: MAPREDUCE-1433
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: mr-1433.patch


 Occasionally, MapReduce jobs need to launch other MapReduce jobs. With 
 security enabled, the task needs to authenticate to the JobTracker as the 
 user with a token.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce

2010-02-09 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated MAPREDUCE-1433:
-

Attachment: m-1440.patch

Updated with a few more fixes.

 Create a Delegation token for MapReduce
 ---

 Key: MAPREDUCE-1433
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: mr-1433.patch


 Occasionally, MapReduce jobs need to launch other MapReduce jobs. With 
 security enabled, the task needs to authenticate to the JobTracker as the 
 user with a token.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce

2010-02-09 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated MAPREDUCE-1433:
-

Attachment: (was: m-1440.patch)

 Create a Delegation token for MapReduce
 ---

 Key: MAPREDUCE-1433
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: mr-1433.patch


 Occasionally, MapReduce jobs need to launch other MapReduce jobs. With 
 security enabled, the task needs to authenticate to the JobTracker as the 
 user with a token.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce

2010-02-09 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated MAPREDUCE-1433:
-

Attachment: mr-1433.patch

This time attaching the right file. *smile*

 Create a Delegation token for MapReduce
 ---

 Key: MAPREDUCE-1433
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: mr-1433.patch, mr-1433.patch


 Occasionally, MapReduce jobs need to launch other MapReduce jobs. With 
 security enabled, the task needs to authenticate to the JobTracker as the 
 user with a token.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce

2010-02-09 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated MAPREDUCE-1433:
-

Attachment: mr-1433.patch

Bump the version number of ClientProtocol

 Create a Delegation token for MapReduce
 ---

 Key: MAPREDUCE-1433
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: mr-1433.patch, mr-1433.patch, mr-1433.patch


 Occasionally, MapReduce jobs need to launch other MapReduce jobs. With 
 security enabled, the task needs to authenticate to the JobTracker as the 
 user with a token.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce

2010-02-09 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated MAPREDUCE-1433:
-

Attachment: mr-1433.patch

Ok, this has an improved test and fixes a copy and paste bug.

 Create a Delegation token for MapReduce
 ---

 Key: MAPREDUCE-1433
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: mr-1433.patch, mr-1433.patch, mr-1433.patch, 
 mr-1433.patch


 Occasionally, MapReduce jobs need to launch other MapReduce jobs. With 
 security enabled, the task needs to authenticate to the JobTracker as the 
 user with a token.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (MAPREDUCE-1307) Introduce the concept of Job Permissions

2010-02-09 Thread Vinod K V (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod K V reassigned MAPREDUCE-1307:


Assignee: Vinod K V

 Introduce the concept of Job Permissions
 

 Key: MAPREDUCE-1307
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1307
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Devaraj Das
Assignee: Vinod K V
 Fix For: 0.22.0

 Attachments: 1307-early-1.patch


 It would be good to define the notion of job permissions analogous to file 
 permissions. Then the JobTracker can restrict who can read (e.g. look at 
 the job page) or modify (e.g. kill) jobs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1307) Introduce the concept of Job Permissions

2010-02-09 Thread Vinod K V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831402#action_12831402
 ] 

Vinod K V commented on MAPREDUCE-1307:
--

OK, I am going ahead with ACLs for job permissions. Here's the proposal:

Users can interact with their jobs via mapred commands, JT RPCs, the JT web UI and 
the TT web UI. This issue only handles the authorization of RPCs and hence the 
command-line clients. Authorization for the web UIs will be addressed by 
MAPREDUCE-1455.

h4. Per-job ACLs can be set by the job in its JobConf during submission.
 - As of now, we will only have two per-job ACLs
-- mapreduce.job.acl-modify-job
-- mapreduce.job.acl-view-job
 - The job owner has the authorization to do _anything_ with the job irrespective 
of the configured ACLs.
 - The superuser (the user who starts the mapred cluster) and members of the 
supergroup (configured on the JT via mapred.permissions.supergroup) have the 
authorization to do _anything_ with the job irrespective of the configured ACLs.

h4. mapreduce.job.acl-modify-job
 - This guards *all* modifications to a job, which covers the following 
operations:
-- killing a job
-- killing a task of a job, failing a task of a job
-- setting the priority of a job
 - Each of these operations is also guarded by the per-queue ACL 
acl-administer-jobs. So a caller (other than the job-owner and the 
superuser/supergroup) must have the authorization to satisfy both the 
queue-level ACL and the job-level ACL.

h4. mapreduce.job.acl-view-job
 - This guards *some* of the job views.
 - For now, we *only* protect APIs that can return potentially sensitive 
information belonging to the job-owner:
-- job-level counters
-- task-level counters
-- task logs displayed by the TT UI and
-- job.xml shown by the JT UI
(The last two will be handled by MAPREDUCE-1455.)
 - The above means every other piece of information about a job is still 
accessible to any other user, e.g., JobStatus, JobProfile, the list of jobs in 
the queue, etc.
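
For illustration, a minimal sketch of setting the two proposed ACLs at submission time. The property names are the ones proposed above; the class name and the "users groups" value format are assumptions for the example only, not part of this proposal:

{code}
// Illustrative sketch only: sets the two per-job ACLs proposed above on a
// JobConf before submission. The "user1,user2 group1" value format is an
// assumption for the example, not something this proposal defines.
import org.apache.hadoop.mapred.JobConf;

public class JobAclSketch {
  public static JobConf withProposedAcls(JobConf conf) {
    // Who (besides the owner and superuser/supergroup) may kill the job,
    // kill or fail its tasks, or change its priority.
    conf.set("mapreduce.job.acl-modify-job", "alice,bob ops-group");
    // Who may view job/task counters, task logs and job.xml.
    conf.set("mapreduce.job.acl-view-job", "alice analysts-group");
    return conf;
  }
}
{code}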

 Introduce the concept of Job Permissions
 

 Key: MAPREDUCE-1307
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1307
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Devaraj Das
 Fix For: 0.22.0

 Attachments: 1307-early-1.patch


 It would be good to define the notion of job permissions analogous to file 
 permissions. Then the JobTracker can restrict who can read (e.g. look at 
 the job page) or modify (e.g. kill) jobs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce

2010-02-09 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated MAPREDUCE-1433:
-

Status: Patch Available  (was: Open)

 Create a Delegation token for MapReduce
 ---

 Key: MAPREDUCE-1433
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: mr-1433.patch, mr-1433.patch


 Occasionally, MapReduce jobs need to launch other MapReduce jobs. With 
 security enabled, the task needs to authenticate to the JobTracker as the 
 user with a token.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce

2010-02-09 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated MAPREDUCE-1433:
-

Attachment: (was: mr-1433.patch)

 Create a Delegation token for MapReduce
 ---

 Key: MAPREDUCE-1433
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: mr-1433.patch, mr-1433.patch


 Occasionally, MapReduce jobs need to launch other MapReduce jobs. With 
 security enabled, the task needs to authenticate to the JobTracker as the 
 user with a token.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce

2010-02-09 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated MAPREDUCE-1433:
-

Attachment: mr-1433.patch

Adds license to the test case.

 Create a Delegation token for MapReduce
 ---

 Key: MAPREDUCE-1433
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: mr-1433.patch, mr-1433.patch


 Occasionally, MapReduce jobs need to launch other MapReduce jobs. With 
 security enabled, the task needs to authenticate to the JobTracker as the 
 user with a token.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce

2010-02-09 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated MAPREDUCE-1433:
-

Attachment: (was: mr-1433.patch)

 Create a Delegation token for MapReduce
 ---

 Key: MAPREDUCE-1433
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: mr-1433.patch, mr-1433.patch


 Occasionally, MapReduce jobs need to launch other MapReduce jobs. With 
 security enabled, the task needs to authenticate to the JobTracker as the 
 user with a token.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce

2010-02-09 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated MAPREDUCE-1433:
-

Attachment: (was: mr-1433.patch)

 Create a Delegation token for MapReduce
 ---

 Key: MAPREDUCE-1433
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: mr-1433.patch, mr-1433.patch


 Occasionally, MapReduce jobs need to launch other MapReduce jobs. With 
 security enabled, the task needs to authenticate to the JobTracker as the 
 user with a token.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-927) Cleanup of task-logs should happen in TaskTracker instead of the Child

2010-02-09 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831417#action_12831417
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-927:
---

With the current proposal, we found two things that need an answer.
# Memory footprint of the TaskTracker: each map entry (JobID, Long) would take 
about 40 bytes. If userLogRetainsHours is configured to 7 days and a TaskTracker 
runs tasks from 1 lakh (100,000) jobs in a day, the map would take up about 
28 MB of memory (a quick check of this arithmetic appears below). I guess this 
footprint is fine compared to persisting the same information to disk and 
reading it back and forth until the directory is removed.
# If the TaskTracker is reinited/restarted and a job completed while the 
TaskTracker was down, the TaskTracker would never get a KillJobAction for that 
job. In that case we can keep the userlogs for the default userLogRetainsHours 
after the reinit/restart.

Thoughts?
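
A quick arithmetic check of the footprint estimate in item 1, under the assumptions stated there (~40 bytes per entry, 1 lakh jobs per day, 7-day retention):

{code}
// Back-of-the-envelope check of the TaskTracker map footprint above.
public class UserLogMapFootprint {
  public static void main(String[] args) {
    long bytesPerEntry = 40;      // rough size of one (JobID, Long) entry
    long jobsPerDay = 100000;     // "1 lakh"
    long retainDays = 7;          // userLogRetainsHours configured to 7 days
    long totalBytes = bytesPerEntry * jobsPerDay * retainDays;
    System.out.printf("~%.0f MB%n", totalBytes / 1e6);  // prints ~28 MB
  }
}
{code}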

 Cleanup of task-logs should happen in TaskTracker instead of the Child
 --

 Key: MAPREDUCE-927
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-927
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security, tasktracker
Affects Versions: 0.21.0
Reporter: Vinod K V
Assignee: Amareshwari Sriramadasu
Priority: Blocker
 Fix For: 0.21.0


 Task logs' cleanup is currently done in the Child. This is undesirable for at 
 least two reasons: 1) failures while cleaning up will affect the user's tasks, 
 and 2) the task's wall time will be affected by operations that the TT should 
 actually own.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-326) The lowest level map-reduce APIs should be byte oriented

2010-02-09 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831537#action_12831537
 ] 

Milind Bhandarkar commented on MAPREDUCE-326:
-

 Back to a low-level binary API: the proposal here isn't to deprecate any 
 higher level APIs, but rather to add a new lower-level API that we can 
 implement both the current APIs and new APIs atop. This should in fact help 
 us to preserve high-level API compatibility longer, since the mapreduce 
 kernel will be independent of the high-level API.

+1 !!

I have always thought of the Hadoop MR APIs as assembly language that, over time, 
no one will use directly. The low-level APIs will be great for Pig, Hive, HBase 
and other high-level languages to translate to, without compromising efficiency.

 The lowest level map-reduce APIs should be byte oriented
 

 Key: MAPREDUCE-326
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-326
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: eric baldeschwieler

 As discussed here:
 https://issues.apache.org/jira/browse/HADOOP-1986#action_12551237
 The templates, serializers and other complexities that allow map-reduce to 
 use arbitrary types complicate the design and lead to lots of object creation 
 and other overhead that a byte-oriented design would not suffer.  I believe 
 the lowest-level implementation of Hadoop map-reduce should have byte-string 
 oriented APIs (for keys and values).  This API would be more performant, 
 simpler and more easily made cross-language.
 The existing API could be maintained as a thin layer on top of the leaner API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1471) FileOutputCommitter does not safely clean up its temporary files

2010-02-09 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831546#action_12831546
 ] 

Arun C Murthy commented on MAPREDUCE-1471:
--

Jim, all file-based output-formats check to ensure that their output-directory 
is *not* present when they start, i.e. 'working_path' is owned by one and only 
one job; hence this behaviour is correct.

 FileOutputCommitter does not safely clean up its temporary files
 -

 Key: MAPREDUCE-1471
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1471
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: Jim Finnessy
   Original Estimate: 4h
  Remaining Estimate: 4h

 When the FileOutputCommitter cleans up during its cleanupJob method, it 
 potentially deletes the temporary files of other concurrent jobs.
 Since the temporary files for all concurrent jobs are written to 
 working_path/_temporary/, any concurrent task that has the same working_path 
 will remove the temporary files of all currently executing jobs when it removes 
 working_path/_temporary during job cleanup.
 If the output file names are guaranteed by the client application to be unique, 
 the temporary files/directories should also be guaranteed to be unique to 
 avoid this problem. Suggest modifying cleanupJob to only remove files that it 
 created itself.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs

2010-02-09 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831577#action_12831577
 ] 

Arun C Murthy commented on MAPREDUCE-1463:
--

-1

These knobs seem backwards - as both Todd and Amar have pointed out we could 
add heuristics to tweak mapreduce.job.reduce.slowstart.completedmaps 
automatically without adding more config knobs.

 Reducer should start faster for smaller jobs
 

 Key: MAPREDUCE-1463
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/fair-share
Reporter: Scott Chen
Assignee: Scott Chen
 Attachments: MAPREDUCE-1463-v1.patch, MAPREDUCE-1463-v2.patch


 Our users often complain about the slowness of smaller ad-hoc jobs.
 The overhead to wait for the reducers to start in this case is significant.
 It will be good if we can start the reducer sooner in this case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1307) Introduce the concept of Job Permissions

2010-02-09 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831587#action_12831587
 ] 

dhruba borthakur commented on MAPREDUCE-1307:
-

 The advantage with the file-system model is that it is really simple and 
 would handle almost all cases that we might come across. 

Can somebody please explain why we are abandoning the file-system permission 
model and moving towards ACLs? Is there a particular use-case that the fs 
permission model does not address?

 Introduce the concept of Job Permissions
 

 Key: MAPREDUCE-1307
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1307
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Devaraj Das
Assignee: Vinod K V
 Fix For: 0.22.0

 Attachments: 1307-early-1.patch


 It would be good to define the notion of job permissions analogous to file 
 permissions. Then the JobTracker can restrict who can read (e.g. look at 
 the job page) or modify (e.g. kill) jobs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1434) Dynamic add input for one job

2010-02-09 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831593#action_12831593
 ] 

Owen O'Malley commented on MAPREDUCE-1434:
--

+1

It helps a more interesting use case where you have a pipeline of mapreduce 
jobs and don't want the 2nd set of maps to wait until the last reduce finishes. 
It would be great if job control could use this as an optimization.

You need to have a method where the application declares that all of the input 
has been added. To avoid having reduces hold slots that they can't use, I'd 
suggest that no reduces be launched until the input is declared complete.

A timeout is also required so that if a user disappears, the job is killed 
after N minutes with no new input and the input never declared complete.
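
For concreteness, a purely hypothetical sketch of the client-side surface this would imply. None of these methods exist in Hadoop; the names are placeholders for the ideas above:

{code}
// Hypothetical only: placeholder methods for the proposal above. Nothing
// here is an existing Hadoop API.
import java.io.IOException;
import org.apache.hadoop.fs.Path;

public interface IncrementalJobClient {
  /** Generate splits for newly uploaded data and add them to a running job. */
  void addInput(String jobId, Path newInput) throws IOException;

  /** Declare that no more input will be added; reduces may be launched only
      after this has been called. */
  void inputComplete(String jobId) throws IOException;

  /** Idle timeout: kill the job if no new input arrives for this many minutes
      and the input was never declared complete. */
  void setInputTimeoutMinutes(String jobId, int minutes) throws IOException;
}
{code}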

 Dynamic add input for one job
 -

 Key: MAPREDUCE-1434
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1434
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
 Environment: 0.19.0
Reporter: Xing Shi

 We always have to upload the data to HDFS first, and only then can we analyze 
 it using Hadoop MapReduce.
 Sometimes the upload takes a long time, so if we could add input to a running 
 job, that time could be saved.
 WHAT?
 Client:
 a) hadoop job -add-input jobId inputFormat ...
 Add the input to the given jobid.
 b) hadoop job -add-input done
 Tell the JobTracker that all input has been provided.
 c) hadoop job -add-input status jobid
 Show how many inputs the jobid has.
 HOWTO?
 Mainly, I think we should do three things:
 1. JobClient: the JobClient should support adding input to a job; it generates 
 the splits and submits them to the JobTracker.
 2. JobTracker: the JobTracker should support addInput and add the new tasks to 
 the original map tasks. Because the uploaded data will be processed quickly, 
 the scheduler should also be updated to support keeping a map task pending 
 until the client declares the job's input done.
 3. Reducer: the reducer should also update the number of maps (mapNums) so 
 that the shuffle works correctly.
 This is the rough idea, and I will update it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1434) Dynamic add input for one job

2010-02-09 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831595#action_12831595
 ] 

Arun C Murthy commented on MAPREDUCE-1434:
--

+1

I'm sure Pig/Hive would be substantial beneficiaries... their job pipelines 
would benefit a lot.

 Dynamic add input for one job
 -

 Key: MAPREDUCE-1434
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1434
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
 Environment: 0.19.0
Reporter: Xing Shi

 We always have to upload the data to HDFS first, and only then can we analyze 
 it using Hadoop MapReduce.
 Sometimes the upload takes a long time, so if we could add input to a running 
 job, that time could be saved.
 WHAT?
 Client:
 a) hadoop job -add-input jobId inputFormat ...
 Add the input to the given jobid.
 b) hadoop job -add-input done
 Tell the JobTracker that all input has been provided.
 c) hadoop job -add-input status jobid
 Show how many inputs the jobid has.
 HOWTO?
 Mainly, I think we should do three things:
 1. JobClient: the JobClient should support adding input to a job; it generates 
 the splits and submits them to the JobTracker.
 2. JobTracker: the JobTracker should support addInput and add the new tasks to 
 the original map tasks. Because the uploaded data will be processed quickly, 
 the scheduler should also be updated to support keeping a map task pending 
 until the client declares the job's input done.
 3. Reducer: the reducer should also update the number of maps (mapNums) so 
 that the shuffle works correctly.
 This is the rough idea, and I will update it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (MAPREDUCE-1403) Save file-sizes of each of the artifacts in DistributedCache in the JobConf

2010-02-09 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-1403:


Assignee: Arun C Murthy  (was: Hong Tang)

 Save file-sizes of each of the artifacts in DistributedCache in the JobConf
 ---

 Key: MAPREDUCE-1403
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1403
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1403_yhadoop20.patch


 It would be a useful metric to collect... potentially GridMix could use it to 
 emulate jobs which use the DistributedCache.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1403) Save file-sizes of each of the artifacts in DistributedCache in the JobConf

2010-02-09 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-1403:
-

Attachment: MAPREDUCE-1403_yhadoop20.patch

Patch for y20 distribution. Not to be committed.

 Save file-sizes of each of the artifacts in DistributedCache in the JobConf
 ---

 Key: MAPREDUCE-1403
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1403
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1403_yhadoop20.patch


 It would be a useful metric to collect... potentially GridMix could use it to 
 emulate jobs which use the DistributedCache.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1434) Dynamic add input for one job

2010-02-09 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831608#action_12831608
 ] 

Owen O'Malley commented on MAPREDUCE-1434:
--

One approach might be to have a subclass of InputFormat, such as:

{code}
public abstract class IncrementalInputFormat<K, V> extends InputFormat<K, V> {
  // Return any splits discovered since the previous call.
  public abstract InputSplit[] getNewInputSplits(JobContext context)
      throws IOException;
}
{code}

and such input formats return any new splits that they have found since the 
last time the method was called.
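
As a rough illustration of that idea (not part of any attached patch, and not an existing Hadoop class), an implementation could remember which files it has already produced splits for and only return splits for files that appeared since the previous call:

{code}
// Illustrative sketch only: a new-API input format that returns splits just
// for files not seen on earlier calls. Class and method are hypothetical.
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class IncrementalTextInputFormat extends TextInputFormat {
  private final Set<String> seenFiles = new HashSet<String>();

  /** Return splits only for files that appeared since the previous call. */
  public List<InputSplit> getNewInputSplits(JobContext context)
      throws IOException {
    List<InputSplit> fresh = new ArrayList<InputSplit>();
    Set<String> newFiles = new HashSet<String>();
    for (InputSplit split : getSplits(context)) {
      String file = ((FileSplit) split).getPath().toString();
      if (!seenFiles.contains(file)) {
        fresh.add(split);          // keep every split of a newly seen file
        newFiles.add(file);
      }
    }
    seenFiles.addAll(newFiles);    // remember them for the next call
    return fresh;
  }
}
{code}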

 Dynamic add input for one job
 -

 Key: MAPREDUCE-1434
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1434
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
 Environment: 0.19.0
Reporter: Xing Shi

 We always have to upload the data to HDFS first, and only then can we analyze 
 it using Hadoop MapReduce.
 Sometimes the upload takes a long time, so if we could add input to a running 
 job, that time could be saved.
 WHAT?
 Client:
 a) hadoop job -add-input jobId inputFormat ...
 Add the input to the given jobid.
 b) hadoop job -add-input done
 Tell the JobTracker that all input has been provided.
 c) hadoop job -add-input status jobid
 Show how many inputs the jobid has.
 HOWTO?
 Mainly, I think we should do three things:
 1. JobClient: the JobClient should support adding input to a job; it generates 
 the splits and submits them to the JobTracker.
 2. JobTracker: the JobTracker should support addInput and add the new tasks to 
 the original map tasks. Because the uploaded data will be processed quickly, 
 the scheduler should also be updated to support keeping a map task pending 
 until the client declares the job's input done.
 3. Reducer: the reducer should also update the number of maps (mapNums) so 
 that the shuffle works correctly.
 This is the rough idea, and I will update it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1399) The archive command shows a null error message

2010-02-09 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831620#action_12831620
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-1399:
---

Hudson does not seem to be working.  Ran test-patch locally.
{noformat}
 [exec] -1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] -1 tests included.  The patch doesn't appear to include any new 
or modified tests.
 [exec] Please justify why no new tests are needed 
for this patch.
 [exec] Also please list what manual steps were 
performed to verify this patch.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
{noformat}

 The archive command shows a null error message
 --

 Key: MAPREDUCE-1399
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1399
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: harchive
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Mahadev konar
 Fix For: 0.22.0

 Attachments: m1399_20100204.patch, m1399_20100205.patch, 
 m1399_20100205trunk.patch, m1399_20100205trunk2.patch, MAPREDUCE-1399.patch


 {noformat}
 bash-3.1$ hadoop archive -archiveName foo.har -p . foo .
 Exception in archives
 null
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1399) The archive command shows a null error message

2010-02-09 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831625#action_12831625
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-1399:
---

Oops, I mistakenly posted [the test-patch 
result|https://issues.apache.org/jira/browse/MAPREDUCE-1399?focusedCommentId=12831620&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12831620]
 for MAPREDUCE-1425 to this.  Sorry ...

 The archive command shows a null error message
 --

 Key: MAPREDUCE-1399
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1399
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: harchive
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Mahadev konar
 Fix For: 0.22.0

 Attachments: m1399_20100204.patch, m1399_20100205.patch, 
 m1399_20100205trunk.patch, m1399_20100205trunk2.patch, MAPREDUCE-1399.patch


 {noformat}
 bash-3.1$ hadoop archive -archiveName foo.har -p . foo .
 Exception in archives
 null
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1425) archive throws OutOfMemoryError

2010-02-09 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831626#action_12831626
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-1425:
---

Hudson does not seem to be working.  Ran test-patch locally.
{noformat}
 [exec] -1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] -1 tests included.  The patch doesn't appear to include any new 
or modified tests.
 [exec] Please justify why no new tests are needed 
for this patch.
 [exec] Also please list what manual steps were 
performed to verify this patch.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec]
{noformat}


 archive throws OutOfMemoryError
 ---

 Key: MAPREDUCE-1425
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1425
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: harchive
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Mahadev konar
 Fix For: 0.22.0

 Attachments: har.sh, m1425_20100129TextFileGenerator.patch, 
 MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, 
 MAPREDUCE-1425_y_0.20.patch


 {noformat}
 -bash-3.1$ hadoop  archive -archiveName t4.har -p . t4 .
 Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
 at java.util.regex.Pattern.compile(Pattern.java:1432)
 at java.util.regex.Pattern.<init>(Pattern.java:1133)
 at java.util.regex.Pattern.compile(Pattern.java:847)
 at java.lang.String.replace(String.java:2208)
 at org.apache.hadoop.fs.Path.normalizePath(Path.java:146)
 at org.apache.hadoop.fs.Path.initialize(Path.java:137)
 at org.apache.hadoop.fs.Path.<init>(Path.java:126)
 at org.apache.hadoop.fs.Path.makeQualified(Path.java:296)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.makeQualified(DistributedFileSystem.java:244)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:256)
 at 
 org.apache.hadoop.tools.HadoopArchives.archive(HadoopArchives.java:393)
 at org.apache.hadoop.tools.HadoopArchives.run(HadoopArchives.java:736)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
 at 
 org.apache.hadoop.tools.HadoopArchives.main(HadoopArchives.java:751)
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1266) Allow heartbeat interval smaller than 3 seconds for tiny clusters

2010-02-09 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831631#action_12831631
 ] 

Allen Wittenauer commented on MAPREDUCE-1266:
-

if you are using jvm reuse, then that 1s disappears, right?


 Allow heartbeat interval smaller than 3 seconds for tiny clusters
 -

 Key: MAPREDUCE-1266
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1266
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker, task, tasktracker
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Priority: Minor

 For small clusters, the heartbeat interval has a large effect on job latency. 
 This is especially true on pseudo-distributed or other tiny (<5 node) 
 clusters. It's not a big deal for production, but new users would have a 
 happier first experience if Hadoop seemed snappier.
 I'd like to change the minimum heartbeat interval from 3.0 seconds to perhaps 
 0.5 seconds (but have it governed by an undocumented config parameter in case 
 people don't like this change). The cluster size-based ramp up of interval 
 will maintain the current scalable behavior for large clusters with no 
 negative effect.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1266) Allow heartbeat interval smaller than 3 seconds for tiny clusters

2010-02-09 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831644#action_12831644
 ] 

Todd Lipcon commented on MAPREDUCE-1266:


bq. if you are using jvm reuse, then that 1s disappears, right? 

Not really, since JVM reuse doesn't reuse between maps and reduces.

The time sequence of a small job looks like:

Client:
  Submit job
JT:
  Create tasks (initialize job) on JT
  wait for a TT to heartbeat
TT:
  start JVM
child:
  process map task
TT:
  send accelerated heartbeat once map task is complete (I forget whether this 
is in 0.20 or came later)
  receive reduce task, start reduce JVM (regardless of JVM reuse)
child:
  process reduce task
TT:
  send completion heartbeat

I guess there are also some setup/cleanup tasks going on in there as well. 
Since we're talking about a hypothetical one-map, one-reduce job, we're just 
cutting down the time between initializing the job and getting the first JVM on 
a TT.

In a multi-map or multi-reduce job, the cost shows up in how long it takes for 
all of the tasks to get scheduled - the JT will only schedule one task per 
heartbeat with some schedulers. The fair scheduler after MAPREDUCE-706 can 
assign multiple tasks at the same time, which should help substantially.
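
To make the knob under discussion concrete, here is an illustrative sketch of a cluster-size-based ramp-up with the 3-second floor made configurable, as MAPREDUCE-1266 proposes. The property name, default and constants are assumptions for the sketch, not actual Hadoop configuration keys:

{code}
// Illustrative sketch only: heartbeat interval that scales with cluster size
// but allows a sub-3-second floor on tiny clusters. Names are assumptions.
import org.apache.hadoop.conf.Configuration;

public class HeartbeatIntervalSketch {
  // Assumed target of total heartbeats per second across the cluster.
  private static final int HEARTBEATS_PER_SECOND = 100;

  /** Milliseconds a TaskTracker should wait between heartbeats. */
  public static int nextHeartbeatInterval(Configuration conf, int clusterSize) {
    // Proposed: replace the hard-coded 3000 ms minimum with a config knob.
    int minIntervalMs =
        conf.getInt("mapreduce.jobtracker.heartbeat.interval.min", 500);
    int scaledMs =
        (int) (1000 * Math.ceil((double) clusterSize / HEARTBEATS_PER_SECOND));
    return Math.max(scaledMs, minIntervalMs);
  }
}
{code}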

 Allow heartbeat interval smaller than 3 seconds for tiny clusters
 -

 Key: MAPREDUCE-1266
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1266
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker, task, tasktracker
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Priority: Minor

 For small clusters, the heartbeat interval has a large effect on job latency. 
 This is especially true on pseudo-distributed or other tiny (<5 node) 
 clusters. It's not a big deal for production, but new users would have a 
 happier first experience if Hadoop seemed snappier.
 I'd like to change the minimum heartbeat interval from 3.0 seconds to perhaps 
 0.5 seconds (but have it governed by an undocumented config parameter in case 
 people don't like this change). The cluster size-based ramp up of interval 
 will maintain the current scalable behavior for large clusters with no 
 negative effect.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1399) The archive command shows a null error message

2010-02-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831653#action_12831653
 ] 

Hadoop QA commented on MAPREDUCE-1399:
--

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12435047/m1399_20100205trunk2.patch
  against trunk revision 907967.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 7 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/304/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/304/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/304/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/304/console

This message is automatically generated.

 The archive command shows a null error message
 --

 Key: MAPREDUCE-1399
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1399
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: harchive
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Mahadev konar
 Fix For: 0.22.0

 Attachments: m1399_20100204.patch, m1399_20100205.patch, 
 m1399_20100205trunk.patch, m1399_20100205trunk2.patch, MAPREDUCE-1399.patch


 {noformat}
 bash-3.1$ hadoop archive -archiveName foo.har -p . foo .
 Exception in archives
 null
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce

2010-02-09 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated MAPREDUCE-1433:
-

Attachment: mr-1433.patch

Add some code that sets the service name on the received token.

All tests pass and test-patch is clean.

 Create a Delegation token for MapReduce
 ---

 Key: MAPREDUCE-1433
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: mr-1433.patch, mr-1433.patch, mr-1433.patch


 Occasionally, MapReduce jobs need to launch other MapReduce jobs. With 
 security enabled, the task needs to authenticate to the JobTracker as the 
 user with a token.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1399) The archive command shows a null error message

2010-02-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831665#action_12831665
 ] 

Hadoop QA commented on MAPREDUCE-1399:
--

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12435047/m1399_20100205trunk2.patch
  against trunk revision 907967.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 7 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/438/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/438/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/438/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/438/console

This message is automatically generated.

 The archive command shows a null error message
 --

 Key: MAPREDUCE-1399
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1399
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: harchive
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Mahadev konar
 Fix For: 0.22.0

 Attachments: m1399_20100204.patch, m1399_20100205.patch, 
 m1399_20100205trunk.patch, m1399_20100205trunk2.patch, MAPREDUCE-1399.patch


 {noformat}
 bash-3.1$ hadoop archive -archiveName foo.har -p . foo .
 Exception in archives
 null
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1341) Sqoop should have an option to create hive tables and skip the table import step

2010-02-09 Thread Leonid Furman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831697#action_12831697
 ] 

Leonid Furman commented on MAPREDUCE-1341:
--

It looks like the Hudson build hasn't picked up the latest patch - 
MAPREDUCE-1341.4.patch. Should I flip the ticket status in order to restart the 
build?

Thanks!

 Sqoop should have an option to create hive tables and skip the table import 
 step
 

 Key: MAPREDUCE-1341
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1341
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: contrib/sqoop
Affects Versions: 0.22.0
Reporter: Leonid Furman
Priority: Minor
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1341.2.patch, MAPREDUCE-1341.3.patch, 
 MAPREDUCE-1341.4.patch, MAPREDUCE-1341.patch


 In case the client only needs to create tables in Hive, it would be helpful 
 if Sqoop had an optional parameter:
 --hive-create-only
 which would omit the time-consuming table import step and instead generate the 
 Hive CREATE TABLE statements and run them.
 If this feature seems useful, I can generate the patch. I have modified the 
 Sqoop code and built it on my development machine, and it seems to be working 
 well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1341) Sqoop should have an option to create hive tables and skip the table import step

2010-02-09 Thread Aaron Kimball (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831704#action_12831704
 ] 

Aaron Kimball commented on MAPREDUCE-1341:
--

I don't see the patch listed in 
http://hudson.zones.apache.org/hudson/view/Hadoop/job/Mapreduce-Patch-Admin/lastSuccessfulBuild/artifact/MAPREDUCE_PatchQueue.html
 so yea, go through cancel patch / submit patch again.

 Sqoop should have an option to create hive tables and skip the table import 
 step
 

 Key: MAPREDUCE-1341
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1341
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: contrib/sqoop
Affects Versions: 0.22.0
Reporter: Leonid Furman
Priority: Minor
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1341.2.patch, MAPREDUCE-1341.3.patch, 
 MAPREDUCE-1341.4.patch, MAPREDUCE-1341.patch


 In case the client only needs to create tables in Hive, it would be helpful 
 if Sqoop had an optional parameter:
 --hive-create-only
 which would omit the time-consuming table import step and instead generate the 
 Hive CREATE TABLE statements and run them.
 If this feature seems useful, I can generate the patch. I have modified the 
 Sqoop code and built it on my development machine, and it seems to be working 
 well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce

2010-02-09 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated MAPREDUCE-1433:
-

Attachment: mr-1433.patch

Updated with new code to normalize the hostname.

 Create a Delegation token for MapReduce
 ---

 Key: MAPREDUCE-1433
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: mr-1433.patch, mr-1433.patch, mr-1433.patch, 
 mr-1433.patch


 Occasionally, MapReduce jobs need to launch other MapReduce jobs. With 
 security enabled, the task needs to authenticate to the JobTracker as the 
 user with a token.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1341) Sqoop should have an option to create hive tables and skip the table import step

2010-02-09 Thread Leonid Furman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonid Furman updated MAPREDUCE-1341:
-

Status: Open  (was: Patch Available)

Cycling patch to retrigger hudson build.

 Sqoop should have an option to create hive tables and skip the table import 
 step
 

 Key: MAPREDUCE-1341
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1341
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: contrib/sqoop
Affects Versions: 0.22.0
Reporter: Leonid Furman
Priority: Minor
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1341.2.patch, MAPREDUCE-1341.3.patch, 
 MAPREDUCE-1341.4.patch, MAPREDUCE-1341.patch


 In case the client only needs to create tables in Hive, it would be helpful 
 if Sqoop had an optional parameter:
 --hive-create-only
 which would omit the time-consuming table import step and instead generate the 
 Hive CREATE TABLE statements and run them.
 If this feature seems useful, I can generate the patch. I have modified the 
 Sqoop code and built it on my development machine, and it seems to be working 
 well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1341) Sqoop should have an option to create hive tables and skip the table import step

2010-02-09 Thread Leonid Furman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonid Furman updated MAPREDUCE-1341:
-

Assignee: Leonid Furman
  Status: Patch Available  (was: Open)

 Sqoop should have an option to create hive tables and skip the table import 
 step
 

 Key: MAPREDUCE-1341
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1341
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: contrib/sqoop
Affects Versions: 0.22.0
Reporter: Leonid Furman
Assignee: Leonid Furman
Priority: Minor
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1341.2.patch, MAPREDUCE-1341.3.patch, 
 MAPREDUCE-1341.4.patch, MAPREDUCE-1341.patch


 In case the client only needs to create tables in Hive, it would be helpful 
 if Sqoop had an optional parameter:
 --hive-create-only
 which would omit the time-consuming table import step and instead generate the 
 Hive CREATE TABLE statements and run them.
 If this feature seems useful, I can generate the patch. I have modified the 
 Sqoop code and built it on my development machine, and it seems to be working 
 well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1433) Create a Delegation token for MapReduce

2010-02-09 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831710#action_12831710
 ] 

Devaraj Das commented on MAPREDUCE-1433:


Please pass the right text to setService in getDelegationToken

 Create a Delegation token for MapReduce
 ---

 Key: MAPREDUCE-1433
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: mr-1433.patch, mr-1433.patch, mr-1433.patch, 
 mr-1433.patch


 Occasionally, MapReduce jobs need to launch other MapReduce jobs. With 
 security enabled, the task needs to authenticate to the JobTracker as the 
 user with a token.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce

2010-02-09 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated MAPREDUCE-1433:
-

Attachment: mr-1433.patch

Ok, now the patch has the right fix in it.

 Create a Delegation token for MapReduce
 ---

 Key: MAPREDUCE-1433
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: mr-1433.patch, mr-1433.patch, mr-1433.patch, 
 mr-1433.patch, mr-1433.patch


 Occasionally, MapReduce jobs need to launch other MapReduce jobs. With 
 security enabled, the task needs to authenticate to the JobTracker as the 
 user with a token.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1341) Sqoop should have an option to create hive tables and skip the table import step

2010-02-09 Thread Leonid Furman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831714#action_12831714
 ] 

Leonid Furman commented on MAPREDUCE-1341:
--

Aaron, it doesn't seem to populate the queue - does it usually happen 
immediately or after some time?

 Sqoop should have an option to create hive tables and skip the table import 
 step
 

 Key: MAPREDUCE-1341
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1341
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: contrib/sqoop
Affects Versions: 0.22.0
Reporter: Leonid Furman
Assignee: Leonid Furman
Priority: Minor
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1341.2.patch, MAPREDUCE-1341.3.patch, 
 MAPREDUCE-1341.4.patch, MAPREDUCE-1341.5.patch, MAPREDUCE-1341.patch


 In case the client only needs to create tables in hive, it would be helpful 
 if Sqoop had an optional parameter:
 --hive-create-only
 which would omit the time consuming table import step, generate hive create 
 table statements and run them.
 If this feature seems useful, I can generate the patch. I have modified the 
 Sqoop code and built it on my development machine, and it seems to be working 
 well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1341) Sqoop should have an option to create hive tables and skip the table import step

2010-02-09 Thread Leonid Furman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831717#action_12831717
 ] 

Leonid Furman commented on MAPREDUCE-1341:
--

Never mind, it is there now. Thank you!

 Sqoop should have an option to create hive tables and skip the table import 
 step
 

 Key: MAPREDUCE-1341
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1341
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: contrib/sqoop
Affects Versions: 0.22.0
Reporter: Leonid Furman
Assignee: Leonid Furman
Priority: Minor
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1341.2.patch, MAPREDUCE-1341.3.patch, 
 MAPREDUCE-1341.4.patch, MAPREDUCE-1341.5.patch, MAPREDUCE-1341.patch


 In case the client only needs to create tables in hive, it would be helpful 
 if Sqoop had an optional parameter:
 --hive-create-only
 which would omit the time consuming table import step, generate hive create 
 table statements and run them.
 If this feature seems useful, I can generate the patch. I have modified the 
 Sqoop code and built it on my development machine, and it seems to be working 
 well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1433) Create a Delegation token for MapReduce

2010-02-09 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831722#action_12831722
 ] 

Devaraj Das commented on MAPREDUCE-1433:


+1

 Create a Delegation token for MapReduce
 ---

 Key: MAPREDUCE-1433
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: mr-1433.patch, mr-1433.patch, mr-1433.patch, 
 mr-1433.patch, mr-1433.patch


 Occasionally, MapReduce jobs need to launch other MapReduce jobs. With 
 security enabled, the task needs to authenticate to the JobTracker as the 
 user with a token.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce

2010-02-09 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated MAPREDUCE-1433:
---

Attachment: 1433.bp20.patch

Patch for Y20. Not for commit.

 Create a Delegation token for MapReduce
 ---

 Key: MAPREDUCE-1433
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: 1433.bp20.patch, mr-1433.patch, mr-1433.patch, 
 mr-1433.patch, mr-1433.patch, mr-1433.patch


 Occasionally, MapReduce jobs need to launch other MapReduce jobs. With 
 security enabled, the task needs to authenticate to the JobTracker as the 
 user with a token.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce

2010-02-09 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated MAPREDUCE-1433:
-

   Resolution: Fixed
Fix Version/s: 0.22.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

It passes unit tests and test-patch.

 Create a Delegation token for MapReduce
 ---

 Key: MAPREDUCE-1433
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.22.0

 Attachments: 1433.bp20.patch, mr-1433.patch, mr-1433.patch, 
 mr-1433.patch, mr-1433.patch, mr-1433.patch


 Occasionally, MapReduce jobs need to launch other MapReduce jobs. With 
 security enabled, the task needs to authenticate to the JobTracker as the 
 user with a token.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1425) archive throws OutOfMemoryError

2010-02-09 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831753#action_12831753
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-1425:
---

All tests passed except TestChainErrors, which still failed after the patch had 
been reverted.

 archive throws OutOfMemoryError
 ---

 Key: MAPREDUCE-1425
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1425
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: harchive
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Mahadev konar
 Fix For: 0.22.0

 Attachments: har.sh, m1425_20100129TextFileGenerator.patch, 
 MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, 
 MAPREDUCE-1425_y_0.20.patch


 {noformat}
 -bash-3.1$ hadoop  archive -archiveName t4.har -p . t4 .
 Exception in thread main java.lang.OutOfMemoryError: Java heap space
 at java.util.regex.Pattern.compile(Pattern.java:1432)
 at java.util.regex.Pattern.init(Pattern.java:1133)
 at java.util.regex.Pattern.compile(Pattern.java:847)
 at java.lang.String.replace(String.java:2208)
 at org.apache.hadoop.fs.Path.normalizePath(Path.java:146)
 at org.apache.hadoop.fs.Path.initialize(Path.java:137)
 at org.apache.hadoop.fs.Path.init(Path.java:126)
 at org.apache.hadoop.fs.Path.makeQualified(Path.java:296)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.makeQualified(DistributedFileSystem.java:244)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:256)
 at 
 org.apache.hadoop.tools.HadoopArchives.archive(HadoopArchives.java:393)
 at org.apache.hadoop.tools.HadoopArchives.run(HadoopArchives.java:736)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
 at 
 org.apache.hadoop.tools.HadoopArchives.main(HadoopArchives.java:751)
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1425) archive throws OutOfMemoryError

2010-02-09 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831756#action_12831756
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-1425:
---

The manual test is simple: run archive on 10^5 files and use jmap to read the 
memory usage, as shown previously.

 archive throws OutOfMemoryError
 ---

 Key: MAPREDUCE-1425
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1425
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: harchive
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Mahadev konar
 Fix For: 0.22.0

 Attachments: har.sh, m1425_20100129TextFileGenerator.patch, 
 MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, 
 MAPREDUCE-1425_y_0.20.patch


 {noformat}
 -bash-3.1$ hadoop  archive -archiveName t4.har -p . t4 .
 Exception in thread main java.lang.OutOfMemoryError: Java heap space
 at java.util.regex.Pattern.compile(Pattern.java:1432)
 at java.util.regex.Pattern.init(Pattern.java:1133)
 at java.util.regex.Pattern.compile(Pattern.java:847)
 at java.lang.String.replace(String.java:2208)
 at org.apache.hadoop.fs.Path.normalizePath(Path.java:146)
 at org.apache.hadoop.fs.Path.initialize(Path.java:137)
 at org.apache.hadoop.fs.Path.init(Path.java:126)
 at org.apache.hadoop.fs.Path.makeQualified(Path.java:296)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.makeQualified(DistributedFileSystem.java:244)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:256)
 at 
 org.apache.hadoop.tools.HadoopArchives.archive(HadoopArchives.java:393)
 at org.apache.hadoop.tools.HadoopArchives.run(HadoopArchives.java:736)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
 at 
 org.apache.hadoop.tools.HadoopArchives.main(HadoopArchives.java:751)
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1472) JobTracker.submitJob holds a lock on the JobTracker while copying job-conf from HDFS

2010-02-09 Thread Arun C Murthy (JIRA)
JobTracker.submitJob holds a lock on the JobTracker while copying job-conf from 
HDFS


 Key: MAPREDUCE-1472
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1472
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Reporter: Arun C Murthy
Assignee: Arun C Murthy
Priority: Blocker


This could have a very bad impact on the responsiveness of the cluster.

JobTracker.submitJob also forks a du process and writes to its local disk.
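
As a rough illustration of the pattern such a report points toward (hypothetical class and method names, not the real JobTracker code), the expensive HDFS and local-disk work would move outside the global lock, leaving only in-memory bookkeeping synchronized.

{noformat}
// Hedged sketch of the locking pattern only; names are hypothetical and this
// is not the actual JobTracker implementation.
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class SubmitLockSketch {
  private final Map<String, Object> jobs = new HashMap<String, Object>();

  public void submitJob(String jobId) throws IOException {
    // Expensive work (copying job.xml from HDFS, forking du, writing to the
    // local disk) runs here with no tracker-wide lock held.
    Object localizedJob = localizeJobFiles(jobId);

    // Only cheap, in-memory bookkeeping runs under the global lock.
    synchronized (this) {
      jobs.put(jobId, localizedJob);
    }
  }

  private Object localizeJobFiles(String jobId) throws IOException {
    return new Object();   // placeholder for the HDFS copy and local write
  }
}
{noformat}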

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1399) The archive command shows a null error message

2010-02-09 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-1399:
--

Attachment: m1399_20100205trunk2_y0.20.patch

m1399_20100205trunk2_y0.20.patch: for y0.20

 The archive command shows a null error message
 --

 Key: MAPREDUCE-1399
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1399
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: harchive
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Mahadev konar
 Fix For: 0.22.0

 Attachments: m1399_20100204.patch, m1399_20100205.patch, 
 m1399_20100205trunk.patch, m1399_20100205trunk2.patch, 
 m1399_20100205trunk2_y0.20.patch, MAPREDUCE-1399.patch


 {noformat}
 bash-3.1$ hadoop archive -archiveName foo.har -p . foo .
 Exception in archives
 null
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1425) archive throws OutOfMemoryError

2010-02-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831775#action_12831775
 ] 

Hadoop QA commented on MAPREDUCE-1425:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12435033/MAPREDUCE-1425.patch
  against trunk revision 907967.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/439/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/439/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/439/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/439/console

This message is automatically generated.

 archive throws OutOfMemoryError
 ---

 Key: MAPREDUCE-1425
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1425
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: harchive
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Mahadev konar
 Fix For: 0.22.0

 Attachments: har.sh, m1425_20100129TextFileGenerator.patch, 
 MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, 
 MAPREDUCE-1425_y_0.20.patch


 {noformat}
 -bash-3.1$ hadoop  archive -archiveName t4.har -p . t4 .
 Exception in thread main java.lang.OutOfMemoryError: Java heap space
 at java.util.regex.Pattern.compile(Pattern.java:1432)
 at java.util.regex.Pattern.init(Pattern.java:1133)
 at java.util.regex.Pattern.compile(Pattern.java:847)
 at java.lang.String.replace(String.java:2208)
 at org.apache.hadoop.fs.Path.normalizePath(Path.java:146)
 at org.apache.hadoop.fs.Path.initialize(Path.java:137)
 at org.apache.hadoop.fs.Path.init(Path.java:126)
 at org.apache.hadoop.fs.Path.makeQualified(Path.java:296)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.makeQualified(DistributedFileSystem.java:244)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:256)
 at 
 org.apache.hadoop.tools.HadoopArchives.archive(HadoopArchives.java:393)
 at org.apache.hadoop.tools.HadoopArchives.run(HadoopArchives.java:736)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
 at 
 org.apache.hadoop.tools.HadoopArchives.main(HadoopArchives.java:751)
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs

2010-02-09 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831797#action_12831797
 ] 

Scott Chen commented on MAPREDUCE-1463:
---

@Todd: 
Yes, you're right. The logic in the patch is wrong. The one you posted is the 
correct logic. Sorry about the mistake.

@Amar: 
{quote}
How do you define small jobs? Shouldn't it be based on the total number of 
tasks instead of considering maps and reduces individually?
{quote}
We want to start the reducers faster in both the fewer-mapper and the 
fewer-reducer case: with few reducers, starting them earlier is cheap anyway, 
and with few mappers, the maps finish quickly. But it may not be a bad idea to 
use the total instead (it is simpler, at least).
{quote}
Why do we need a special case for small jobs? If it's for fairness then this 
piece of code rightly belongs to contrib/fairscheduler, no?
If not for fairness then what is the problem with the current framework w.r.t. 
small jobs?
{quote}
Handling the special case for small jobs reduces the overall latency, which 
gives the users a better experience.
{quote}
Can it be fixed by simple (configuration-like) tweaking?
If not, then what's the right fix?
{quote}
For experienced users, setting completedmaps=0 does fix this problem. But it 
would be nice if this could be done automatically for other users who do not 
know how to configure Hadoop.


@Arun: 
Thanks for the comments. I agree. Tweaking 
mapreduce.job.reduce.slowstart.completedmaps on the job client side should be a 
cleaner way to handle this. For experienced users, setting completedmaps to 0 
on the client side will make their small jobs finish faster. But it would be 
nice if some automatic decision could be made here so that normal users don't 
have to learn how to configure an extra parameter.


The point here is that in some cases (a small job, with a small number of 
mappers or reducers) we should not spend time waiting for the reducers to 
start, because the waiting time is significant (or because it is cheap to start 
the reducers earlier). Automatically reducing the latency makes our users happy.
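
For context, here is a minimal client-side sketch of the manual workaround discussed above, using the standard Configuration API; it is an illustration only, not the proposed automatic logic.

{noformat}
// Hedged sketch of the manual workaround: an experienced user sets the
// slow-start threshold to 0 on the job client so reducers launch immediately.
import org.apache.hadoop.conf.Configuration;

public class SlowStartExample {
  public static Configuration smallJobConf() {
    Configuration conf = new Configuration();
    // 0.0f = launch reducers right away; the default discussed above is 0.5.
    conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 0.0f);
    return conf;
  }
}
{noformat}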

 Reducer should start faster for smaller jobs
 

 Key: MAPREDUCE-1463
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/fair-share
Reporter: Scott Chen
Assignee: Scott Chen
 Attachments: MAPREDUCE-1463-v1.patch, MAPREDUCE-1463-v2.patch


 Our users often complain about the slowness of smaller ad-hoc jobs.
 The overhead to wait for the reducers to start in this case is significant.
 It will be good if we can start the reducer sooner in this case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1463) Reducer should start faster for smaller jobs

2010-02-09 Thread Scott Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Chen updated MAPREDUCE-1463:
--

Attachment: MAPREDUCE-1463-v3.patch

 Reducer should start faster for smaller jobs
 

 Key: MAPREDUCE-1463
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/fair-share
Reporter: Scott Chen
Assignee: Scott Chen
 Attachments: MAPREDUCE-1463-v1.patch, MAPREDUCE-1463-v2.patch, 
 MAPREDUCE-1463-v3.patch


 Our users often complain about the slowness of smaller ad-hoc jobs.
 The overhead to wait for the reducers to start in this case is significant.
 It will be good if we can start the reducer sooner in this case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-326) The lowest level map-reduce APIs should be byte oriented

2010-02-09 Thread eric baldeschwieler (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831800#action_12831800
 ] 

eric baldeschwieler commented on MAPREDUCE-326:
---

Sounds like we are on the same page.  Proposals will be greeted with interest.

Acceptance criteria:

1) Backwards compatible with 0.20 (including the legacy APIs in 0.20, please, 
since we're still debugging the new APIs)

2) Performance neutral for the 0.20 APIs; no large hit for the legacy APIs


 The lowest level map-reduce APIs should be byte oriented
 

 Key: MAPREDUCE-326
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-326
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: eric baldeschwieler

 As discussed here:
 https://issues.apache.org/jira/browse/HADOOP-1986#action_12551237
 The templates, serializers and other complexities that allow map-reduce to 
 use arbitrary types complicate the design and lead to lots of object creates 
 and other overhead that a byte oriented design would not suffer.  I believe 
 the lowest level implementation of hadoop map-reduce should have byte string 
 oriented APIs (for keys and values).  This API would be more performant, 
 simpler and more easily cross language.
 The existing API could be maintained as a thin layer on top of the leaner API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1318) Document exit codes and their meanings used by linux task controller

2010-02-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831801#action_12831801
 ] 

Hadoop QA commented on MAPREDUCE-1318:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12435095/MAPREDUCE-1318.patch
  against trunk revision 907967.

+1 @author.  The patch does not contain any @author tags.

+0 tests included.  The patch appears to be a documentation patch that 
doesn't require tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/306/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/306/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/306/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/306/console

This message is automatically generated.

 Document exit codes and their meanings used by linux task controller
 

 Key: MAPREDUCE-1318
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1318
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: documentation
Reporter: Sreekanth Ramakrishnan
Assignee: Anatoli Fomenko
Priority: Blocker
 Fix For: 0.21.0

 Attachments: HADOOP-5912.1.patch, MAPREDUCE-1318.1.patch, 
 MAPREDUCE-1318.2.patch, MAPREDUCE-1318.patch


 Currently, linux task controller binary uses a set of exit code, which is not 
 documented. These should be documented.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1399) The archive command shows a null error message

2010-02-09 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-1399:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Nicholas.

 The archive command shows a null error message
 --

 Key: MAPREDUCE-1399
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1399
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: harchive
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Mahadev konar
 Fix For: 0.22.0

 Attachments: m1399_20100204.patch, m1399_20100205.patch, 
 m1399_20100205trunk.patch, m1399_20100205trunk2.patch, 
 m1399_20100205trunk2_y0.20.patch, MAPREDUCE-1399.patch


 {noformat}
 bash-3.1$ hadoop archive -archiveName foo.har -p . foo .
 Exception in archives
 null
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1473) Sqoop should allow users to control export parallelism

2010-02-09 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-1473:
-

Attachment: MAPREDUCE-1473.patch

Attaching a patch which provides this functionality. This uses 
CombineFileInputFormat to batch up Sqoop's input files into a user-defined 
number of splits.

As in importing, the degree of parallelism is controlled with the {{\-m}} / 
{{--num-mappers}} parameters.
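
As a rough illustration of the idea (not the attached patch), one way to map a requested mapper count onto CombineFileInputFormat splits is to derive a maximum split size from the total input size; the "mapred.max.split.size" key below is an assumption about the split-size knob, and the method name is made up.

{noformat}
// Hedged sketch, not the patch: cap the number of combined splits by deriving
// a max split size from the total input bytes and the requested mapper count.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ExportSplitSizing {
  public static void limitSplits(Configuration conf, Path exportDir, int numMappers)
      throws IOException {
    FileSystem fs = exportDir.getFileSystem(conf);
    long totalBytes = 0;
    for (FileStatus stat : fs.listStatus(exportDir)) {
      totalBytes += stat.getLen();
    }
    long maxSplitSize = Math.max(1L, (totalBytes + numMappers - 1) / numMappers);
    conf.setLong("mapred.max.split.size", maxSplitSize);   // key assumed
  }
}
{noformat}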

 Sqoop should allow users to control export parallelism
 --

 Key: MAPREDUCE-1473
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1473
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/sqoop
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Attachments: MAPREDUCE-1473.patch


 Sqoop uses MapReduce jobs to export files back to a table in the database. 
 The degree of parallelism is controlled by the number of splits; i.e., the 
 number of input files used. The bottleneck in the system, though, is likely 
 to be the database itself.
 Users should have the ability to tune the number of parallel exporters being 
 used to a degree appropriate to their database deployment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1473) Sqoop should allow users to control export parallelism

2010-02-09 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-1473:
-

Status: Patch Available  (was: Open)

 Sqoop should allow users to control export parallelism
 --

 Key: MAPREDUCE-1473
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1473
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/sqoop
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Attachments: MAPREDUCE-1473.patch


 Sqoop uses MapReduce jobs to export files back to a table in the database. 
 The degree of parallelism is controlled by the number of splits; i.e., the 
 number of input files used. The bottleneck in the system, though, is likely 
 to be the database itself.
 Users should have the ability to tune the number of parallel exporters being 
 used to a degree appropriate to their database deployment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1474) forrest docs for archives is out of date.

2010-02-09 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-1474:
-

Attachment: MAPREDUCE-1474.patch

doc changes for hadoop archives.

 forrest docs for archives is out of date.
 

 Key: MAPREDUCE-1474
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1474
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: documentation
Reporter: Mahadev konar
Assignee: Mahadev konar
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1474.patch


 The docs for archives are out of date. The new docs that were checked into 
 hadoop common were lost because of the project split.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1474) forrest docs for archives is out of date.

2010-02-09 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-1474:
-

Status: Patch Available  (was: Open)

 forrest docs for archives is out of date.
 -

 Key: MAPREDUCE-1474
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1474
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: documentation
Reporter: Mahadev konar
Assignee: Mahadev konar
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1474.patch


 The docs for archives are out of date. The new docs that were checked into 
 hadoop common were lost because of the project split.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-434) local map-reduce job limited to single reducer

2010-02-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831855#action_12831855
 ] 

Hadoop QA commented on MAPREDUCE-434:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12435228/MAPREDUCE-434.4.patch
  against trunk revision 908283.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/307/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/307/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/307/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/307/console

This message is automatically generated.

 local map-reduce job limited to single reducer
 --

 Key: MAPREDUCE-434
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-434
 Project: Hadoop Map/Reduce
  Issue Type: Bug
 Environment: local job tracker
Reporter: Yoram Arnon
Assignee: Aaron Kimball
Priority: Minor
 Attachments: MAPREDUCE-434.2.patch, MAPREDUCE-434.3.patch, 
 MAPREDUCE-434.4.patch, MAPREDUCE-434.patch


 when mapred.job.tracker is set to 'local', my setNumReduceTasks call is 
 ignored, and the number of reduce tasks is set at 1.
 This prevents me from locally debugging my partition function, which tries to 
 partition based on the number of reduce tasks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce

2010-02-09 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated MAPREDUCE-1433:
---

Attachment: 1433.bp20.patch

More up-to-date version of the backported patch. Not for commit.

 Create a Delegation token for MapReduce
 ---

 Key: MAPREDUCE-1433
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.22.0

 Attachments: 1433.bp20.patch, 1433.bp20.patch, mr-1433.patch, 
 mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch


 Occasionally, MapReduce jobs need to launch other MapReduce jobs. With 
 security enabled, the task needs to authenticate to the JobTracker as the 
 user with a token.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1433) Create a Delegation token for MapReduce

2010-02-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831868#action_12831868
 ] 

Hudson commented on MAPREDUCE-1433:
---

Integrated in Hadoop-Mapreduce-trunk-Commit #233 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/233/])


 Create a Delegation token for MapReduce
 ---

 Key: MAPREDUCE-1433
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.22.0

 Attachments: 1433.bp20.patch, 1433.bp20.patch, mr-1433.patch, 
 mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch


 Occasionally, MapReduce jobs need to launch other MapReduce jobs. With 
 security enabled, the task needs to authenticate to the JobTracker as the 
 user with a token.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1425) archive throws OutOfMemoryError

2010-02-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831867#action_12831867
 ] 

Hudson commented on MAPREDUCE-1425:
---

Integrated in Hadoop-Mapreduce-trunk-Commit #233 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/233/])


 archive throws OutOfMemoryError
 ---

 Key: MAPREDUCE-1425
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1425
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: harchive
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Mahadev konar
 Fix For: 0.22.0

 Attachments: har.sh, m1425_20100129TextFileGenerator.patch, 
 MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, 
 MAPREDUCE-1425_y_0.20.patch


 {noformat}
 -bash-3.1$ hadoop  archive -archiveName t4.har -p . t4 .
 Exception in thread main java.lang.OutOfMemoryError: Java heap space
 at java.util.regex.Pattern.compile(Pattern.java:1432)
 at java.util.regex.Pattern.init(Pattern.java:1133)
 at java.util.regex.Pattern.compile(Pattern.java:847)
 at java.lang.String.replace(String.java:2208)
 at org.apache.hadoop.fs.Path.normalizePath(Path.java:146)
 at org.apache.hadoop.fs.Path.initialize(Path.java:137)
 at org.apache.hadoop.fs.Path.init(Path.java:126)
 at org.apache.hadoop.fs.Path.makeQualified(Path.java:296)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.makeQualified(DistributedFileSystem.java:244)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:256)
 at 
 org.apache.hadoop.tools.HadoopArchives.archive(HadoopArchives.java:393)
 at org.apache.hadoop.tools.HadoopArchives.run(HadoopArchives.java:736)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
 at 
 org.apache.hadoop.tools.HadoopArchives.main(HadoopArchives.java:751)
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (MAPREDUCE-1455) Authorization for servlets

2010-02-09 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi reassigned MAPREDUCE-1455:
---

Assignee: Ravi Gummadi

 Authorization for servlets
 --

 Key: MAPREDUCE-1455
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1455
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Devaraj Das
Assignee: Ravi Gummadi
 Fix For: 0.22.0


 This jira is about building the authorization for servlets (on top of 
 MAPREDUCE-1307). That is, the JobTracker/TaskTracker runs authorization 
 checks on web requests based on the configured job permissions. For e.g., if 
 the job permission is 600, then no one except the authenticated user can look 
 at the job details via the browser. The authenticated user in the servlet can 
 be obtained using the HttpServletRequest method.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1455) Authorization for servlets

2010-02-09 Thread Ravi Gummadi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831880#action_12831880
 ] 

Ravi Gummadi commented on MAPREDUCE-1455:
-

We will get the authenticated user using HttpServletRequest.getRemoteUser().
I am proposing to run the methods that access the job as that user (using 
UserGroupInformation.doAs()) from the JSPs and servlets, so that the JobTracker 
methods can just do authorization (by checking 
UserGroupInformation.getCurrentUser()).
This avoids many changes in MAPREDUCE-1307 and also avoids adding new methods 
that take a UGI as a parameter in the JobTracker.

Thoughts?
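
A minimal sketch of the proposed pattern, assuming only the standard servlet API and UserGroupInformation; the job-lookup call inside doAs() is a placeholder, not a real JobTracker method.

{noformat}
// Hedged sketch of the proposal, not the actual patch: the servlet wraps its
// job access in doAs() for the remote user, so the JobTracker side only needs
// to check UserGroupInformation.getCurrentUser() for authorization.
import java.security.PrivilegedExceptionAction;
import javax.servlet.http.HttpServletRequest;
import org.apache.hadoop.security.UserGroupInformation;

public class ServletAuthSketch {
  public static String fetchJobDetails(HttpServletRequest request, final String jobId)
      throws Exception {
    UserGroupInformation ugi =
        UserGroupInformation.createRemoteUser(request.getRemoteUser());
    return ugi.doAs(new PrivilegedExceptionAction<String>() {
      public String run() throws Exception {
        return lookupJobDetails(jobId);   // placeholder for the real call
      }
    });
  }

  private static String lookupJobDetails(String jobId) {
    return "details for " + jobId;        // stand-in so the sketch compiles
  }
}
{noformat}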

 Authorization for servlets
 --

 Key: MAPREDUCE-1455
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1455
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Devaraj Das
Assignee: Ravi Gummadi
 Fix For: 0.22.0


 This jira is about building the authorization for servlets (on top of 
 MAPREDUCE-1307). That is, the JobTracker/TaskTracker runs authorization 
 checks on web requests based on the configured job permissions. For e.g., if 
 the job permission is 600, then no one except the authenticated user can look 
 at the job details via the browser. The authenticated user in the servlet can 
 be obtained using the HttpServletRequest method.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1333) Parallel running tasks on one single node may slow down the performance

2010-02-09 Thread Xing Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831884#action_12831884
 ] 

Xing Shi commented on MAPREDUCE-1333:
-

The purpose of a distributed system is high utilization.

If you want to analyze the performance of running tasks, you can just configure 
one node with one map slot and no reduce slots, or vice versa.

 Parallel running tasks on one single node may slow down the performance
 ---

 Key: MAPREDUCE-1333
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1333
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker, task, tasktracker
Affects Versions: 0.20.1
Reporter: Zhaoning Zhang

 When I analyzed the performance of running tasks, I found that tasks running 
 in parallel on a single node do not perform better than serialized ones.
 We can set mapred.tasktracker.{map|reduce}.tasks.maximum = 1 individually, 
 but there will still be parallel map AND reduce tasks.
 And I wonder whether this is true in real commercial clusters?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1305) Massive performance problem with DistCp and -delete

2010-02-09 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-1305:
-

Attachment: M1305-1.patch

Modified Peter's patch to remove FsShell invocations.

That part isn't actually horrible, performance-wise; it reuses the instance, so 
while there's certainly avoidable overhead in parsing and whatnot, it's not 
forking a process or anything too notable. It also supports the Trash, which 
may be useful/appreciated.

Is supporting Trash useful for DistCp users running with \-delete?
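
For reference, a minimal sketch (hypothetical method name, not the patch itself) of deleting through the FileSystem API with an optional Trash detour instead of forking the shell.

{noformat}
// Hedged sketch, not the patch: delete a destination path via the FileSystem
// API rather than forking "hadoop fs -rmr", trying the Trash first so that
// accidental deletes stay recoverable.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

public class DeleteSketch {
  public static void deleteWithTrash(FileSystem dstfs, Path path, Configuration conf)
      throws IOException {
    Trash trash = new Trash(dstfs, conf);   // assumed Trash(FileSystem, Configuration) ctor
    if (!trash.moveToTrash(path)) {         // returns false e.g. when trash is disabled
      dstfs.delete(path, true);             // recursive delete, no process fork
    }
  }
}
{noformat}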

 Massive performance problem with DistCp and -delete
 ---

 Key: MAPREDUCE-1305
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1305
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distcp
Affects Versions: 0.20.1
Reporter: Peter Romianowski
Assignee: Peter Romianowski
 Attachments: M1305-1.patch, MAPREDUCE-1305.patch


 *First problem*
 In org.apache.hadoop.tools.DistCp#deleteNonexisting we serialize FileStatus 
 objects when the path is all we need.
 The performance problem comes from 
 org.apache.hadoop.fs.RawLocalFileSystem.RawLocalFileStatus#write which tries 
 to retrieve file permissions by issuing a ls -ld path which is painfully 
 slow.
 Changed that to just serialize Path and not FileStatus.
 *Second problem*
 To delete the files we invoke the hadoop command line tool with option 
 -rmr path. Again, for each file.
 Changed that to dstfs.delete(path, true)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1305) Massive performance problem with DistCp and -delete

2010-02-09 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831893#action_12831893
 ] 

Koji Noguchi commented on MAPREDUCE-1305:
-

bq. Is supporting Trash useful for DistCp users running with -delete?

To me, yes.
I've seen many of our users deleting their files accidentally.  
Trash has saved us a great deal of time.

I'd like to request that the Trash part stay if there isn't much of a 
performance problem.

 Massive performance problem with DistCp and -delete
 ---

 Key: MAPREDUCE-1305
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1305
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distcp
Affects Versions: 0.20.1
Reporter: Peter Romianowski
Assignee: Peter Romianowski
 Attachments: M1305-1.patch, MAPREDUCE-1305.patch


 *First problem*
 In org.apache.hadoop.tools.DistCp#deleteNonexisting we serialize FileStatus 
 objects when the path is all we need.
 The performance problem comes from 
 org.apache.hadoop.fs.RawLocalFileSystem.RawLocalFileStatus#write which tries 
 to retrieve file permissions by issuing a ls -ld path which is painfully 
 slow.
 Changed that to just serialize Path and not FileStatus.
 *Second problem*
 To delete the files we invoke the hadoop command line tool with option 
 -rmr path. Again, for each file.
 Changed that to dstfs.delete(path, true)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs

2010-02-09 Thread Amar Kamat (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831894#action_12831894
 ] 

Amar Kamat commented on MAPREDUCE-1463:
---

What should the behavior be when the total number of maps and reduces is small 
(i.e., a small job for now) but the job takes a huge amount of time to finish? 
For example, the maps take a day to run while the reduces are also compute 
intensive. In such a case, would we still consider the job a small job? I think 
what we want to capture is the job's behavior (a fast-*finishing* job versus 
others). Using task counts might not be sufficient.

Scott, wouldn't this problem be solved if you set 
'mapreduce.job.reduce.slowstart.completedmaps' to a default value of 0 (instead 
of 0.5) for all your users?

 Reducer should start faster for smaller jobs
 

 Key: MAPREDUCE-1463
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/fair-share
Reporter: Scott Chen
Assignee: Scott Chen
 Attachments: MAPREDUCE-1463-v1.patch, MAPREDUCE-1463-v2.patch, 
 MAPREDUCE-1463-v3.patch


 Our users often complain about the slowness of smaller ad-hoc jobs.
 The overhead to wait for the reducers to start in this case is significant.
 It will be good if we can start the reducer sooner in this case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.