[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1433: - Attachment: mr-1433.patch A preliminary patch Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1470) Move Delegation token into Common so that we can use it for MapReduce also
[ https://issues.apache.org/jira/browse/MAPREDUCE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831368#action_12831368 ] Hudson commented on MAPREDUCE-1470: --- Integrated in Hadoop-Mapreduce-trunk-Commit #231 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/231/]) . Move delegation tokens from HDFS to Common so that MapReduce can use them too. (omalley) Move Delegation token into Common so that we can use it for MapReduce also -- Key: MAPREDUCE-1470 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1470 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.22.0 Attachments: mr-1470.patch We need to update one reference for map/reduce when we move the hdfs delegation tokens. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831371#action_12831371 ] Devaraj Das commented on MAPREDUCE-1433: And, please define the config variables in mapred-default.xml Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1433: - Attachment: m-1440.patch Updated with a few more fixes. Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1433: - Attachment: (was: m-1440.patch) Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1433: - Attachment: mr-1433.patch This time attaching the right file. *smile* Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1433: - Attachment: mr-1433.patch Bump the version number of ClientProtocol Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch, mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1433: - Attachment: mr-1433.patch Ok, this has an improved test and fixes a copy and paste bug. Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (MAPREDUCE-1307) Introduce the concept of Job Permissions
[ https://issues.apache.org/jira/browse/MAPREDUCE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod K V reassigned MAPREDUCE-1307: Assignee: Vinod K V Introduce the concept of Job Permissions Key: MAPREDUCE-1307 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1307 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Devaraj Das Assignee: Vinod K V Fix For: 0.22.0 Attachments: 1307-early-1.patch It would be good to define the notion of job permissions analogous to file permissions. Then the JobTracker can restrict who can read (e.g. look at the job page) or modify (e.g. kill) jobs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1307) Introduce the concept of Job Permissions
[ https://issues.apache.org/jira/browse/MAPREDUCE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831402#action_12831402 ] Vinod K V commented on MAPREDUCE-1307: -- OK, I am going ahead with ACLs for job permissions. Here's the proposal: Users can interact with their jobs via mapred commands, JT RPCs, the JT web UI and the TT web UI. This issue only handles the authorization of RPCs and hence the command-line clients. Authorization for the web UI will be addressed by MAPREDUCE-1455.
h4. Per-job ACLs can be set per job in the JobConf during submission.
- As of now, we will only have two per-job ACLs:
-- mapreduce.job.acl-modify-job
-- mapreduce.job.acl-view-job
- The job owner has the authorization to do _anything_ with the job, irrespective of the configured ACLs.
- The superuser (the user who starts the mapred cluster) and members of the supergroup (configured on the JT via mapred.permissions.supergroup) have the authorization to do _anything_ with the job, irrespective of the configured ACLs.
h4. mapreduce.job.acl-modify-job
- This guards *all* modifications to a job, covering all of the following operations:
-- killing a job
-- killing a task of a job, failing a task of a job
-- setting the priority of a job
- Each of these operations is also guarded by the per-queue ACL acl-administer-jobs, so a caller (other than the job owner and the superuser/supergroup) must satisfy both the queue-level ACL and the job-level ACL.
h4. mapreduce.job.acl-view-job
- This guards *some* of the job views.
- For now, we *only* protect APIs that can return possibly sensitive information about the job owner:
-- job-level counters
-- task-level counters
-- task logs displayed by the TT UI, and
-- job.xml shown by the JT UI (the last two will be handled by MAPREDUCE-1455).
- The above means every other piece of information about jobs is still accessible to any other user, e.g. JobStatus, JobProfile, the list of jobs in the queue, etc. Introduce the concept of Job Permissions Key: MAPREDUCE-1307 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1307 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Devaraj Das Fix For: 0.22.0 Attachments: 1307-early-1.patch It would be good to define the notion of job permissions analogous to file permissions. Then the JobTracker can restrict who can read (e.g. look at the job page) or modify (e.g. kill) jobs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
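Devaraj asked on MAPREDUCE-1433 for new config variables to be defined in mapred-default.xml, and the same would presumably apply to the two ACLs proposed above. A sketch of what those entries might look like (property names are from the proposal; values, value format, and descriptions here are illustrative, not committed behavior):

```xml
<!-- Illustrative mapred-default.xml entries for the proposed per-job ACLs. -->
<property>
  <name>mapreduce.job.acl-view-job</name>
  <value></value>
  <description>Users and groups (besides the job owner and the
  superuser/supergroup) allowed to view sensitive job details such as
  job-level and task-level counters.</description>
</property>

<property>
  <name>mapreduce.job.acl-modify-job</name>
  <value></value>
  <description>Users and groups allowed to modify the job, e.g. kill it,
  kill or fail its tasks, or change its priority.</description>
</property>
```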
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1433: - Status: Patch Available (was: Open) Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1433: - Attachment: (was: mr-1433.patch) Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1433: - Attachment: mr-1433.patch Adds license to the test case. Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1433: - Attachment: (was: mr-1433.patch) Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1433: - Attachment: (was: mr-1433.patch) Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-927) Cleanup of task-logs should happen in TaskTracker instead of the Child
[ https://issues.apache.org/jira/browse/MAPREDUCE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831417#action_12831417 ] Amareshwari Sriramadasu commented on MAPREDUCE-927: --- With the current proposal, we found two things that need an answer.
# Memory footprint of the TaskTracker: each map entry (JobID, Long) would take about 40 bytes. If userLogRetainsHours is configured to 7 days and a TaskTracker runs 1 lakh (100,000) jobs' tasks per day, the map would take up 28MB of memory. I guess this memory footprint is fine compared to persisting the same information to disk and reading it back and forth from disk until the directory is removed.
# If the TaskTracker is reinited/restarted and a job completed while the TaskTracker was down, the TaskTracker would not get a KillJobAction for that job. In that case we can keep the userlogs for the default userLogRetainsHours after the reinit/restart.
Thoughts? Cleanup of task-logs should happen in TaskTracker instead of the Child -- Key: MAPREDUCE-927 URL: https://issues.apache.org/jira/browse/MAPREDUCE-927 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security, tasktracker Affects Versions: 0.21.0 Reporter: Vinod K V Assignee: Amareshwari Sriramadasu Priority: Blocker Fix For: 0.21.0 Task logs' cleanup is being done in the Child now. This is undesirable for at least two reasons: 1) failures while cleaning up will affect the user's tasks, and 2) the task's wall time will be affected by operations that the TT actually should own. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
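The footprint arithmetic in the comment above is easy to sanity-check. A minimal sketch, assuming the ~40 bytes per (JobID, Long) entry estimated there; `UserLogMapFootprint` is a hypothetical name for illustration, not TaskTracker code:

```java
// Rough size of the TaskTracker's in-memory (JobID -> timestamp) map,
// assuming ~40 bytes per entry as estimated in the comment above.
class UserLogMapFootprint {
    static long estimateBytes(long jobsPerDay, int retainDays, long bytesPerEntry) {
        return jobsPerDay * retainDays * bytesPerEntry;
    }

    public static void main(String[] args) {
        // 1 lakh (100,000) jobs/day, 7-day retention, ~40 bytes/entry
        long bytes = estimateBytes(100_000L, 7, 40L);
        System.out.println(bytes / 1_000_000 + " MB"); // prints "28 MB"
    }
}
```

This confirms the 28MB figure quoted in the comment (100,000 × 7 × 40 bytes = 28,000,000 bytes).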
[jira] Commented: (MAPREDUCE-326) The lowest level map-reduce APIs should be byte oriented
[ https://issues.apache.org/jira/browse/MAPREDUCE-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831537#action_12831537 ] Milind Bhandarkar commented on MAPREDUCE-326: - Back to a low-level binary API: the proposal here isn't to deprecate any higher-level APIs, but rather to add a new lower-level API that we can implement both the current APIs and new APIs atop. This should in fact help us preserve high-level API compatibility longer, since the mapreduce kernel will be independent of the high-level API. +1!! I have always thought of the Hadoop MR APIs as an assembly language that, gradually, no one will use directly. The low-level APIs will be great for Pig, Hive, HBase and other high-level languages to translate to, without compromising efficiency. The lowest level map-reduce APIs should be byte oriented Key: MAPREDUCE-326 URL: https://issues.apache.org/jira/browse/MAPREDUCE-326 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: eric baldeschwieler As discussed here: https://issues.apache.org/jira/browse/HADOOP-1986#action_12551237 The templates, serializers and other complexities that allow map-reduce to use arbitrary types complicate the design and lead to lots of object creation and other overhead that a byte-oriented design would not suffer. I believe the lowest-level implementation of hadoop map-reduce should have byte-string oriented APIs (for keys and values). This API would be more performant, simpler and more easily cross-language. The existing API could be maintained as a thin layer on top of the leaner API. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1471) FileOutputCommitter does not safely clean up its temporary files
[ https://issues.apache.org/jira/browse/MAPREDUCE-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831546#action_12831546 ] Arun C Murthy commented on MAPREDUCE-1471: -- Jim, all file-based output-formats check to ensure that their output-directory is *not* present when they start, i.e. 'working_path' is owned by one and only one job; hence this behaviour is correct. FileOutputCommitter does not safely clean up its temporary files - Key: MAPREDUCE-1471 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1471 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.1 Reporter: Jim Finnessy Original Estimate: 4h Remaining Estimate: 4h When the FileOutputCommitter cleans up during its cleanupJob method, it potentially deletes the temporary files of other concurrent jobs. Since all the temporary files for all concurrent jobs are written to working_path/_temporary/, any concurrent task that has the same working_path will remove the files of all currently executing jobs when it removes working_path/_temporary during job cleanup. If the file name output is guaranteed by the client application to be unique, the temporary files/directories should also be guaranteed to be unique to avoid this problem. Suggest modifying cleanupJob to only remove files that it created itself. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
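Arun's point is that the `_temporary` area lives directly under the job's output directory, so the supported pattern is one output directory per job; jobs that share a `working_path` violate that invariant. A tiny self-contained sketch of the convention (the paths and the `PerJobOutputPath` helper are illustrative, not Hadoop API):

```java
// Illustrates the one-output-directory-per-job convention: each job's
// _temporary area is private because it sits under that job's own output dir.
class PerJobOutputPath {
    static String outputDir(String base, String jobId) {
        return base + "/" + jobId;
    }

    static String tempDir(String base, String jobId) {
        // FileOutputCommitter stages task output under <outputDir>/_temporary,
        // so distinct output dirs imply distinct, non-colliding staging areas.
        return outputDir(base, jobId) + "/_temporary";
    }

    public static void main(String[] args) {
        System.out.println(tempDir("/user/jim/out", "job_201002_0001"));
        // prints "/user/jim/out/job_201002_0001/_temporary"
    }
}
```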
[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831577#action_12831577 ] Arun C Murthy commented on MAPREDUCE-1463: -- -1. These knobs seem backwards: as both Todd and Amar have pointed out, we could add heuristics to tweak mapreduce.job.reduce.slowstart.completedmaps automatically without adding more config knobs. Reducer should start faster for smaller jobs Key: MAPREDUCE-1463 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/fair-share Reporter: Scott Chen Assignee: Scott Chen Attachments: MAPREDUCE-1463-v1.patch, MAPREDUCE-1463-v2.patch Our users often complain about the slowness of smaller ad-hoc jobs. The overhead of waiting for the reducers to start is significant in this case. It would be good if we could start the reducers sooner for such jobs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
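For context, the knob Arun refers to is already settable per job; the heuristic he proposes would pick the value automatically instead of asking users to tune it. An illustrative hand-tuned entry (the 0.05 value here is just an example, not a recommendation from this thread):

```xml
<!-- Fraction of maps that must complete before reducers are scheduled.
     Lowering it lets small ad-hoc jobs start their shuffle sooner. -->
<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <value>0.05</value>
</property>
```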
[jira] Commented: (MAPREDUCE-1307) Introduce the concept of Job Permissions
[ https://issues.apache.org/jira/browse/MAPREDUCE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831587#action_12831587 ] dhruba borthakur commented on MAPREDUCE-1307: - The advantage of the file-system model is that it is really simple and would handle almost all cases that we might come across. Can somebody please explain why we are abandoning the file-system permission model and going towards ACLs? Is there a particular use-case that the fs permission model does not address? Introduce the concept of Job Permissions Key: MAPREDUCE-1307 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1307 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Devaraj Das Assignee: Vinod K V Fix For: 0.22.0 Attachments: 1307-early-1.patch It would be good to define the notion of job permissions analogous to file permissions. Then the JobTracker can restrict who can read (e.g. look at the job page) or modify (e.g. kill) jobs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1434) Dynamic add input for one job
[ https://issues.apache.org/jira/browse/MAPREDUCE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831593#action_12831593 ] Owen O'Malley commented on MAPREDUCE-1434: -- +1 It helps a more interesting use case where you have a pipeline of mapreduce jobs and don't want the second set of maps to wait until the last reduce finishes. It would be great if job control could use this as an optimization. You need to have a method by which the application declares that all of the input has been added. To avoid having reduces hold slots that they can't use, I'd suggest that no reduces be launched until the input is complete. A timeout is also required so that if a user disappears, the job is killed after N minutes with no new input and the input still not declared complete. Dynamic add input for one job - Key: MAPREDUCE-1434 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1434 Project: Hadoop Map/Reduce Issue Type: New Feature Environment: 0.19.0 Reporter: Xing Shi Normally we must first upload the data to HDFS before we can analyze it using Hadoop MapReduce. Sometimes the upload takes a long time, so if we could add input during a job, that time could be saved. WHAT? Client: a) hadoop job -add-input jobId inputFormat ... Add the input to jobId. b) hadoop job -add-input done Tell the JobTracker that the input preparation is finished. c) hadoop job -add-input status jobId Show how many inputs the jobId has. HOWTO? Mainly, I think we should do three things: 1. JobClient: JobClient should support adding input to a job; it generates the splits and submits them to the JobTracker. 2. JobTracker: JobTracker should support addInput, adding the new tasks to the original map tasks. Because the uploaded data will be processed quickly, the scheduler should also be updated to support keeping a map task pending until the client declares the job's input done. 3. Reducer: the reducer should also update the number of maps (mapNums) so the shuffle works correctly. 
This is the rough idea, and I will update it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1434) Dynamic add input for one job
[ https://issues.apache.org/jira/browse/MAPREDUCE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831595#action_12831595 ] Arun C Murthy commented on MAPREDUCE-1434: -- +1 I'm sure Pig/Hive would be substantial beneficiaries... their job pipelines would benefit a lot. Dynamic add input for one job - Key: MAPREDUCE-1434 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1434 Project: Hadoop Map/Reduce Issue Type: New Feature Environment: 0.19.0 Reporter: Xing Shi Normally we must first upload the data to HDFS before we can analyze it using Hadoop MapReduce. Sometimes the upload takes a long time, so if we could add input during a job, that time could be saved. WHAT? Client: a) hadoop job -add-input jobId inputFormat ... Add the input to jobId. b) hadoop job -add-input done Tell the JobTracker that the input preparation is finished. c) hadoop job -add-input status jobId Show how many inputs the jobId has. HOWTO? Mainly, I think we should do three things: 1. JobClient: JobClient should support adding input to a job; it generates the splits and submits them to the JobTracker. 2. JobTracker: JobTracker should support addInput, adding the new tasks to the original map tasks. Because the uploaded data will be processed quickly, the scheduler should also be updated to support keeping a map task pending until the client declares the job's input done. 3. Reducer: the reducer should also update the number of maps (mapNums) so the shuffle works correctly. This is the rough idea, and I will update it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (MAPREDUCE-1403) Save file-sizes of each of the artifacts in DistributedCache in the JobConf
[ https://issues.apache.org/jira/browse/MAPREDUCE-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy reassigned MAPREDUCE-1403: Assignee: Arun C Murthy (was: Hong Tang) Save file-sizes of each of the artifacts in DistributedCache in the JobConf --- Key: MAPREDUCE-1403 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1403 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Reporter: Arun C Murthy Assignee: Arun C Murthy Fix For: 0.22.0 Attachments: MAPREDUCE-1403_yhadoop20.patch It would be a useful metric to collect... potentially GridMix could use it to emulate jobs which use the DistributedCache. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1403) Save file-sizes of each of the artifacts in DistributedCache in the JobConf
[ https://issues.apache.org/jira/browse/MAPREDUCE-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated MAPREDUCE-1403: - Attachment: MAPREDUCE-1403_yhadoop20.patch Patch for y20 distribution. Not to be committed. Save file-sizes of each of the artifacts in DistributedCache in the JobConf --- Key: MAPREDUCE-1403 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1403 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Reporter: Arun C Murthy Assignee: Arun C Murthy Fix For: 0.22.0 Attachments: MAPREDUCE-1403_yhadoop20.patch It would be a useful metric to collect... potentially GridMix could use it to emulate jobs which use the DistributedCache. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1434) Dynamic add input for one job
[ https://issues.apache.org/jira/browse/MAPREDUCE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831608#action_12831608 ] Owen O'Malley commented on MAPREDUCE-1434: -- One approach might be to have a subclass of InputFormat, such as:
{code}
public abstract class IncrementalInputFormat<K, V> extends InputFormat<K, V> {
  public abstract InputSplit[] getNewInputSplits(JobContext context) throws IOException;
}
{code}
and such input formats return any new splits that they have found since the last time the method was called. Dynamic add input for one job - Key: MAPREDUCE-1434 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1434 Project: Hadoop Map/Reduce Issue Type: New Feature Environment: 0.19.0 Reporter: Xing Shi Normally we must first upload the data to HDFS before we can analyze it using Hadoop MapReduce. Sometimes the upload takes a long time, so if we could add input during a job, that time could be saved. WHAT? Client: a) hadoop job -add-input jobId inputFormat ... Add the input to jobId. b) hadoop job -add-input done Tell the JobTracker that the input preparation is finished. c) hadoop job -add-input status jobId Show how many inputs the jobId has. HOWTO? Mainly, I think we should do three things: 1. JobClient: JobClient should support adding input to a job; it generates the splits and submits them to the JobTracker. 2. JobTracker: JobTracker should support addInput, adding the new tasks to the original map tasks. Because the uploaded data will be processed quickly, the scheduler should also be updated to support keeping a map task pending until the client declares the job's input done. 3. Reducer: the reducer should also update the number of maps (mapNums) so the shuffle works correctly. This is the rough idea, and I will update it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
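To make the "only new splits since the last call" contract concrete, here is a self-contained sketch. The `InputSplit` and `JobContext` classes below are trivial stand-ins for the real Hadoop types, and `IncrementalFileInputFormat` is a hypothetical implementation, not part of any patch:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Stand-ins for the real Hadoop types, just to keep the sketch runnable.
class InputSplit {
    final String path;
    InputSplit(String path) { this.path = path; }
}

class JobContext {
    final List<String> inputPaths = new ArrayList<>();
}

abstract class IncrementalInputFormat {
    // Contract from the comment above: return only the splits
    // discovered since the previous call.
    abstract InputSplit[] getNewInputSplits(JobContext context);
}

class IncrementalFileInputFormat extends IncrementalInputFormat {
    private final Set<String> seen = new LinkedHashSet<>();

    @Override
    InputSplit[] getNewInputSplits(JobContext context) {
        List<InputSplit> fresh = new ArrayList<>();
        for (String path : context.inputPaths) {
            if (seen.add(path)) {   // add() returns true only the first time
                fresh.add(new InputSplit(path));
            }
        }
        return fresh.toArray(new InputSplit[0]);
    }
}
```

Each call drains only the unseen inputs, so repeated polling by the JobTracker is idempotent once the input is exhausted.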
[jira] Commented: (MAPREDUCE-1399) The archive command shows a null error message
[ https://issues.apache.org/jira/browse/MAPREDUCE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831620#action_12831620 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-1399: --- Hudson does not seem to be working. Ran test-patch locally.
{noformat}
[exec] -1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
[exec]    Please justify why no new tests are needed for this patch.
[exec]    Also please list what manual steps were performed to verify this patch.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
{noformat}
The archive command shows a null error message -- Key: MAPREDUCE-1399 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1399 Project: Hadoop Map/Reduce Issue Type: Bug Components: harchive Reporter: Tsz Wo (Nicholas), SZE Assignee: Mahadev konar Fix For: 0.22.0 Attachments: m1399_20100204.patch, m1399_20100205.patch, m1399_20100205trunk.patch, m1399_20100205trunk2.patch, MAPREDUCE-1399.patch
{noformat}
bash-3.1$ hadoop archive -archiveName foo.har -p . foo .
Exception in archives null
{noformat}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1399) The archive command shows a null error message
[ https://issues.apache.org/jira/browse/MAPREDUCE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831625#action_12831625 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-1399: --- Oops, I mistakenly posted [the test-patch result|https://issues.apache.org/jira/browse/MAPREDUCE-1399?focusedCommentId=12831620&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12831620] for MAPREDUCE-1425 to this issue. Sorry... The archive command shows a null error message -- Key: MAPREDUCE-1399 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1399 Project: Hadoop Map/Reduce Issue Type: Bug Components: harchive Reporter: Tsz Wo (Nicholas), SZE Assignee: Mahadev konar Fix For: 0.22.0 Attachments: m1399_20100204.patch, m1399_20100205.patch, m1399_20100205trunk.patch, m1399_20100205trunk2.patch, MAPREDUCE-1399.patch
{noformat}
bash-3.1$ hadoop archive -archiveName foo.har -p . foo .
Exception in archives null
{noformat}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1425) archive throws OutOfMemoryError
[ https://issues.apache.org/jira/browse/MAPREDUCE-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831626#action_12831626 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-1425: --- Hudson does not seem working. Ran test-patch locally. {noformat} [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no new tests are needed for this patch. [exec] Also please list what manual steps were performed to verify this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] {noformat} archive throws OutOfMemoryError --- Key: MAPREDUCE-1425 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1425 Project: Hadoop Map/Reduce Issue Type: Improvement Components: harchive Reporter: Tsz Wo (Nicholas), SZE Assignee: Mahadev konar Fix For: 0.22.0 Attachments: har.sh, m1425_20100129TextFileGenerator.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425_y_0.20.patch {noformat} -bash-3.1$ hadoop archive -archiveName t4.har -p . t4 . 
Exception in thread main java.lang.OutOfMemoryError: Java heap space at java.util.regex.Pattern.compile(Pattern.java:1432) at java.util.regex.Pattern.init(Pattern.java:1133) at java.util.regex.Pattern.compile(Pattern.java:847) at java.lang.String.replace(String.java:2208) at org.apache.hadoop.fs.Path.normalizePath(Path.java:146) at org.apache.hadoop.fs.Path.initialize(Path.java:137) at org.apache.hadoop.fs.Path.init(Path.java:126) at org.apache.hadoop.fs.Path.makeQualified(Path.java:296) at org.apache.hadoop.hdfs.DistributedFileSystem.makeQualified(DistributedFileSystem.java:244) at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:256) at org.apache.hadoop.tools.HadoopArchives.archive(HadoopArchives.java:393) at org.apache.hadoop.tools.HadoopArchives.run(HadoopArchives.java:736) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.tools.HadoopArchives.main(HadoopArchives.java:751) {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
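The stack trace above shows the archive client exhausting its heap while materializing qualified Path objects for an entire directory tree at once. A hedged, self-contained sketch of the general remedy, walking the tree with an explicit work queue so only one directory listing is in memory at a time; a Map stands in for DistributedFileSystem.listStatus, and this is illustrative rather than the actual HadoopArchives fix:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Sketch: bound the client's heap by visiting one directory listing at a
// time instead of holding every path of the tree at once. The Map is a
// stand-in for listStatus; not the actual MAPREDUCE-1425 patch.
public class ArchiveWalkSketch {
    static int countEntries(Map<String, List<String>> tree, String root) {
        int count = 0;
        Deque<String> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            String dir = pending.pop();
            for (String child : tree.getOrDefault(dir, List.of())) {
                count++;
                if (tree.containsKey(child)) {
                    pending.push(child); // a subdirectory: list it later
                }
            }
        }
        return count;
    }
}
```

The queue holds only unvisited directory names, so peak memory tracks tree depth and fan-out rather than total file count.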
[jira] Commented: (MAPREDUCE-1266) Allow heartbeat interval smaller than 3 seconds for tiny clusters
[ https://issues.apache.org/jira/browse/MAPREDUCE-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831631#action_12831631 ] Allen Wittenauer commented on MAPREDUCE-1266: - if you are using jvm reuse, then that 1s disappears, right? Allow heartbeat interval smaller than 3 seconds for tiny clusters - Key: MAPREDUCE-1266 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1266 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker, task, tasktracker Affects Versions: 0.22.0 Reporter: Todd Lipcon Priority: Minor For small clusters, the heartbeat interval has a large effect on job latency. This is especially true on pseudo-distributed or other tiny (5 nodes) clusters. It's not a big deal for production, but new users would have a happier first experience if Hadoop seemed snappier. I'd like to change the minimum heartbeat interval from 3.0 seconds to perhaps 0.5 seconds (but have it governed by an undocumented config parameter in case people don't like this change). The cluster size-based ramp up of interval will maintain the current scalable behavior for large clusters with no negative effect. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1266) Allow heartbeat interval smaller than 3 seconds for tiny clusters
[ https://issues.apache.org/jira/browse/MAPREDUCE-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831644#action_12831644 ] Todd Lipcon commented on MAPREDUCE-1266: bq. if you are using jvm reuse, then that 1s disappears, right? Not really, since JVM reuse doesn't reuse between maps and reduces. The time sequence of a small job looks like: Client: Submit job JT: Create tasks (initialize job) on JT wait for a TT to heartbeat TT: start JVM child: process map task TT: send accelerated heartbeat once map task is complete (I forget whether this is in 0.20 or came later) receive reduce task, start reduce JVM (regardless of JVM reuse) child: process reduce task TT: send completion heartbeat I guess there are also some setup/cleanup tasks going on in there as well. Since we're talking about a hypothetical one map, one reduce, we're just cutting down the time between initting the job and getting the first JVM on a TT. In a multimapper or multireducer job, the cost shows up in how long it takes for all of the tasks to get scheduled - it will only schedule one task per heartbeat with some schedulers. The fair scheduler after MAPREDUCE-706 can assign multiple at the same time, which should help substantially. Allow heartbeat interval smaller than 3 seconds for tiny clusters - Key: MAPREDUCE-1266 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1266 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker, task, tasktracker Affects Versions: 0.22.0 Reporter: Todd Lipcon Priority: Minor For small clusters, the heartbeat interval has a large effect on job latency. This is especially true on pseudo-distributed or other tiny (5 nodes) clusters. It's not a big deal for production, but new users would have a happier first experience if Hadoop seemed snappier. 
I'd like to change the minimum heartbeat interval from 3.0 seconds to perhaps 0.5 seconds (but have it governed by an undocumented config parameter in case people don't like this change). The cluster size-based ramp up of interval will maintain the current scalable behavior for large clusters with no negative effect. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
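Todd's proposal amounts to keeping the cluster-size ramp-up but lowering the hard floor from 3000 ms to a configurable 500 ms. A minimal sketch of that logic; the constant names and the ramp-up divisor are assumptions for illustration, not the JobTracker's actual code:

```java
// Sketch of the proposed heartbeat policy: interval grows with cluster
// size, but the floor drops from 3000 ms to 500 ms. Constants are
// illustrative assumptions, not Hadoop's real defaults.
public class HeartbeatIntervalSketch {
    static final int HEARTBEATS_PER_SECOND = 100; // assumed ramp-up divisor
    static final int MIN_INTERVAL_MS = 500;       // proposed floor (was 3000)

    /** Interval scales linearly with cluster size, never below the floor. */
    static int intervalMillis(int clusterSize) {
        int scaled = clusterSize * 1000 / HEARTBEATS_PER_SECOND;
        return Math.max(scaled, MIN_INTERVAL_MS);
    }
}
```

On a pseudo-distributed cluster this yields the snappy 500 ms interval, while a 300-node cluster still backs off to 3 seconds, preserving the current large-cluster behavior.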
[jira] Commented: (MAPREDUCE-1399) The archive command shows a null error message
[ https://issues.apache.org/jira/browse/MAPREDUCE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831653#action_12831653 ] Hadoop QA commented on MAPREDUCE-1399: -- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12435047/m1399_20100205trunk2.patch against trunk revision 907967. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 7 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/304/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/304/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/304/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/304/console This message is automatically generated. The archive command shows a null error message -- Key: MAPREDUCE-1399 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1399 Project: Hadoop Map/Reduce Issue Type: Bug Components: harchive Reporter: Tsz Wo (Nicholas), SZE Assignee: Mahadev konar Fix For: 0.22.0 Attachments: m1399_20100204.patch, m1399_20100205.patch, m1399_20100205trunk.patch, m1399_20100205trunk2.patch, MAPREDUCE-1399.patch {noformat} bash-3.1$ hadoop archive -archiveName foo.har -p . foo . 
Exception in archives null {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1433: - Attachment: mr-1433.patch Add some code that sets the service name on the received token. All tests pass and test-patch is clean. Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch, mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1399) The archive command shows a null error message
[ https://issues.apache.org/jira/browse/MAPREDUCE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831665#action_12831665 ] Hadoop QA commented on MAPREDUCE-1399: -- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12435047/m1399_20100205trunk2.patch against trunk revision 907967. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 7 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/438/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/438/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/438/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/438/console This message is automatically generated. The archive command shows a null error message -- Key: MAPREDUCE-1399 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1399 Project: Hadoop Map/Reduce Issue Type: Bug Components: harchive Reporter: Tsz Wo (Nicholas), SZE Assignee: Mahadev konar Fix For: 0.22.0 Attachments: m1399_20100204.patch, m1399_20100205.patch, m1399_20100205trunk.patch, m1399_20100205trunk2.patch, MAPREDUCE-1399.patch {noformat} bash-3.1$ hadoop archive -archiveName foo.har -p . foo . 
Exception in archives null {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1341) Sqoop should have an option to create hive tables and skip the table import step
[ https://issues.apache.org/jira/browse/MAPREDUCE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831697#action_12831697 ] Leonid Furman commented on MAPREDUCE-1341: -- It looks like the hudson build hasn't picked up the latest patch - MAPREDUCE-1341.4.patch. Should I flip the ticket status in order to restart the build? Thanks! Sqoop should have an option to create hive tables and skip the table import step Key: MAPREDUCE-1341 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1341 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/sqoop Affects Versions: 0.22.0 Reporter: Leonid Furman Priority: Minor Fix For: 0.22.0 Attachments: MAPREDUCE-1341.2.patch, MAPREDUCE-1341.3.patch, MAPREDUCE-1341.4.patch, MAPREDUCE-1341.patch In case the client only needs to create tables in hive, it would be helpful if Sqoop had an optional parameter: --hive-create-only which would omit the time consuming table import step, generate hive create table statements and run them. If this feature seems useful, I can generate the patch. I have modified the Sqoop code and built it on my development machine, and it seems to be working well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1341) Sqoop should have an option to create hive tables and skip the table import step
[ https://issues.apache.org/jira/browse/MAPREDUCE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831704#action_12831704 ] Aaron Kimball commented on MAPREDUCE-1341: -- I don't see the patch listed in http://hudson.zones.apache.org/hudson/view/Hadoop/job/Mapreduce-Patch-Admin/lastSuccessfulBuild/artifact/MAPREDUCE_PatchQueue.html so yea, go through cancel patch / submit patch again. Sqoop should have an option to create hive tables and skip the table import step Key: MAPREDUCE-1341 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1341 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/sqoop Affects Versions: 0.22.0 Reporter: Leonid Furman Priority: Minor Fix For: 0.22.0 Attachments: MAPREDUCE-1341.2.patch, MAPREDUCE-1341.3.patch, MAPREDUCE-1341.4.patch, MAPREDUCE-1341.patch In case the client only needs to create tables in hive, it would be helpful if Sqoop had an optional parameter: --hive-create-only which would omit the time consuming table import step, generate hive create table statements and run them. If this feature seems useful, I can generate the patch. I have modified the Sqoop code and built it on my development machine, and it seems to be working well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1433: - Attachment: mr-1433.patch Updated with new code to normalize the hostname. Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1341) Sqoop should have an option to create hive tables and skip the table import step
[ https://issues.apache.org/jira/browse/MAPREDUCE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leonid Furman updated MAPREDUCE-1341: - Status: Open (was: Patch Available) Cycling patch to retrigger hudson build. Sqoop should have an option to create hive tables and skip the table import step Key: MAPREDUCE-1341 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1341 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/sqoop Affects Versions: 0.22.0 Reporter: Leonid Furman Priority: Minor Fix For: 0.22.0 Attachments: MAPREDUCE-1341.2.patch, MAPREDUCE-1341.3.patch, MAPREDUCE-1341.4.patch, MAPREDUCE-1341.patch In case the client only needs to create tables in hive, it would be helpful if Sqoop had an optional parameter: --hive-create-only which would omit the time consuming table import step, generate hive create table statements and run them. If this feature seems useful, I can generate the patch. I have modified the Sqoop code and built it on my development machine, and it seems to be working well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1341) Sqoop should have an option to create hive tables and skip the table import step
[ https://issues.apache.org/jira/browse/MAPREDUCE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leonid Furman updated MAPREDUCE-1341: - Assignee: Leonid Furman Status: Patch Available (was: Open) Sqoop should have an option to create hive tables and skip the table import step Key: MAPREDUCE-1341 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1341 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/sqoop Affects Versions: 0.22.0 Reporter: Leonid Furman Assignee: Leonid Furman Priority: Minor Fix For: 0.22.0 Attachments: MAPREDUCE-1341.2.patch, MAPREDUCE-1341.3.patch, MAPREDUCE-1341.4.patch, MAPREDUCE-1341.patch In case the client only needs to create tables in hive, it would be helpful if Sqoop had an optional parameter: --hive-create-only which would omit the time consuming table import step, generate hive create table statements and run them. If this feature seems useful, I can generate the patch. I have modified the Sqoop code and built it on my development machine, and it seems to be working well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831710#action_12831710 ] Devaraj Das commented on MAPREDUCE-1433: Please pass the right text to setService in getDelegationToken Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
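Devaraj's comment concerns the service field a client must set on a delegation token it receives, so that the token later matches the JobTracker address the task connects to. A minimal sketch of that convention, assuming the service is a normalized lowercase host plus port; the exact format used by the patch may differ:

```java
// Sketch of building the service string a client would pass to
// setService on a received delegation token. The lowercase host:port
// convention is an assumption, not necessarily the patch's exact format.
public class TokenServiceSketch {
    static String buildService(String host, int port) {
        // Normalize the hostname so the same token resolves to the same
        // service however the address was originally written.
        return host.toLowerCase() + ":" + port;
    }
}
```

This is also why Owen's later revision "normalizes the hostname": without normalization, "JT.example.com:9001" and "jt.example.com:9001" would look like different services.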
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1433: - Attachment: mr-1433.patch Ok, now the patch has the right fix in it. Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1341) Sqoop should have an option to create hive tables and skip the table import step
[ https://issues.apache.org/jira/browse/MAPREDUCE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831714#action_12831714 ] Leonid Furman commented on MAPREDUCE-1341: -- Aaron, it doesn't seem to populate the queue - does it usually happen immediately or after some time? Sqoop should have an option to create hive tables and skip the table import step Key: MAPREDUCE-1341 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1341 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/sqoop Affects Versions: 0.22.0 Reporter: Leonid Furman Assignee: Leonid Furman Priority: Minor Fix For: 0.22.0 Attachments: MAPREDUCE-1341.2.patch, MAPREDUCE-1341.3.patch, MAPREDUCE-1341.4.patch, MAPREDUCE-1341.5.patch, MAPREDUCE-1341.patch In case the client only needs to create tables in hive, it would be helpful if Sqoop had an optional parameter: --hive-create-only which would omit the time consuming table import step, generate hive create table statements and run them. If this feature seems useful, I can generate the patch. I have modified the Sqoop code and built it on my development machine, and it seems to be working well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1341) Sqoop should have an option to create hive tables and skip the table import step
[ https://issues.apache.org/jira/browse/MAPREDUCE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831717#action_12831717 ] Leonid Furman commented on MAPREDUCE-1341: -- Never mind, it is there now. Thank you! Sqoop should have an option to create hive tables and skip the table import step Key: MAPREDUCE-1341 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1341 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/sqoop Affects Versions: 0.22.0 Reporter: Leonid Furman Assignee: Leonid Furman Priority: Minor Fix For: 0.22.0 Attachments: MAPREDUCE-1341.2.patch, MAPREDUCE-1341.3.patch, MAPREDUCE-1341.4.patch, MAPREDUCE-1341.5.patch, MAPREDUCE-1341.patch In case the client only needs to create tables in hive, it would be helpful if Sqoop had an optional parameter: --hive-create-only which would omit the time consuming table import step, generate hive create table statements and run them. If this feature seems useful, I can generate the patch. I have modified the Sqoop code and built it on my development machine, and it seems to be working well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831722#action_12831722 ] Devaraj Das commented on MAPREDUCE-1433: +1 Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated MAPREDUCE-1433: --- Attachment: 1433.bp20.patch Patch for Y20. Not for commit. Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: 1433.bp20.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1433: - Resolution: Fixed Fix Version/s: 0.22.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) It passes unit tests and test-patch. Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.22.0 Attachments: 1433.bp20.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1425) archive throws OutOfMemoryError
[ https://issues.apache.org/jira/browse/MAPREDUCE-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831753#action_12831753 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-1425: --- All tests passed except TestChainErrors, which still failed after the patch had been reverted. archive throws OutOfMemoryError --- Key: MAPREDUCE-1425 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1425 Project: Hadoop Map/Reduce Issue Type: Improvement Components: harchive Reporter: Tsz Wo (Nicholas), SZE Assignee: Mahadev konar Fix For: 0.22.0 Attachments: har.sh, m1425_20100129TextFileGenerator.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425_y_0.20.patch {noformat} -bash-3.1$ hadoop archive -archiveName t4.har -p . t4 . Exception in thread main java.lang.OutOfMemoryError: Java heap space at java.util.regex.Pattern.compile(Pattern.java:1432) at java.util.regex.Pattern.init(Pattern.java:1133) at java.util.regex.Pattern.compile(Pattern.java:847) at java.lang.String.replace(String.java:2208) at org.apache.hadoop.fs.Path.normalizePath(Path.java:146) at org.apache.hadoop.fs.Path.initialize(Path.java:137) at org.apache.hadoop.fs.Path.init(Path.java:126) at org.apache.hadoop.fs.Path.makeQualified(Path.java:296) at org.apache.hadoop.hdfs.DistributedFileSystem.makeQualified(DistributedFileSystem.java:244) at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:256) at org.apache.hadoop.tools.HadoopArchives.archive(HadoopArchives.java:393) at org.apache.hadoop.tools.HadoopArchives.run(HadoopArchives.java:736) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.tools.HadoopArchives.main(HadoopArchives.java:751) {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1425) archive throws OutOfMemoryError
[ https://issues.apache.org/jira/browse/MAPREDUCE-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831756#action_12831756 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-1425: --- The manual test is simple: run archive on 10^5 files and jmap to read the memory usages as shown previously. archive throws OutOfMemoryError --- Key: MAPREDUCE-1425 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1425 Project: Hadoop Map/Reduce Issue Type: Improvement Components: harchive Reporter: Tsz Wo (Nicholas), SZE Assignee: Mahadev konar Fix For: 0.22.0 Attachments: har.sh, m1425_20100129TextFileGenerator.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425_y_0.20.patch {noformat} -bash-3.1$ hadoop archive -archiveName t4.har -p . t4 . Exception in thread main java.lang.OutOfMemoryError: Java heap space at java.util.regex.Pattern.compile(Pattern.java:1432) at java.util.regex.Pattern.init(Pattern.java:1133) at java.util.regex.Pattern.compile(Pattern.java:847) at java.lang.String.replace(String.java:2208) at org.apache.hadoop.fs.Path.normalizePath(Path.java:146) at org.apache.hadoop.fs.Path.initialize(Path.java:137) at org.apache.hadoop.fs.Path.init(Path.java:126) at org.apache.hadoop.fs.Path.makeQualified(Path.java:296) at org.apache.hadoop.hdfs.DistributedFileSystem.makeQualified(DistributedFileSystem.java:244) at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:256) at org.apache.hadoop.tools.HadoopArchives.archive(HadoopArchives.java:393) at org.apache.hadoop.tools.HadoopArchives.run(HadoopArchives.java:736) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.tools.HadoopArchives.main(HadoopArchives.java:751) {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1472) JobTracker.submitJob holds a lock on the JobTracker while copying job-conf from HDFS
JobTracker.submitJob holds a lock on the JobTracker while copying job-conf from HDFS Key: MAPREDUCE-1472 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1472 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Reporter: Arun C Murthy Assignee: Arun C Murthy Priority: Blocker This could have a very bad impact on the responsiveness of the cluster. JobTracker.submitJob also forks a DU and writes to its local disk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
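The usual shape of a fix for this kind of blocker is hoisting the slow HDFS copy out of the JobTracker-wide critical section, so heartbeats are not stalled behind a submission. A hedged sketch; the names are illustrative and a string copy stands in for the real HDFS-to-local transfer:

```java
// Sketch: do the expensive I/O before taking the tracker-wide lock, and
// keep only cheap bookkeeping inside it. Illustrative names; copyConf is
// a placeholder for the HDFS-to-local job-conf copy.
public class SubmitJobSketch {
    private final Object lock = new Object();
    private final java.util.Map<String, String> jobs = new java.util.HashMap<>();

    String submitJob(String jobId, String confSource) {
        String localConf = copyConf(confSource); // slow I/O, unsynchronized
        synchronized (lock) {
            jobs.put(jobId, localConf);          // cheap bookkeeping only
        }
        return localConf;
    }

    private String copyConf(String src) {
        return "local:" + src; // placeholder for the expensive copy
    }
}
```

With the copy outside the lock, a slow NameNode or a large job-conf delays only the submitting client, not every heartbeat waiting on the same monitor.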
[jira] Updated: (MAPREDUCE-1399) The archive command shows a null error message
[ https://issues.apache.org/jira/browse/MAPREDUCE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-1399: -- Attachment: m1399_20100205trunk2_y0.20.patch m1399_20100205trunk2_y0.20.patch: for y0.20 The archive command shows a null error message -- Key: MAPREDUCE-1399 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1399 Project: Hadoop Map/Reduce Issue Type: Bug Components: harchive Reporter: Tsz Wo (Nicholas), SZE Assignee: Mahadev konar Fix For: 0.22.0 Attachments: m1399_20100204.patch, m1399_20100205.patch, m1399_20100205trunk.patch, m1399_20100205trunk2.patch, m1399_20100205trunk2_y0.20.patch, MAPREDUCE-1399.patch {noformat} bash-3.1$ hadoop archive -archiveName foo.har -p . foo . Exception in archives null {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1425) archive throws OutOfMemoryError
[ https://issues.apache.org/jira/browse/MAPREDUCE-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831775#action_12831775 ] Hadoop QA commented on MAPREDUCE-1425: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12435033/MAPREDUCE-1425.patch against trunk revision 907967. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/439/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/439/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/439/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/439/console This message is automatically generated. 
archive throws OutOfMemoryError --- Key: MAPREDUCE-1425 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1425 Project: Hadoop Map/Reduce Issue Type: Improvement Components: harchive Reporter: Tsz Wo (Nicholas), SZE Assignee: Mahadev konar Fix For: 0.22.0 Attachments: har.sh, m1425_20100129TextFileGenerator.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425_y_0.20.patch {noformat} -bash-3.1$ hadoop archive -archiveName t4.har -p . t4 . Exception in thread main java.lang.OutOfMemoryError: Java heap space at java.util.regex.Pattern.compile(Pattern.java:1432) at java.util.regex.Pattern.init(Pattern.java:1133) at java.util.regex.Pattern.compile(Pattern.java:847) at java.lang.String.replace(String.java:2208) at org.apache.hadoop.fs.Path.normalizePath(Path.java:146) at org.apache.hadoop.fs.Path.initialize(Path.java:137) at org.apache.hadoop.fs.Path.init(Path.java:126) at org.apache.hadoop.fs.Path.makeQualified(Path.java:296) at org.apache.hadoop.hdfs.DistributedFileSystem.makeQualified(DistributedFileSystem.java:244) at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:256) at org.apache.hadoop.tools.HadoopArchives.archive(HadoopArchives.java:393) at org.apache.hadoop.tools.HadoopArchives.run(HadoopArchives.java:736) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.tools.HadoopArchives.main(HadoopArchives.java:751) {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
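The OOM above arises while listStatus materializes one qualified Path object per directory entry in a single in-memory array. The fix direction is to process entries incrementally instead of all at once. A minimal pure-Java analogue of that streaming approach (using the JDK's DirectoryStream, not the actual HDFS/HadoopArchives API — class and method names here are illustrative only):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamedListing {
    // Walk a directory by streaming its entries one at a time instead of
    // materializing a full array of per-entry objects; memory stays O(1)
    // in the number of entries, which is what avoids the heap blowup.
    public static int countEntries(Path dir) throws IOException {
        int count = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                count++;  // process the entry, then let it be garbage-collected
            }
        }
        return count;
    }
}
```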
[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831797#action_12831797 ] Scott Chen commented on MAPREDUCE-1463: --- @Todd: Yes, you're right. The logic in the patch is wrong. The one you posted is the correct logic. Sorry about the mistake. @Amar: {quote} How do you define small jobs? Shouldn't it be based on the total number of tasks instead of considering maps and reduces individually? {quote} We want to start reducers faster in both the fewer-mapper and fewer-reducer cases: in the fewer-reducer case, starting reducers earlier is cheap anyway, and in the fewer-mapper case, the maps finish faster. But I think it may not be a bad idea to use the total instead (it is simpler, at least). {quote} Why do we need a special case for small jobs? If it's for fairness then this piece of code rightly belongs to contrib/fairscheduler, no? If not for fairness then what is the problem with the current framework w.r.t. small jobs? {quote} Handling the special case for small jobs reduces the overall latency, which gives users a better experience. {quote} Can it be fixed by simple (configuration-like) tweaking? If not, then what's the right fix? {quote} For experienced users, setting completedmaps=0 does fix this problem. But it would be nice if this could be done automatically for other users who do not know how to configure Hadoop. @Arun: Thanks for the comments. I agree. Tweaking mapreduce.job.reduce.slowstart.completedmaps on the job client side should be a cleaner way to handle this. For experienced users, setting completedmaps to 0 on the client side will make their small jobs finish faster. But it would be nice if some automatic decision could be made here so that normal users don't have to learn how to configure an extra parameter. 
The point here is that in some cases (small jobs, with a small number of mappers or reducers) we should not spend time waiting for the reducers to start, because the waiting time is significant (or because it is cheap to start the reducers earlier). Automatically reducing the latency makes our users happy. Reducer should start faster for smaller jobs Key: MAPREDUCE-1463 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/fair-share Reporter: Scott Chen Assignee: Scott Chen Attachments: MAPREDUCE-1463-v1.patch, MAPREDUCE-1463-v2.patch Our users often complain about the slowness of smaller ad-hoc jobs. The overhead to wait for the reducers to start in this case is significant. It will be good if we can start the reducer sooner in this case. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
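The slowstart behavior debated in this thread boils down to a single scheduling condition: reduces are held back until the fraction of completed maps reaches the mapreduce.job.reduce.slowstart.completedmaps threshold (the config key named in the discussion). A hedged, pure-Java sketch of that condition — this is an illustration of the idea, not the actual JobTracker code, and the helper name is hypothetical:

```java
public class SlowStart {
    // Reduces are scheduled once the completed-map count reaches the
    // slowstart fraction of total maps; a threshold of 0.0f starts them
    // immediately, which is the small-job tweak proposed in the thread.
    public static boolean shouldStartReduces(int completedMaps, int totalMaps,
                                             float slowstart) {
        if (totalMaps == 0) return true;  // map-less job: nothing to wait for
        return completedMaps >= Math.ceil(slowstart * totalMaps);
    }

    public static void main(String[] args) {
        // Default threshold 0.5: wait for half the maps to finish.
        System.out.println(shouldStartReduces(49, 100, 0.5f)); // false
        System.out.println(shouldStartReduces(50, 100, 0.5f)); // true
        // Threshold 0: reduces start right away.
        System.out.println(shouldStartReduces(0, 100, 0.0f));  // true
    }
}
```

The automatic-decision idea amounts to picking the threshold per job (small job: 0, large job: the default) instead of asking users to set it themselves.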
[jira] Updated: (MAPREDUCE-1463) Reducer should start faster for smaller jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-1463: -- Attachment: MAPREDUCE-1463-v3.patch Reducer should start faster for smaller jobs Key: MAPREDUCE-1463 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/fair-share Reporter: Scott Chen Assignee: Scott Chen Attachments: MAPREDUCE-1463-v1.patch, MAPREDUCE-1463-v2.patch, MAPREDUCE-1463-v3.patch Our users often complain about the slowness of smaller ad-hoc jobs. The overhead to wait for the reducers to start in this case is significant. It will be good if we can start the reducer sooner in this case. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-326) The lowest level map-reduce APIs should be byte oriented
[ https://issues.apache.org/jira/browse/MAPREDUCE-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831800#action_12831800 ] eric baldeschwieler commented on MAPREDUCE-326: --- Sounds like we are on the same page. Proposals will be greeted with interest. Acceptance criteria: 1) Backwards compatible to 0.20 (including legacy APIs in 0.20 please, since we're still debugging the new APIs) 2) Performance neutral for 0.20 APIs, no large hit for legacy APIs The lowest level map-reduce APIs should be byte oriented Key: MAPREDUCE-326 URL: https://issues.apache.org/jira/browse/MAPREDUCE-326 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: eric baldeschwieler As discussed here: https://issues.apache.org/jira/browse/HADOOP-1986#action_12551237 The templates, serializers and other complexities that allow map-reduce to use arbitrary types complicate the design and lead to lots of object creates and other overhead that a byte oriented design would not suffer. I believe the lowest level implementation of hadoop map-reduce should have byte string oriented APIs (for keys and values). This API would be more performant, simpler and more easily cross language. The existing API could be maintained as a thin layer on top of the leaner API. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1318) Document exit codes and their meanings used by linux task controller
[ https://issues.apache.org/jira/browse/MAPREDUCE-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831801#action_12831801 ] Hadoop QA commented on MAPREDUCE-1318: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12435095/MAPREDUCE-1318.patch against trunk revision 907967. +1 @author. The patch does not contain any @author tags. +0 tests included. The patch appears to be a documentation patch that doesn't require tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/306/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/306/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/306/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/306/console This message is automatically generated. 
Document exit codes and their meanings used by linux task controller Key: MAPREDUCE-1318 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1318 Project: Hadoop Map/Reduce Issue Type: Improvement Components: documentation Reporter: Sreekanth Ramakrishnan Assignee: Anatoli Fomenko Priority: Blocker Fix For: 0.21.0 Attachments: HADOOP-5912.1.patch, MAPREDUCE-1318.1.patch, MAPREDUCE-1318.2.patch, MAPREDUCE-1318.patch Currently, linux task controller binary uses a set of exit code, which is not documented. These should be documented. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1399) The archive command shows a null error message
[ https://issues.apache.org/jira/browse/MAPREDUCE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated MAPREDUCE-1399: - Resolution: Fixed Status: Resolved (was: Patch Available) I just committed this. thanks nicholas. The archive command shows a null error message -- Key: MAPREDUCE-1399 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1399 Project: Hadoop Map/Reduce Issue Type: Bug Components: harchive Reporter: Tsz Wo (Nicholas), SZE Assignee: Mahadev konar Fix For: 0.22.0 Attachments: m1399_20100204.patch, m1399_20100205.patch, m1399_20100205trunk.patch, m1399_20100205trunk2.patch, m1399_20100205trunk2_y0.20.patch, MAPREDUCE-1399.patch {noformat} bash-3.1$ hadoop archive -archiveName foo.har -p . foo . Exception in archives null {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1473) Sqoop should allow users to control export parallelism
[ https://issues.apache.org/jira/browse/MAPREDUCE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Kimball updated MAPREDUCE-1473: - Attachment: MAPREDUCE-1473.patch Attaching a patch which provides this functionality. This uses CombineFileInputFormat to batch up Sqoop's input files into a user-defined number of splits. As in importing, the degree of parallelism is controlled with the {{\-m}} / {{--num-mappers}} parameters. Sqoop should allow users to control export parallelism -- Key: MAPREDUCE-1473 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1473 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/sqoop Reporter: Aaron Kimball Assignee: Aaron Kimball Attachments: MAPREDUCE-1473.patch Sqoop uses MapReduce jobs to export files back to a table in the database. The degree of parallelism is controlled by the number of splits; i.e., the number of input files used. The bottleneck in the system, though, is likely to be the database itself. Users should have the ability to tune the number of parallel exporters being used to a degree appropriate to their database deployment. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
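Conceptually, batching input files into a user-defined number of splits is a grouping problem: assign each file to one of N buckets while keeping bucket sizes balanced. A toy pure-Java sketch of that idea (greedy assignment to the currently smallest split — a simplification for illustration, not how CombineFileInputFormat actually builds splits, and the class name is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SplitGrouper {
    // Greedily assign each file (by size) to the currently smallest split,
    // producing at most numSplits groups. Sorting largest-first keeps the
    // greedy assignment reasonably balanced.
    public static List<List<String>> group(Map<String, Long> fileSizes, int numSplits) {
        List<List<String>> splits = new ArrayList<>();
        long[] totals = new long[numSplits];
        for (int i = 0; i < numSplits; i++) splits.add(new ArrayList<>());
        List<Map.Entry<String, Long>> files = new ArrayList<>(fileSizes.entrySet());
        files.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        for (Map.Entry<String, Long> f : files) {
            int smallest = 0;
            for (int i = 1; i < numSplits; i++)
                if (totals[i] < totals[smallest]) smallest = i;
            splits.get(smallest).add(f.getKey());
            totals[smallest] += f.getValue();
        }
        return splits;
    }
}
```

With the number of splits capped at the -m / --num-mappers value, the number of concurrent export tasks (and thus concurrent database writers) becomes a user-controlled knob.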
[jira] Updated: (MAPREDUCE-1473) Sqoop should allow users to control export parallelism
[ https://issues.apache.org/jira/browse/MAPREDUCE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Kimball updated MAPREDUCE-1473: - Status: Patch Available (was: Open) Sqoop should allow users to control export parallelism -- Key: MAPREDUCE-1473 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1473 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/sqoop Reporter: Aaron Kimball Assignee: Aaron Kimball Attachments: MAPREDUCE-1473.patch Sqoop uses MapReduce jobs to export files back to a table in the database. The degree of parallelism is controlled by the number of splits; i.e., the number of input files used. The bottleneck in the system, though, is likely to be the database itself. Users should have the ability to tune the number of parallel exporters being used to a degree appropriate to their database deployment. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1474) forrest docs for archives is out of date.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated MAPREDUCE-1474: - Attachment: MAPREDUCE-1474.patch doc changes for hadoop archives. forrest docs for archives is out of date. Key: MAPREDUCE-1474 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1474 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Reporter: Mahadev konar Assignee: Mahadev konar Fix For: 0.22.0 Attachments: MAPREDUCE-1474.patch The docs for archives are out of date. The new docs that were checked into hadoop common were lost because of the project split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1474) forrest docs for archives is out of date.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated MAPREDUCE-1474: - Status: Patch Available (was: Open) forrest docs for archives is out of date. - Key: MAPREDUCE-1474 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1474 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Reporter: Mahadev konar Assignee: Mahadev konar Fix For: 0.22.0 Attachments: MAPREDUCE-1474.patch The docs for archives are out of date. The new docs that were checked into hadoop common were lost because of the project split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-434) local map-reduce job limited to single reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831855#action_12831855 ] Hadoop QA commented on MAPREDUCE-434: - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12435228/MAPREDUCE-434.4.patch against trunk revision 908283. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/307/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/307/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/307/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/307/console This message is automatically generated. 
local map-reduce job limited to single reducer -- Key: MAPREDUCE-434 URL: https://issues.apache.org/jira/browse/MAPREDUCE-434 Project: Hadoop Map/Reduce Issue Type: Bug Environment: local job tracker Reporter: Yoram Arnon Assignee: Aaron Kimball Priority: Minor Attachments: MAPREDUCE-434.2.patch, MAPREDUCE-434.3.patch, MAPREDUCE-434.4.patch, MAPREDUCE-434.patch when mapred.job.tracker is set to 'local', my setNumReduceTasks call is ignored, and the number of reduce tasks is set at 1. This prevents me from locally debugging my partition function, which tries to partition based on the number of reduce tasks. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
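The bug above matters because the default hash-partitioning rule collapses to a single bucket when the reduce count is forced to 1, so a partition function can never be exercised locally. A hedged pure-Java sketch of that rule (the same shape as Hadoop's default HashPartitioner, but a standalone illustration with a hypothetical class name):

```java
public class DebugPartitioner {
    // The usual hash-partitioning rule: mask off the sign bit, then take the
    // modulus. With numReduceTasks == 1 every key lands in partition 0, so a
    // buggy partition function is invisible when the local runner pins the
    // reduce count to 1.
    public static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        System.out.println(getPartition("alpha", 1)); // always 0
        System.out.println(getPartition("beta", 1));  // always 0
        // Only with several reduce tasks does the key spread become observable.
        System.out.println(getPartition("alpha", 4));
    }
}
```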
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated MAPREDUCE-1433: --- Attachment: 1433.bp20.patch More up-to-date version of the backported patch. Not for commit. Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.22.0 Attachments: 1433.bp20.patch, 1433.bp20.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831868#action_12831868 ] Hudson commented on MAPREDUCE-1433: --- Integrated in Hadoop-Mapreduce-trunk-Commit #233 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/233/]) Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.22.0 Attachments: 1433.bp20.patch, 1433.bp20.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1425) archive throws OutOfMemoryError
[ https://issues.apache.org/jira/browse/MAPREDUCE-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831867#action_12831867 ] Hudson commented on MAPREDUCE-1425: --- Integrated in Hadoop-Mapreduce-trunk-Commit #233 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/233/]) archive throws OutOfMemoryError --- Key: MAPREDUCE-1425 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1425 Project: Hadoop Map/Reduce Issue Type: Improvement Components: harchive Reporter: Tsz Wo (Nicholas), SZE Assignee: Mahadev konar Fix For: 0.22.0 Attachments: har.sh, m1425_20100129TextFileGenerator.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425_y_0.20.patch {noformat} -bash-3.1$ hadoop archive -archiveName t4.har -p . t4 . Exception in thread main java.lang.OutOfMemoryError: Java heap space at java.util.regex.Pattern.compile(Pattern.java:1432) at java.util.regex.Pattern.init(Pattern.java:1133) at java.util.regex.Pattern.compile(Pattern.java:847) at java.lang.String.replace(String.java:2208) at org.apache.hadoop.fs.Path.normalizePath(Path.java:146) at org.apache.hadoop.fs.Path.initialize(Path.java:137) at org.apache.hadoop.fs.Path.init(Path.java:126) at org.apache.hadoop.fs.Path.makeQualified(Path.java:296) at org.apache.hadoop.hdfs.DistributedFileSystem.makeQualified(DistributedFileSystem.java:244) at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:256) at org.apache.hadoop.tools.HadoopArchives.archive(HadoopArchives.java:393) at org.apache.hadoop.tools.HadoopArchives.run(HadoopArchives.java:736) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.tools.HadoopArchives.main(HadoopArchives.java:751) {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (MAPREDUCE-1455) Authorization for servlets
[ https://issues.apache.org/jira/browse/MAPREDUCE-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Gummadi reassigned MAPREDUCE-1455: --- Assignee: Ravi Gummadi Authorization for servlets -- Key: MAPREDUCE-1455 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1455 Project: Hadoop Map/Reduce Issue Type: Sub-task Reporter: Devaraj Das Assignee: Ravi Gummadi Fix For: 0.22.0 This jira is about building the authorization for servlets (on top of MAPREDUCE-1307). That is, the JobTracker/TaskTracker runs authorization checks on web requests based on the configured job permissions. For e.g., if the job permission is 600, then no one except the authenticated user can look at the job details via the browser. The authenticated user in the servlet can be obtained using the HttpServletRequest method. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1455) Authorization for servlets
[ https://issues.apache.org/jira/browse/MAPREDUCE-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831880#action_12831880 ] Ravi Gummadi commented on MAPREDUCE-1455: - We will get the authenticated user using HttpServletRequest.getRemoteUser(). I am proposing to run the methods that access the job as the user (using UserGroupInformation.doAs()) from JSPs and servlets, so that the JobTracker's methods can just do authorization (by checking UserGroupInformation.getCurrentUser()). This avoids many changes in MAPREDUCE-1307 and also avoids adding new methods that take a UGI as a parameter in the JobTracker. Thoughts? Authorization for servlets -- Key: MAPREDUCE-1455 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1455 Project: Hadoop Map/Reduce Issue Type: Sub-task Reporter: Devaraj Das Assignee: Ravi Gummadi Fix For: 0.22.0 This jira is about building the authorization for servlets (on top of MAPREDUCE-1307). That is, the JobTracker/TaskTracker runs authorization checks on web requests based on the configured job permissions. For e.g., if the job permission is 600, then no one except the authenticated user can look at the job details via the browser. The authenticated user in the servlet can be obtained using the HttpServletRequest method. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
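The doAs proposal above has a simple shape: the servlet establishes "who is running" before calling into the JobTracker, and the JobTracker methods authorize against that ambient identity instead of taking a user parameter. A toy pure-Java analogue of that pattern — a thread-local stand-in for UserGroupInformation, not the real Hadoop security machinery; every name here is hypothetical:

```java
import java.util.concurrent.Callable;

public class MiniDoAs {
    // Thread-local "current user": a toy stand-in for
    // UserGroupInformation.getCurrentUser() in the proposal above.
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    public static String getCurrentUser() { return CURRENT.get(); }

    // Run an action as the given user, restoring the previous user after --
    // the shape of UserGroupInformation.doAs(), minus the JAAS machinery.
    public static <T> T doAs(String user, Callable<T> action) throws Exception {
        String previous = CURRENT.get();
        CURRENT.set(user);
        try {
            return action.call();
        } finally {
            CURRENT.set(previous);
        }
    }

    // A jobtracker-style method that authorizes against the ambient user,
    // needing no explicit UGI parameter.
    public static boolean canViewJob(String jobOwner) {
        return jobOwner.equals(getCurrentUser());
    }
}
```

A servlet would call doAs(request.getRemoteUser(), ...) around the job-access call, and the callee's authorization check falls out of the ambient identity.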
[jira] Commented: (MAPREDUCE-1333) Parallel running tasks on one single node may slow down the performance
[ https://issues.apache.org/jira/browse/MAPREDUCE-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831884#action_12831884 ] Xing Shi commented on MAPREDUCE-1333: - The purpose of a distributed system is high utilization. If you want to analyze the performance of running tasks, you can just set up one node with one map and no reduce, or vice versa. Parallel running tasks on one single node may slow down the performance --- Key: MAPREDUCE-1333 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1333 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker, task, tasktracker Affects Versions: 0.20.1 Reporter: Zhaoning Zhang When I analyzed the performance of running tasks, I found that tasks running in parallel on one single node did not perform better than serialized ones. We can set mapred.tasktracker.{map|reduce}.tasks.maximum = 1 individually, but there will still be parallel map AND reduce tasks. I wonder whether this holds in real commercial clusters? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1305) Massive performance problem with DistCp and -delete
[ https://issues.apache.org/jira/browse/MAPREDUCE-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-1305: - Attachment: M1305-1.patch Modified Peter's patch to remove FsShell invocations. That part isn't actually horrible, performance-wise; it reuses the instance, so while there's certainly avoidable overhead in parsing and whatnot, it's not forking a process or anything too notable. It also supports the Trash, which may be useful/appreciated. Is supporting Trash useful for DistCp users running with \-delete? Massive performance problem with DistCp and -delete --- Key: MAPREDUCE-1305 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1305 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp Affects Versions: 0.20.1 Reporter: Peter Romianowski Assignee: Peter Romianowski Attachments: M1305-1.patch, MAPREDUCE-1305.patch *First problem* In org.apache.hadoop.tools.DistCp#deleteNonexisting we serialize FileStatus objects when the path is all we need. The performance problem comes from org.apache.hadoop.fs.RawLocalFileSystem.RawLocalFileStatus#write which tries to retrieve file permissions by issuing a ls -ld path which is painfully slow. Changed that to just serialize Path and not FileStatus. *Second problem* To delete the files we invoke the hadoop command line tool with option -rmr path. Again, for each file. Changed that to dstfs.delete(path, true) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1305) Massive performance problem with DistCp and -delete
[ https://issues.apache.org/jira/browse/MAPREDUCE-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831893#action_12831893 ] Koji Noguchi commented on MAPREDUCE-1305: - bq. Is supporting Trash useful for DistCp users running with -delete? To me, yes. I've seen many of our users deleting their files accidentally. Trash has saved us great time. I'd like to request the Trash part to stay if there's not much performance problem. Massive performance problem with DistCp and -delete --- Key: MAPREDUCE-1305 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1305 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp Affects Versions: 0.20.1 Reporter: Peter Romianowski Assignee: Peter Romianowski Attachments: M1305-1.patch, MAPREDUCE-1305.patch *First problem* In org.apache.hadoop.tools.DistCp#deleteNonexisting we serialize FileStatus objects when the path is all we need. The performance problem comes from org.apache.hadoop.fs.RawLocalFileSystem.RawLocalFileStatus#write which tries to retrieve file permissions by issuing a ls -ld path which is painfully slow. Changed that to just serialize Path and not FileStatus. *Second problem* To delete the files we invoke the hadoop command line tool with option -rmr path. Again, for each file. Changed that to dstfs.delete(path, true) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
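The second problem described in this issue — invoking the command-line tool with -rmr once per file — is the classic fork-per-item antipattern; the fix is to call the filesystem API directly in-process. A hedged pure-Java sketch of the in-process approach using the JDK filesystem API (an analogue of dstfs.delete(path, true), not DistCp's actual code; the class name is hypothetical):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class DirectDelete {
    // Delete a tree with direct in-process filesystem calls -- one cheap API
    // call per entry -- rather than forking a shell command per path, which
    // is the per-file overhead the patch removes.
    public static void deleteRecursive(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order visits children before parents, so each
            // directory is empty by the time it is deleted.
            walk.sorted(Comparator.reverseOrder())
                .forEach(p -> {
                    try { Files.delete(p); }
                    catch (IOException e) { throw new RuntimeException(e); }
                });
        }
    }
}
```

The Trash question in the thread is orthogonal: moving to trash is itself a single rename-style API call per path, so keeping it need not reintroduce the per-file fork cost.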
[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831894#action_12831894 ] Amar Kamat commented on MAPREDUCE-1463: --- What should the behavior be when the total number of maps and reduces is small (i.e., a small job for now) but the job takes a huge amount of time to finish? For example, the maps take a day to run while the reduces are also compute-intensive. In such a case, would we still consider the job a small job? I think what we want to capture is the job behavior (fast *finishing* jobs versus others). Using task counts might not be sufficient. Scott, wouldn't this problem be solved if you set 'mapreduce.job.reduce.slowstart.completedmaps' to a default value of 0 (instead of 0.5) for all your users? Reducer should start faster for smaller jobs Key: MAPREDUCE-1463 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/fair-share Reporter: Scott Chen Assignee: Scott Chen Attachments: MAPREDUCE-1463-v1.patch, MAPREDUCE-1463-v2.patch, MAPREDUCE-1463-v3.patch Our users often complain about the slowness of smaller ad-hoc jobs. The overhead to wait for the reducers to start in this case is significant. It will be good if we can start the reducer sooner in this case. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.