[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1433: - Attachment: mr-1433.patch A preliminary patch Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1470) Move Delegation token into Common so that we can use it for MapReduce also
[ https://issues.apache.org/jira/browse/MAPREDUCE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831368#action_12831368 ] Hudson commented on MAPREDUCE-1470: --- Integrated in Hadoop-Mapreduce-trunk-Commit #231 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/231/]) . Move delegation tokens from HDFS to Common so that MapReduce can use them too. (omalley) Move Delegation token into Common so that we can use it for MapReduce also -- Key: MAPREDUCE-1470 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1470 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.22.0 Attachments: mr-1470.patch We need to update one reference for map/reduce when we move the hdfs delegation tokens. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831371#action_12831371 ] Devaraj Das commented on MAPREDUCE-1433: And, please define the config variables in mapred-default.xml Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1433: - Attachment: m-1440.patch Updated with a few more fixes. Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1433: - Attachment: (was: m-1440.patch) Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1433: - Attachment: mr-1433.patch This time attaching the right file. *smile* Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1433: - Attachment: mr-1433.patch Bump the version number of ClientProtocol Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch, mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1433: - Attachment: mr-1433.patch Ok, this has an improved test and fixes a copy and paste bug. Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (MAPREDUCE-1307) Introduce the concept of Job Permissions
[ https://issues.apache.org/jira/browse/MAPREDUCE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod K V reassigned MAPREDUCE-1307: Assignee: Vinod K V Introduce the concept of Job Permissions Key: MAPREDUCE-1307 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1307 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Devaraj Das Assignee: Vinod K V Fix For: 0.22.0 Attachments: 1307-early-1.patch It would be good to define the notion of job permissions analogous to file permissions. Then the JobTracker can restrict who can read (e.g. look at the job page) or modify (e.g. kill) jobs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1307) Introduce the concept of Job Permissions
[ https://issues.apache.org/jira/browse/MAPREDUCE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831402#action_12831402 ] Vinod K V commented on MAPREDUCE-1307: -- OK, I am going ahead with ACLs for job permissions. Here's the proposal: Users can interact with their jobs via mapred commands, JT RPCs, the JT web UI and the TT web UI. This issue only handles the authorization of RPCs and hence the command-line clients. Authorization for the web UI will be addressed by MAPREDUCE-1455.
h4. Per-job ACLs can be set per job in the JobConf during submission.
- As of now, we will only have two per-job ACLs:
-- mapreduce.job.acl-modify-job
-- mapreduce.job.acl-view-job
- The job owner has the authorization to do _anything_ with the job, irrespective of the configured ACLs.
- The superuser (the user who starts the mapred cluster) and members of the supergroup (configured on the JT via mapred.permissions.supergroup) have the authorization to do _anything_ with the job, irrespective of the configured ACLs.
h4. mapreduce.job.acl-modify-job
- This guards *all* modifications to a job, covering all of the following operations:
-- killing a job
-- killing a task of a job, failing a task of a job
-- setting the priority of a job
- Each of these operations is also guarded by the per-queue ACL acl-administer-jobs, so a caller (other than the job owner and the superuser/supergroup) must satisfy both the queue-level ACL and the job-level ACL.
h4. mapreduce.job.acl-view-job
- This guards *some* of the job views.
- For now, we *only* protect APIs that can return possibly sensitive information about the job owner:
-- job-level counters
-- task-level counters
-- task logs displayed by the TT UI, and
-- job.xml shown by the JT UI (the last two will be handled by MAPREDUCE-1455).
- The above means every other piece of information about jobs is still accessible to any other user, e.g. JobStatus, JobProfile, the list of jobs in the queue, etc. Introduce the concept of Job Permissions Key: MAPREDUCE-1307 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1307 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Devaraj Das Fix For: 0.22.0 Attachments: 1307-early-1.patch It would be good to define the notion of job permissions analogous to file permissions. Then the JobTracker can restrict who can read (e.g. look at the job page) or modify (e.g. kill) jobs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
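Devaraj asked on MAPREDUCE-1433 for new config variables to be defined in mapred-default.xml, and the same would presumably apply to the two ACLs proposed above. A sketch of what those entries might look like (property names are from the proposal; values, value format, and descriptions here are illustrative, not committed behavior):

```xml
<!-- Illustrative mapred-default.xml entries for the proposed per-job ACLs. -->
<property>
  <name>mapreduce.job.acl-view-job</name>
  <value></value>
  <description>Users and groups (besides the job owner and the
  superuser/supergroup) allowed to view sensitive job details such as
  job-level and task-level counters.</description>
</property>

<property>
  <name>mapreduce.job.acl-modify-job</name>
  <value></value>
  <description>Users and groups allowed to modify the job, e.g. kill it,
  kill or fail its tasks, or change its priority.</description>
</property>
```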
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1433: - Status: Patch Available (was: Open) Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1433: - Attachment: (was: mr-1433.patch) Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1433: - Attachment: mr-1433.patch Adds license to the test case. Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1433: - Attachment: (was: mr-1433.patch) Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1433: - Attachment: (was: mr-1433.patch) Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-927) Cleanup of task-logs should happen in TaskTracker instead of the Child
[ https://issues.apache.org/jira/browse/MAPREDUCE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831417#action_12831417 ] Amareshwari Sriramadasu commented on MAPREDUCE-927: --- With the current proposal, we found two things that need an answer.
# Memory footprint of the TaskTracker: each map entry (JobID, Long) would take about 40 bytes. If userLogRetainsHours is configured to 7 days and a TaskTracker runs 1 lakh (100,000) jobs' tasks per day, the map would take up 28MB of memory. I guess this memory footprint is fine compared to persisting the same information to disk and reading it back and forth from disk until the directory is removed.
# If the TaskTracker is reinited/restarted and a job completed while the TaskTracker was down, the TaskTracker would not get a KillJobAction for that job. In that case we can keep the userlogs for the default userLogRetainsHours after the reinit/restart.
Thoughts? Cleanup of task-logs should happen in TaskTracker instead of the Child -- Key: MAPREDUCE-927 URL: https://issues.apache.org/jira/browse/MAPREDUCE-927 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security, tasktracker Affects Versions: 0.21.0 Reporter: Vinod K V Assignee: Amareshwari Sriramadasu Priority: Blocker Fix For: 0.21.0 Task logs' cleanup is being done in the Child now. This is undesirable for at least two reasons: 1) failures while cleaning up will affect the user's tasks, and 2) the task's wall time will be affected by operations that the TT actually should own. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
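The footprint arithmetic in the comment above is easy to sanity-check. A minimal sketch, assuming the ~40 bytes per (JobID, Long) entry estimated there; `UserLogMapFootprint` is a hypothetical name for illustration, not TaskTracker code:

```java
// Rough size of the TaskTracker's in-memory (JobID -> timestamp) map,
// assuming ~40 bytes per entry as estimated in the comment above.
class UserLogMapFootprint {
    static long estimateBytes(long jobsPerDay, int retainDays, long bytesPerEntry) {
        return jobsPerDay * retainDays * bytesPerEntry;
    }

    public static void main(String[] args) {
        // 1 lakh (100,000) jobs/day, 7-day retention, ~40 bytes/entry
        long bytes = estimateBytes(100_000L, 7, 40L);
        System.out.println(bytes / 1_000_000 + " MB"); // prints "28 MB"
    }
}
```

This confirms the 28MB figure quoted in the comment (100,000 × 7 × 40 bytes = 28,000,000 bytes).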
[jira] Commented: (MAPREDUCE-326) The lowest level map-reduce APIs should be byte oriented
[ https://issues.apache.org/jira/browse/MAPREDUCE-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831537#action_12831537 ] Milind Bhandarkar commented on MAPREDUCE-326: - Back to a low-level binary API: the proposal here isn't to deprecate any higher-level APIs, but rather to add a new lower-level API that we can implement both the current APIs and new APIs atop. This should in fact help us preserve high-level API compatibility longer, since the mapreduce kernel will be independent of the high-level API. +1!! I have always thought of the Hadoop MR APIs as an assembly language that, gradually, no one will use directly. The low-level APIs will be great for Pig, Hive, HBase and other high-level languages to translate to, without compromising efficiency. The lowest level map-reduce APIs should be byte oriented Key: MAPREDUCE-326 URL: https://issues.apache.org/jira/browse/MAPREDUCE-326 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: eric baldeschwieler As discussed here: https://issues.apache.org/jira/browse/HADOOP-1986#action_12551237 The templates, serializers and other complexities that allow map-reduce to use arbitrary types complicate the design and lead to lots of object creation and other overhead that a byte-oriented design would not suffer. I believe the lowest-level implementation of hadoop map-reduce should have byte-string oriented APIs (for keys and values). This API would be more performant, simpler and more easily cross-language. The existing API could be maintained as a thin layer on top of the leaner API. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1471) FileOutputCommitter does not safely clean up its temporary files
[ https://issues.apache.org/jira/browse/MAPREDUCE-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831546#action_12831546 ] Arun C Murthy commented on MAPREDUCE-1471: -- Jim, all file-based output-formats check to ensure that their output-directory is *not* present when they start, i.e. 'working_path' is owned by one and only one job; hence this behaviour is correct. FileOutputCommitter does not safely clean up its temporary files - Key: MAPREDUCE-1471 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1471 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.1 Reporter: Jim Finnessy Original Estimate: 4h Remaining Estimate: 4h When the FileOutputCommitter cleans up during its cleanupJob method, it potentially deletes the temporary files of other concurrent jobs. Since all the temporary files for all concurrent jobs are written to working_path/_temporary/, any concurrent task that has the same working_path will remove the files of all currently executing jobs when it removes working_path/_temporary during job cleanup. If the file name output is guaranteed by the client application to be unique, the temporary files/directories should also be guaranteed to be unique to avoid this problem. Suggest modifying cleanupJob to only remove files that it created itself. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
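Arun's point is that the `_temporary` area lives directly under the job's output directory, so the supported pattern is one output directory per job; jobs that share a `working_path` violate that invariant. A tiny self-contained sketch of the convention (the paths and the `PerJobOutputPath` helper are illustrative, not Hadoop API):

```java
// Illustrates the one-output-directory-per-job convention: each job's
// _temporary area is private because it sits under that job's own output dir.
class PerJobOutputPath {
    static String outputDir(String base, String jobId) {
        return base + "/" + jobId;
    }

    static String tempDir(String base, String jobId) {
        // FileOutputCommitter stages task output under <outputDir>/_temporary,
        // so distinct output dirs imply distinct, non-colliding staging areas.
        return outputDir(base, jobId) + "/_temporary";
    }

    public static void main(String[] args) {
        System.out.println(tempDir("/user/jim/out", "job_201002_0001"));
        // prints "/user/jim/out/job_201002_0001/_temporary"
    }
}
```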
[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831577#action_12831577 ] Arun C Murthy commented on MAPREDUCE-1463: -- -1. These knobs seem backwards: as both Todd and Amar have pointed out, we could add heuristics to tweak mapreduce.job.reduce.slowstart.completedmaps automatically without adding more config knobs. Reducer should start faster for smaller jobs Key: MAPREDUCE-1463 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/fair-share Reporter: Scott Chen Assignee: Scott Chen Attachments: MAPREDUCE-1463-v1.patch, MAPREDUCE-1463-v2.patch Our users often complain about the slowness of smaller ad-hoc jobs. The overhead of waiting for the reducers to start is significant in this case. It would be good if we could start the reducers sooner for such jobs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
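For context, the knob Arun refers to is already settable per job; the heuristic he proposes would pick the value automatically instead of asking users to tune it. An illustrative hand-tuned entry (the 0.05 value here is just an example, not a recommendation from this thread):

```xml
<!-- Fraction of maps that must complete before reducers are scheduled.
     Lowering it lets small ad-hoc jobs start their shuffle sooner. -->
<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <value>0.05</value>
</property>
```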
[jira] Commented: (MAPREDUCE-1307) Introduce the concept of Job Permissions
[ https://issues.apache.org/jira/browse/MAPREDUCE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831587#action_12831587 ] dhruba borthakur commented on MAPREDUCE-1307: - The advantage of the file-system model is that it is really simple and would handle almost all cases that we might come across. Can somebody please explain why we are abandoning the file-system permission model and going towards ACLs? Is there a particular use-case that the fs permission model does not address? Introduce the concept of Job Permissions Key: MAPREDUCE-1307 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1307 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Devaraj Das Assignee: Vinod K V Fix For: 0.22.0 Attachments: 1307-early-1.patch It would be good to define the notion of job permissions analogous to file permissions. Then the JobTracker can restrict who can read (e.g. look at the job page) or modify (e.g. kill) jobs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1434) Dynamic add input for one job
[ https://issues.apache.org/jira/browse/MAPREDUCE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831593#action_12831593 ] Owen O'Malley commented on MAPREDUCE-1434: -- +1 It helps a more interesting use case where you have a pipeline of mapreduce jobs and don't want the second set of maps to wait until the last reduce finishes. It would be great if job control could use this as an optimization. You need to have a method by which the application declares that all of the input has been added. To avoid having reduces hold slots that they can't use, I'd suggest that no reduces be launched until the input is complete. A timeout is also required so that if a user disappears, the job is killed after N minutes with no new input and the input still not declared complete. Dynamic add input for one job - Key: MAPREDUCE-1434 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1434 Project: Hadoop Map/Reduce Issue Type: New Feature Environment: 0.19.0 Reporter: Xing Shi Normally we must first upload the data to HDFS before we can analyze it using Hadoop MapReduce. Sometimes the upload takes a long time, so if we could add input during a job, that time could be saved. WHAT? Client: a) hadoop job -add-input jobId inputFormat ... Add the input to jobId. b) hadoop job -add-input done Tell the JobTracker that the input preparation is finished. c) hadoop job -add-input status jobId Show how many inputs the jobId has. HOWTO? Mainly, I think we should do three things: 1. JobClient: JobClient should support adding input to a job; it generates the splits and submits them to the JobTracker. 2. JobTracker: JobTracker should support addInput, adding the new tasks to the original map tasks. Because the uploaded data will be processed quickly, the scheduler should also be updated to support keeping a map task pending until the client declares the job's input done. 3. Reducer: the reducer should also update the number of maps (mapNums) so the shuffle works correctly. 
This is the rough idea, and I will update it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1434) Dynamic add input for one job
[ https://issues.apache.org/jira/browse/MAPREDUCE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831595#action_12831595 ] Arun C Murthy commented on MAPREDUCE-1434: -- +1 I'm sure Pig/Hive would be substantial beneficiaries... their job pipelines would benefit a lot. Dynamic add input for one job - Key: MAPREDUCE-1434 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1434 Project: Hadoop Map/Reduce Issue Type: New Feature Environment: 0.19.0 Reporter: Xing Shi Normally we must first upload the data to HDFS before we can analyze it using Hadoop MapReduce. Sometimes the upload takes a long time, so if we could add input during a job, that time could be saved. WHAT? Client: a) hadoop job -add-input jobId inputFormat ... Add the input to jobId. b) hadoop job -add-input done Tell the JobTracker that the input preparation is finished. c) hadoop job -add-input status jobId Show how many inputs the jobId has. HOWTO? Mainly, I think we should do three things: 1. JobClient: JobClient should support adding input to a job; it generates the splits and submits them to the JobTracker. 2. JobTracker: JobTracker should support addInput, adding the new tasks to the original map tasks. Because the uploaded data will be processed quickly, the scheduler should also be updated to support keeping a map task pending until the client declares the job's input done. 3. Reducer: the reducer should also update the number of maps (mapNums) so the shuffle works correctly. This is the rough idea, and I will update it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (MAPREDUCE-1403) Save file-sizes of each of the artifacts in DistributedCache in the JobConf
[ https://issues.apache.org/jira/browse/MAPREDUCE-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy reassigned MAPREDUCE-1403: Assignee: Arun C Murthy (was: Hong Tang) Save file-sizes of each of the artifacts in DistributedCache in the JobConf --- Key: MAPREDUCE-1403 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1403 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Reporter: Arun C Murthy Assignee: Arun C Murthy Fix For: 0.22.0 Attachments: MAPREDUCE-1403_yhadoop20.patch It would be a useful metric to collect... potentially GridMix could use it to emulate jobs which use the DistributedCache. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1403) Save file-sizes of each of the artifacts in DistributedCache in the JobConf
[ https://issues.apache.org/jira/browse/MAPREDUCE-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated MAPREDUCE-1403: - Attachment: MAPREDUCE-1403_yhadoop20.patch Patch for y20 distribution. Not to be committed. Save file-sizes of each of the artifacts in DistributedCache in the JobConf --- Key: MAPREDUCE-1403 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1403 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Reporter: Arun C Murthy Assignee: Arun C Murthy Fix For: 0.22.0 Attachments: MAPREDUCE-1403_yhadoop20.patch It would be a useful metric to collect... potentially GridMix could use it to emulate jobs which use the DistributedCache. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1434) Dynamic add input for one job
[ https://issues.apache.org/jira/browse/MAPREDUCE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831608#action_12831608 ] Owen O'Malley commented on MAPREDUCE-1434: -- One approach might be to have a subclass of InputFormat, such as:
{code}
public abstract class IncrementalInputFormat<K, V> extends InputFormat<K, V> {
  public abstract InputSplit[] getNewInputSplits(JobContext context) throws IOException;
}
{code}
and such input formats return any new splits that they have found since the last time the method was called. Dynamic add input for one job - Key: MAPREDUCE-1434 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1434 Project: Hadoop Map/Reduce Issue Type: New Feature Environment: 0.19.0 Reporter: Xing Shi Normally we must first upload the data to HDFS before we can analyze it using Hadoop MapReduce. Sometimes the upload takes a long time, so if we could add input during a job, that time could be saved. WHAT? Client: a) hadoop job -add-input jobId inputFormat ... Add the input to jobId. b) hadoop job -add-input done Tell the JobTracker that the input preparation is finished. c) hadoop job -add-input status jobId Show how many inputs the jobId has. HOWTO? Mainly, I think we should do three things: 1. JobClient: JobClient should support adding input to a job; it generates the splits and submits them to the JobTracker. 2. JobTracker: JobTracker should support addInput, adding the new tasks to the original map tasks. Because the uploaded data will be processed quickly, the scheduler should also be updated to support keeping a map task pending until the client declares the job's input done. 3. Reducer: the reducer should also update the number of maps (mapNums) so the shuffle works correctly. This is the rough idea, and I will update it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
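To make the "only new splits since the last call" contract concrete, here is a self-contained sketch. The `InputSplit` and `JobContext` classes below are trivial stand-ins for the real Hadoop types, and `IncrementalFileInputFormat` is a hypothetical implementation, not part of any patch:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Stand-ins for the real Hadoop types, just to keep the sketch runnable.
class InputSplit {
    final String path;
    InputSplit(String path) { this.path = path; }
}

class JobContext {
    final List<String> inputPaths = new ArrayList<>();
}

abstract class IncrementalInputFormat {
    // Contract from the comment above: return only the splits
    // discovered since the previous call.
    abstract InputSplit[] getNewInputSplits(JobContext context);
}

class IncrementalFileInputFormat extends IncrementalInputFormat {
    private final Set<String> seen = new LinkedHashSet<>();

    @Override
    InputSplit[] getNewInputSplits(JobContext context) {
        List<InputSplit> fresh = new ArrayList<>();
        for (String path : context.inputPaths) {
            if (seen.add(path)) {   // add() returns true only the first time
                fresh.add(new InputSplit(path));
            }
        }
        return fresh.toArray(new InputSplit[0]);
    }
}
```

Each call drains only the unseen inputs, so repeated polling by the JobTracker is idempotent once the input is exhausted.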
[jira] Commented: (MAPREDUCE-1399) The archive command shows a null error message
[ https://issues.apache.org/jira/browse/MAPREDUCE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831620#action_12831620 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-1399: --- Hudson does not seem to be working. Ran test-patch locally.
{noformat}
[exec] -1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
[exec]    Please justify why no new tests are needed for this patch.
[exec]    Also please list what manual steps were performed to verify this patch.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
{noformat}
The archive command shows a null error message -- Key: MAPREDUCE-1399 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1399 Project: Hadoop Map/Reduce Issue Type: Bug Components: harchive Reporter: Tsz Wo (Nicholas), SZE Assignee: Mahadev konar Fix For: 0.22.0 Attachments: m1399_20100204.patch, m1399_20100205.patch, m1399_20100205trunk.patch, m1399_20100205trunk2.patch, MAPREDUCE-1399.patch
{noformat}
bash-3.1$ hadoop archive -archiveName foo.har -p . foo .
Exception in archives null
{noformat}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1399) The archive command shows a null error message
[ https://issues.apache.org/jira/browse/MAPREDUCE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831625#action_12831625 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-1399: --- Oops, I mistakenly posted [the test-patch result|https://issues.apache.org/jira/browse/MAPREDUCE-1399?focusedCommentId=12831620&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12831620] for MAPREDUCE-1425 to this issue. Sorry... The archive command shows a null error message -- Key: MAPREDUCE-1399 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1399 Project: Hadoop Map/Reduce Issue Type: Bug Components: harchive Reporter: Tsz Wo (Nicholas), SZE Assignee: Mahadev konar Fix For: 0.22.0 Attachments: m1399_20100204.patch, m1399_20100205.patch, m1399_20100205trunk.patch, m1399_20100205trunk2.patch, MAPREDUCE-1399.patch
{noformat}
bash-3.1$ hadoop archive -archiveName foo.har -p . foo .
Exception in archives null
{noformat}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1425) archive throws OutOfMemoryError
[ https://issues.apache.org/jira/browse/MAPREDUCE-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831626#action_12831626 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-1425: --- Hudson does not seem working. Ran test-patch locally. {noformat} [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no new tests are needed for this patch. [exec] Also please list what manual steps were performed to verify this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] {noformat} archive throws OutOfMemoryError --- Key: MAPREDUCE-1425 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1425 Project: Hadoop Map/Reduce Issue Type: Improvement Components: harchive Reporter: Tsz Wo (Nicholas), SZE Assignee: Mahadev konar Fix For: 0.22.0 Attachments: har.sh, m1425_20100129TextFileGenerator.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425_y_0.20.patch {noformat} -bash-3.1$ hadoop archive -archiveName t4.har -p . t4 . 
Exception in thread main java.lang.OutOfMemoryError: Java heap space at java.util.regex.Pattern.compile(Pattern.java:1432) at java.util.regex.Pattern.init(Pattern.java:1133) at java.util.regex.Pattern.compile(Pattern.java:847) at java.lang.String.replace(String.java:2208) at org.apache.hadoop.fs.Path.normalizePath(Path.java:146) at org.apache.hadoop.fs.Path.initialize(Path.java:137) at org.apache.hadoop.fs.Path.init(Path.java:126) at org.apache.hadoop.fs.Path.makeQualified(Path.java:296) at org.apache.hadoop.hdfs.DistributedFileSystem.makeQualified(DistributedFileSystem.java:244) at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:256) at org.apache.hadoop.tools.HadoopArchives.archive(HadoopArchives.java:393) at org.apache.hadoop.tools.HadoopArchives.run(HadoopArchives.java:736) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.tools.HadoopArchives.main(HadoopArchives.java:751) {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
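The stack trace above shows the archive client exhausting its heap while materializing qualified Path objects for an entire directory tree at once. A hedged, self-contained sketch of the general remedy, walking the tree with an explicit work queue so only one directory listing is in memory at a time; a Map stands in for DistributedFileSystem.listStatus, and this is illustrative rather than the actual HadoopArchives fix:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Sketch: bound the client's heap by visiting one directory listing at a
// time instead of holding every path of the tree at once. The Map is a
// stand-in for listStatus; not the actual MAPREDUCE-1425 patch.
public class ArchiveWalkSketch {
    static int countEntries(Map<String, List<String>> tree, String root) {
        int count = 0;
        Deque<String> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            String dir = pending.pop();
            for (String child : tree.getOrDefault(dir, List.of())) {
                count++;
                if (tree.containsKey(child)) {
                    pending.push(child); // a subdirectory: list it later
                }
            }
        }
        return count;
    }
}
```

The queue holds only unvisited directory names, so peak memory tracks tree depth and fan-out rather than total file count.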
[jira] Commented: (MAPREDUCE-1266) Allow heartbeat interval smaller than 3 seconds for tiny clusters
[ https://issues.apache.org/jira/browse/MAPREDUCE-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831631#action_12831631 ] Allen Wittenauer commented on MAPREDUCE-1266: - if you are using jvm reuse, then that 1s disappears, right? Allow heartbeat interval smaller than 3 seconds for tiny clusters - Key: MAPREDUCE-1266 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1266 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker, task, tasktracker Affects Versions: 0.22.0 Reporter: Todd Lipcon Priority: Minor For small clusters, the heartbeat interval has a large effect on job latency. This is especially true on pseudo-distributed or other tiny (5 nodes) clusters. It's not a big deal for production, but new users would have a happier first experience if Hadoop seemed snappier. I'd like to change the minimum heartbeat interval from 3.0 seconds to perhaps 0.5 seconds (but have it governed by an undocumented config parameter in case people don't like this change). The cluster size-based ramp up of interval will maintain the current scalable behavior for large clusters with no negative effect. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1266) Allow heartbeat interval smaller than 3 seconds for tiny clusters
[ https://issues.apache.org/jira/browse/MAPREDUCE-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831644#action_12831644 ] Todd Lipcon commented on MAPREDUCE-1266: bq. if you are using jvm reuse, then that 1s disappears, right? Not really, since JVM reuse doesn't reuse between maps and reduces. The time sequence of a small job looks like: Client: Submit job JT: Create tasks (initialize job) on JT wait for a TT to heartbeat TT: start JVM child: process map task TT: send accelerated heartbeat once map task is complete (I forget whether this is in 0.20 or came later) receive reduce task, start reduce JVM (regardless of JVM reuse) child: process reduce task TT: send completion heartbeat I guess there are also some setup/cleanup tasks going on in there as well. Since we're talking about a hypothetical one map, one reduce, we're just cutting down the time between initting the job and getting the first JVM on a TT. In a multimapper or multireducer job, the cost shows up in how long it takes for all of the tasks to get scheduled - it will only schedule one task per heartbeat with some schedulers. The fair scheduler after MAPREDUCE-706 can assign multiple at the same time, which should help substantially. Allow heartbeat interval smaller than 3 seconds for tiny clusters - Key: MAPREDUCE-1266 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1266 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker, task, tasktracker Affects Versions: 0.22.0 Reporter: Todd Lipcon Priority: Minor For small clusters, the heartbeat interval has a large effect on job latency. This is especially true on pseudo-distributed or other tiny (5 nodes) clusters. It's not a big deal for production, but new users would have a happier first experience if Hadoop seemed snappier. 
I'd like to change the minimum heartbeat interval from 3.0 seconds to perhaps 0.5 seconds (but have it governed by an undocumented config parameter in case people don't like this change). The cluster size-based ramp up of interval will maintain the current scalable behavior for large clusters with no negative effect. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
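Todd's proposal amounts to keeping the cluster-size ramp-up but lowering the hard floor from 3000 ms to a configurable 500 ms. A minimal sketch of that logic; the constant names and the ramp-up divisor are assumptions for illustration, not the JobTracker's actual code:

```java
// Sketch of the proposed heartbeat policy: interval grows with cluster
// size, but the floor drops from 3000 ms to 500 ms. Constants are
// illustrative assumptions, not Hadoop's real defaults.
public class HeartbeatIntervalSketch {
    static final int HEARTBEATS_PER_SECOND = 100; // assumed ramp-up divisor
    static final int MIN_INTERVAL_MS = 500;       // proposed floor (was 3000)

    /** Interval scales linearly with cluster size, never below the floor. */
    static int intervalMillis(int clusterSize) {
        int scaled = clusterSize * 1000 / HEARTBEATS_PER_SECOND;
        return Math.max(scaled, MIN_INTERVAL_MS);
    }
}
```

On a pseudo-distributed cluster this yields the snappy 500 ms interval, while a 300-node cluster still backs off to 3 seconds, preserving the current large-cluster behavior.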
[jira] Commented: (MAPREDUCE-1399) The archive command shows a null error message
[ https://issues.apache.org/jira/browse/MAPREDUCE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831653#action_12831653 ] Hadoop QA commented on MAPREDUCE-1399: -- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12435047/m1399_20100205trunk2.patch against trunk revision 907967. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 7 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/304/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/304/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/304/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/304/console This message is automatically generated. The archive command shows a null error message -- Key: MAPREDUCE-1399 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1399 Project: Hadoop Map/Reduce Issue Type: Bug Components: harchive Reporter: Tsz Wo (Nicholas), SZE Assignee: Mahadev konar Fix For: 0.22.0 Attachments: m1399_20100204.patch, m1399_20100205.patch, m1399_20100205trunk.patch, m1399_20100205trunk2.patch, MAPREDUCE-1399.patch {noformat} bash-3.1$ hadoop archive -archiveName foo.har -p . foo . 
Exception in archives null {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1433: - Attachment: mr-1433.patch Add some code that sets the service name on the received token. All tests pass and test-patch is clean. Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch, mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1399) The archive command shows a null error message
[ https://issues.apache.org/jira/browse/MAPREDUCE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831665#action_12831665 ] Hadoop QA commented on MAPREDUCE-1399: -- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12435047/m1399_20100205trunk2.patch against trunk revision 907967. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 7 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/438/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/438/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/438/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/438/console This message is automatically generated. The archive command shows a null error message -- Key: MAPREDUCE-1399 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1399 Project: Hadoop Map/Reduce Issue Type: Bug Components: harchive Reporter: Tsz Wo (Nicholas), SZE Assignee: Mahadev konar Fix For: 0.22.0 Attachments: m1399_20100204.patch, m1399_20100205.patch, m1399_20100205trunk.patch, m1399_20100205trunk2.patch, MAPREDUCE-1399.patch {noformat} bash-3.1$ hadoop archive -archiveName foo.har -p . foo . 
Exception in archives null {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1341) Sqoop should have an option to create hive tables and skip the table import step
[ https://issues.apache.org/jira/browse/MAPREDUCE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831697#action_12831697 ] Leonid Furman commented on MAPREDUCE-1341: -- It looks like the hudson build hasn't picked up the latest patch - MAPREDUCE-1341.4.patch. Should I flip the ticket status in order to restart the build? Thanks! Sqoop should have an option to create hive tables and skip the table import step Key: MAPREDUCE-1341 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1341 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/sqoop Affects Versions: 0.22.0 Reporter: Leonid Furman Priority: Minor Fix For: 0.22.0 Attachments: MAPREDUCE-1341.2.patch, MAPREDUCE-1341.3.patch, MAPREDUCE-1341.4.patch, MAPREDUCE-1341.patch In case the client only needs to create tables in hive, it would be helpful if Sqoop had an optional parameter: --hive-create-only which would omit the time consuming table import step, generate hive create table statements and run them. If this feature seems useful, I can generate the patch. I have modified the Sqoop code and built it on my development machine, and it seems to be working well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1341) Sqoop should have an option to create hive tables and skip the table import step
[ https://issues.apache.org/jira/browse/MAPREDUCE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831704#action_12831704 ] Aaron Kimball commented on MAPREDUCE-1341: -- I don't see the patch listed in http://hudson.zones.apache.org/hudson/view/Hadoop/job/Mapreduce-Patch-Admin/lastSuccessfulBuild/artifact/MAPREDUCE_PatchQueue.html so yea, go through cancel patch / submit patch again. Sqoop should have an option to create hive tables and skip the table import step Key: MAPREDUCE-1341 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1341 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/sqoop Affects Versions: 0.22.0 Reporter: Leonid Furman Priority: Minor Fix For: 0.22.0 Attachments: MAPREDUCE-1341.2.patch, MAPREDUCE-1341.3.patch, MAPREDUCE-1341.4.patch, MAPREDUCE-1341.patch In case the client only needs to create tables in hive, it would be helpful if Sqoop had an optional parameter: --hive-create-only which would omit the time consuming table import step, generate hive create table statements and run them. If this feature seems useful, I can generate the patch. I have modified the Sqoop code and built it on my development machine, and it seems to be working well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1433: - Attachment: mr-1433.patch Updated with new code to normalize the hostname. Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1341) Sqoop should have an option to create hive tables and skip the table import step
[ https://issues.apache.org/jira/browse/MAPREDUCE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leonid Furman updated MAPREDUCE-1341: - Status: Open (was: Patch Available) Cycling patch to retrigger hudson build. Sqoop should have an option to create hive tables and skip the table import step Key: MAPREDUCE-1341 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1341 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/sqoop Affects Versions: 0.22.0 Reporter: Leonid Furman Priority: Minor Fix For: 0.22.0 Attachments: MAPREDUCE-1341.2.patch, MAPREDUCE-1341.3.patch, MAPREDUCE-1341.4.patch, MAPREDUCE-1341.patch In case the client only needs to create tables in hive, it would be helpful if Sqoop had an optional parameter: --hive-create-only which would omit the time consuming table import step, generate hive create table statements and run them. If this feature seems useful, I can generate the patch. I have modified the Sqoop code and built it on my development machine, and it seems to be working well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1341) Sqoop should have an option to create hive tables and skip the table import step
[ https://issues.apache.org/jira/browse/MAPREDUCE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leonid Furman updated MAPREDUCE-1341: - Assignee: Leonid Furman Status: Patch Available (was: Open) Sqoop should have an option to create hive tables and skip the table import step Key: MAPREDUCE-1341 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1341 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/sqoop Affects Versions: 0.22.0 Reporter: Leonid Furman Assignee: Leonid Furman Priority: Minor Fix For: 0.22.0 Attachments: MAPREDUCE-1341.2.patch, MAPREDUCE-1341.3.patch, MAPREDUCE-1341.4.patch, MAPREDUCE-1341.patch In case the client only needs to create tables in hive, it would be helpful if Sqoop had an optional parameter: --hive-create-only which would omit the time consuming table import step, generate hive create table statements and run them. If this feature seems useful, I can generate the patch. I have modified the Sqoop code and built it on my development machine, and it seems to be working well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831710#action_12831710 ] Devaraj Das commented on MAPREDUCE-1433: Please pass the right text to setService in getDelegationToken Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
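Devaraj's comment concerns the service field a client must set on a delegation token it receives, so that the token later matches the JobTracker address the task connects to. A minimal sketch of that convention, assuming the service is a normalized lowercase host plus port; the exact format used by the patch may differ:

```java
// Sketch of building the service string a client would pass to
// setService on a received delegation token. The lowercase host:port
// convention is an assumption, not necessarily the patch's exact format.
public class TokenServiceSketch {
    static String buildService(String host, int port) {
        // Normalize the hostname so the same token resolves to the same
        // service however the address was originally written.
        return host.toLowerCase() + ":" + port;
    }
}
```

This is also why Owen's later revision "normalizes the hostname": without normalization, "JT.example.com:9001" and "jt.example.com:9001" would look like different services.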
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1433: - Attachment: mr-1433.patch Ok, now the patch has the right fix in it. Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1341) Sqoop should have an option to create hive tables and skip the table import step
[ https://issues.apache.org/jira/browse/MAPREDUCE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831714#action_12831714 ] Leonid Furman commented on MAPREDUCE-1341: -- Aaron, it doesn't seem to populate the queue - does it usually happen immediately or after some time? Sqoop should have an option to create hive tables and skip the table import step Key: MAPREDUCE-1341 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1341 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/sqoop Affects Versions: 0.22.0 Reporter: Leonid Furman Assignee: Leonid Furman Priority: Minor Fix For: 0.22.0 Attachments: MAPREDUCE-1341.2.patch, MAPREDUCE-1341.3.patch, MAPREDUCE-1341.4.patch, MAPREDUCE-1341.5.patch, MAPREDUCE-1341.patch In case the client only needs to create tables in hive, it would be helpful if Sqoop had an optional parameter: --hive-create-only which would omit the time consuming table import step, generate hive create table statements and run them. If this feature seems useful, I can generate the patch. I have modified the Sqoop code and built it on my development machine, and it seems to be working well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1341) Sqoop should have an option to create hive tables and skip the table import step
[ https://issues.apache.org/jira/browse/MAPREDUCE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831717#action_12831717 ] Leonid Furman commented on MAPREDUCE-1341: -- Never mind, it is there now. Thank you! Sqoop should have an option to create hive tables and skip the table import step Key: MAPREDUCE-1341 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1341 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/sqoop Affects Versions: 0.22.0 Reporter: Leonid Furman Assignee: Leonid Furman Priority: Minor Fix For: 0.22.0 Attachments: MAPREDUCE-1341.2.patch, MAPREDUCE-1341.3.patch, MAPREDUCE-1341.4.patch, MAPREDUCE-1341.5.patch, MAPREDUCE-1341.patch In case the client only needs to create tables in hive, it would be helpful if Sqoop had an optional parameter: --hive-create-only which would omit the time consuming table import step, generate hive create table statements and run them. If this feature seems useful, I can generate the patch. I have modified the Sqoop code and built it on my development machine, and it seems to be working well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831722#action_12831722 ] Devaraj Das commented on MAPREDUCE-1433: +1 Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated MAPREDUCE-1433: --- Attachment: 1433.bp20.patch Patch for Y20. Not for commit. Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: 1433.bp20.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated MAPREDUCE-1433: - Resolution: Fixed Fix Version/s: 0.22.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) It passes unit tests and test-patch. Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.22.0 Attachments: 1433.bp20.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1425) archive throws OutOfMemoryError
[ https://issues.apache.org/jira/browse/MAPREDUCE-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831753#action_12831753 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-1425: --- All tests passed except TestChainErrors, which still failed after the patch had been reverted. archive throws OutOfMemoryError --- Key: MAPREDUCE-1425 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1425 Project: Hadoop Map/Reduce Issue Type: Improvement Components: harchive Reporter: Tsz Wo (Nicholas), SZE Assignee: Mahadev konar Fix For: 0.22.0 Attachments: har.sh, m1425_20100129TextFileGenerator.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425_y_0.20.patch {noformat} -bash-3.1$ hadoop archive -archiveName t4.har -p . t4 . Exception in thread main java.lang.OutOfMemoryError: Java heap space at java.util.regex.Pattern.compile(Pattern.java:1432) at java.util.regex.Pattern.init(Pattern.java:1133) at java.util.regex.Pattern.compile(Pattern.java:847) at java.lang.String.replace(String.java:2208) at org.apache.hadoop.fs.Path.normalizePath(Path.java:146) at org.apache.hadoop.fs.Path.initialize(Path.java:137) at org.apache.hadoop.fs.Path.init(Path.java:126) at org.apache.hadoop.fs.Path.makeQualified(Path.java:296) at org.apache.hadoop.hdfs.DistributedFileSystem.makeQualified(DistributedFileSystem.java:244) at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:256) at org.apache.hadoop.tools.HadoopArchives.archive(HadoopArchives.java:393) at org.apache.hadoop.tools.HadoopArchives.run(HadoopArchives.java:736) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.tools.HadoopArchives.main(HadoopArchives.java:751) {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1425) archive throws OutOfMemoryError
[ https://issues.apache.org/jira/browse/MAPREDUCE-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831756#action_12831756 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-1425: --- The manual test is simple: run archive on 10^5 files and jmap to read the memory usages as shown previously. archive throws OutOfMemoryError --- Key: MAPREDUCE-1425 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1425 Project: Hadoop Map/Reduce Issue Type: Improvement Components: harchive Reporter: Tsz Wo (Nicholas), SZE Assignee: Mahadev konar Fix For: 0.22.0 Attachments: har.sh, m1425_20100129TextFileGenerator.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425_y_0.20.patch {noformat} -bash-3.1$ hadoop archive -archiveName t4.har -p . t4 . Exception in thread main java.lang.OutOfMemoryError: Java heap space at java.util.regex.Pattern.compile(Pattern.java:1432) at java.util.regex.Pattern.init(Pattern.java:1133) at java.util.regex.Pattern.compile(Pattern.java:847) at java.lang.String.replace(String.java:2208) at org.apache.hadoop.fs.Path.normalizePath(Path.java:146) at org.apache.hadoop.fs.Path.initialize(Path.java:137) at org.apache.hadoop.fs.Path.init(Path.java:126) at org.apache.hadoop.fs.Path.makeQualified(Path.java:296) at org.apache.hadoop.hdfs.DistributedFileSystem.makeQualified(DistributedFileSystem.java:244) at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:256) at org.apache.hadoop.tools.HadoopArchives.archive(HadoopArchives.java:393) at org.apache.hadoop.tools.HadoopArchives.run(HadoopArchives.java:736) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.tools.HadoopArchives.main(HadoopArchives.java:751) {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1472) JobTracker.submitJob holds a lock on the JobTracker while copying job-conf from HDFS
JobTracker.submitJob holds a lock on the JobTracker while copying job-conf from HDFS Key: MAPREDUCE-1472 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1472 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Reporter: Arun C Murthy Assignee: Arun C Murthy Priority: Blocker This could have a very bad impact on the responsiveness of the cluster. JobTracker.submitJob also forks a DU and writes to its local disk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
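The usual shape of a fix for this kind of blocker is hoisting the slow HDFS copy out of the JobTracker-wide critical section, so heartbeats are not stalled behind a submission. A hedged sketch; the names are illustrative and a string copy stands in for the real HDFS-to-local transfer:

```java
// Sketch: do the expensive I/O before taking the tracker-wide lock, and
// keep only cheap bookkeeping inside it. Illustrative names; copyConf is
// a placeholder for the HDFS-to-local job-conf copy.
public class SubmitJobSketch {
    private final Object lock = new Object();
    private final java.util.Map<String, String> jobs = new java.util.HashMap<>();

    String submitJob(String jobId, String confSource) {
        String localConf = copyConf(confSource); // slow I/O, unsynchronized
        synchronized (lock) {
            jobs.put(jobId, localConf);          // cheap bookkeeping only
        }
        return localConf;
    }

    private String copyConf(String src) {
        return "local:" + src; // placeholder for the expensive copy
    }
}
```

With the copy outside the lock, a slow NameNode or a large job-conf delays only the submitting client, not every heartbeat waiting on the same monitor.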
[jira] Updated: (MAPREDUCE-1399) The archive command shows a null error message
[ https://issues.apache.org/jira/browse/MAPREDUCE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-1399: -- Attachment: m1399_20100205trunk2_y0.20.patch m1399_20100205trunk2_y0.20.patch: for y0.20 The archive command shows a null error message -- Key: MAPREDUCE-1399 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1399 Project: Hadoop Map/Reduce Issue Type: Bug Components: harchive Reporter: Tsz Wo (Nicholas), SZE Assignee: Mahadev konar Fix For: 0.22.0 Attachments: m1399_20100204.patch, m1399_20100205.patch, m1399_20100205trunk.patch, m1399_20100205trunk2.patch, m1399_20100205trunk2_y0.20.patch, MAPREDUCE-1399.patch {noformat} bash-3.1$ hadoop archive -archiveName foo.har -p . foo . Exception in archives null {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1425) archive throws OutOfMemoryError
[ https://issues.apache.org/jira/browse/MAPREDUCE-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831775#action_12831775 ] Hadoop QA commented on MAPREDUCE-1425: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12435033/MAPREDUCE-1425.patch against trunk revision 907967. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/439/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/439/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/439/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/439/console This message is automatically generated. 
archive throws OutOfMemoryError --- Key: MAPREDUCE-1425 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1425 Project: Hadoop Map/Reduce Issue Type: Improvement Components: harchive Reporter: Tsz Wo (Nicholas), SZE Assignee: Mahadev konar Fix For: 0.22.0 Attachments: har.sh, m1425_20100129TextFileGenerator.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425_y_0.20.patch {noformat} -bash-3.1$ hadoop archive -archiveName t4.har -p . t4 . Exception in thread main java.lang.OutOfMemoryError: Java heap space at java.util.regex.Pattern.compile(Pattern.java:1432) at java.util.regex.Pattern.init(Pattern.java:1133) at java.util.regex.Pattern.compile(Pattern.java:847) at java.lang.String.replace(String.java:2208) at org.apache.hadoop.fs.Path.normalizePath(Path.java:146) at org.apache.hadoop.fs.Path.initialize(Path.java:137) at org.apache.hadoop.fs.Path.init(Path.java:126) at org.apache.hadoop.fs.Path.makeQualified(Path.java:296) at org.apache.hadoop.hdfs.DistributedFileSystem.makeQualified(DistributedFileSystem.java:244) at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:256) at org.apache.hadoop.tools.HadoopArchives.archive(HadoopArchives.java:393) at org.apache.hadoop.tools.HadoopArchives.run(HadoopArchives.java:736) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.tools.HadoopArchives.main(HadoopArchives.java:751) {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
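The OOM above arises while listStatus materializes one qualified Path object per directory entry in a single in-memory array. The fix direction is to process entries incrementally instead of all at once. A minimal pure-Java analogue of that streaming approach (using the JDK's DirectoryStream, not the actual HDFS/HadoopArchives API — class and method names here are illustrative only):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamedListing {
    // Walk a directory by streaming its entries one at a time instead of
    // materializing a full array of per-entry objects; memory stays O(1)
    // in the number of entries, which is what avoids the heap blowup.
    public static int countEntries(Path dir) throws IOException {
        int count = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                count++;  // process the entry, then let it be garbage-collected
            }
        }
        return count;
    }
}
```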
[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831797#action_12831797 ] Scott Chen commented on MAPREDUCE-1463: --- @Todd: Yes, you're right. The logic in the patch is wrong. The one you posted is the correct logic. Sorry about the mistake. @Amar: {quote} How do you define small jobs? Shouldn't it be based on the total number of tasks instead of considering maps and reduces individually? {quote} We want to start reducers faster in both the fewer-mapper and fewer-reducer cases: in the fewer-reducer case, starting reducers earlier is cheap anyway, and in the fewer-mapper case, the maps finish faster. But I think it may not be a bad idea to use the total instead (it is simpler, at least). {quote} Why do we need a special case for small jobs? If it's for fairness then this piece of code rightly belongs to contrib/fairscheduler, no? If not for fairness then what is the problem with the current framework w.r.t. small jobs? {quote} Handling the special case for small jobs reduces the overall latency, which gives users a better experience. {quote} Can it be fixed by simple (configuration-like) tweaking? If not, then what's the right fix? {quote} For experienced users, setting completedmaps=0 does fix this problem. But it would be nice if this could be done automatically for other users who do not know how to configure Hadoop. @Arun: Thanks for the comments. I agree. Tweaking mapreduce.job.reduce.slowstart.completedmaps on the job client side should be a cleaner way to handle this. For experienced users, setting completedmaps to 0 on the client side will make their small jobs finish faster. But it would be nice if some automatic decision could be made here so that normal users don't have to learn how to configure an extra parameter. 
The point here is that in some cases (small jobs, with a small number of mappers or reducers) we should not spend time waiting for the reducers to start, because the waiting time is significant (or because it is cheap to start the reducers earlier). Automatically reducing the latency makes our users happy. Reducer should start faster for smaller jobs Key: MAPREDUCE-1463 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/fair-share Reporter: Scott Chen Assignee: Scott Chen Attachments: MAPREDUCE-1463-v1.patch, MAPREDUCE-1463-v2.patch Our users often complain about the slowness of smaller ad-hoc jobs. The overhead to wait for the reducers to start in this case is significant. It will be good if we can start the reducer sooner in this case. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
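The slowstart behavior debated in this thread boils down to a single scheduling condition: reduces are held back until the fraction of completed maps reaches the mapreduce.job.reduce.slowstart.completedmaps threshold (the config key named in the discussion). A hedged, pure-Java sketch of that condition — this is an illustration of the idea, not the actual JobTracker code, and the helper name is hypothetical:

```java
public class SlowStart {
    // Reduces are scheduled once the completed-map count reaches the
    // slowstart fraction of total maps; a threshold of 0.0f starts them
    // immediately, which is the small-job tweak proposed in the thread.
    public static boolean shouldStartReduces(int completedMaps, int totalMaps,
                                             float slowstart) {
        if (totalMaps == 0) return true;  // map-less job: nothing to wait for
        return completedMaps >= Math.ceil(slowstart * totalMaps);
    }

    public static void main(String[] args) {
        // Default threshold 0.5: wait for half the maps to finish.
        System.out.println(shouldStartReduces(49, 100, 0.5f)); // false
        System.out.println(shouldStartReduces(50, 100, 0.5f)); // true
        // Threshold 0: reduces start right away.
        System.out.println(shouldStartReduces(0, 100, 0.0f));  // true
    }
}
```

The automatic-decision idea amounts to picking the threshold per job (small job: 0, large job: the default) instead of asking users to set it themselves.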
[jira] Updated: (MAPREDUCE-1463) Reducer should start faster for smaller jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-1463: -- Attachment: MAPREDUCE-1463-v3.patch Reducer should start faster for smaller jobs Key: MAPREDUCE-1463 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/fair-share Reporter: Scott Chen Assignee: Scott Chen Attachments: MAPREDUCE-1463-v1.patch, MAPREDUCE-1463-v2.patch, MAPREDUCE-1463-v3.patch Our users often complain about the slowness of smaller ad-hoc jobs. The overhead to wait for the reducers to start in this case is significant. It will be good if we can start the reducer sooner in this case. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-326) The lowest level map-reduce APIs should be byte oriented
[ https://issues.apache.org/jira/browse/MAPREDUCE-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831800#action_12831800 ] eric baldeschwieler commented on MAPREDUCE-326: --- Sounds like we are on the same page. Proposals will be greeted with interest. Acceptance criteria: 1) Backwards compatible to 0.20 (including legacy APIs in 0.20 please, since we're still debugging the new APIs) 2) Performance neutral for 0.20 APIs, no large hit for legacy APIs The lowest level map-reduce APIs should be byte oriented Key: MAPREDUCE-326 URL: https://issues.apache.org/jira/browse/MAPREDUCE-326 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: eric baldeschwieler As discussed here: https://issues.apache.org/jira/browse/HADOOP-1986#action_12551237 The templates, serializers and other complexities that allow map-reduce to use arbitrary types complicate the design and lead to lots of object creates and other overhead that a byte oriented design would not suffer. I believe the lowest level implementation of hadoop map-reduce should have byte string oriented APIs (for keys and values). This API would be more performant, simpler and more easily cross language. The existing API could be maintained as a thin layer on top of the leaner API. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1318) Document exit codes and their meanings used by linux task controller
[ https://issues.apache.org/jira/browse/MAPREDUCE-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831801#action_12831801 ] Hadoop QA commented on MAPREDUCE-1318: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12435095/MAPREDUCE-1318.patch against trunk revision 907967. +1 @author. The patch does not contain any @author tags. +0 tests included. The patch appears to be a documentation patch that doesn't require tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/306/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/306/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/306/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/306/console This message is automatically generated. 
Document exit codes and their meanings used by linux task controller Key: MAPREDUCE-1318 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1318 Project: Hadoop Map/Reduce Issue Type: Improvement Components: documentation Reporter: Sreekanth Ramakrishnan Assignee: Anatoli Fomenko Priority: Blocker Fix For: 0.21.0 Attachments: HADOOP-5912.1.patch, MAPREDUCE-1318.1.patch, MAPREDUCE-1318.2.patch, MAPREDUCE-1318.patch Currently, linux task controller binary uses a set of exit code, which is not documented. These should be documented. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1399) The archive command shows a null error message
[ https://issues.apache.org/jira/browse/MAPREDUCE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated MAPREDUCE-1399: - Resolution: Fixed Status: Resolved (was: Patch Available) I just committed this. thanks nicholas. The archive command shows a null error message -- Key: MAPREDUCE-1399 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1399 Project: Hadoop Map/Reduce Issue Type: Bug Components: harchive Reporter: Tsz Wo (Nicholas), SZE Assignee: Mahadev konar Fix For: 0.22.0 Attachments: m1399_20100204.patch, m1399_20100205.patch, m1399_20100205trunk.patch, m1399_20100205trunk2.patch, m1399_20100205trunk2_y0.20.patch, MAPREDUCE-1399.patch {noformat} bash-3.1$ hadoop archive -archiveName foo.har -p . foo . Exception in archives null {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1473) Sqoop should allow users to control export parallelism
[ https://issues.apache.org/jira/browse/MAPREDUCE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Kimball updated MAPREDUCE-1473: - Attachment: MAPREDUCE-1473.patch Attaching a patch which provides this functionality. This uses CombineFileInputFormat to batch up Sqoop's input files into a user-defined number of splits. As in importing, the degree of parallelism is controlled with the {{\-m}} / {{--num-mappers}} parameters. Sqoop should allow users to control export parallelism -- Key: MAPREDUCE-1473 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1473 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/sqoop Reporter: Aaron Kimball Assignee: Aaron Kimball Attachments: MAPREDUCE-1473.patch Sqoop uses MapReduce jobs to export files back to a table in the database. The degree of parallelism is controlled by the number of splits; i.e., the number of input files used. The bottleneck in the system, though, is likely to be the database itself. Users should have the ability to tune the number of parallel exporters being used to a degree appropriate to their database deployment. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
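Conceptually, batching input files into a user-defined number of splits is a grouping problem: assign each file to one of N buckets while keeping bucket sizes balanced. A toy pure-Java sketch of that idea (greedy assignment to the currently smallest split — a simplification for illustration, not how CombineFileInputFormat actually builds splits, and the class name is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SplitGrouper {
    // Greedily assign each file (by size) to the currently smallest split,
    // producing at most numSplits groups. Sorting largest-first keeps the
    // greedy assignment reasonably balanced.
    public static List<List<String>> group(Map<String, Long> fileSizes, int numSplits) {
        List<List<String>> splits = new ArrayList<>();
        long[] totals = new long[numSplits];
        for (int i = 0; i < numSplits; i++) splits.add(new ArrayList<>());
        List<Map.Entry<String, Long>> files = new ArrayList<>(fileSizes.entrySet());
        files.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        for (Map.Entry<String, Long> f : files) {
            int smallest = 0;
            for (int i = 1; i < numSplits; i++)
                if (totals[i] < totals[smallest]) smallest = i;
            splits.get(smallest).add(f.getKey());
            totals[smallest] += f.getValue();
        }
        return splits;
    }
}
```

With the number of splits capped at the -m / --num-mappers value, the number of concurrent export tasks (and thus concurrent database writers) becomes a user-controlled knob.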
[jira] Updated: (MAPREDUCE-1473) Sqoop should allow users to control export parallelism
[ https://issues.apache.org/jira/browse/MAPREDUCE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Kimball updated MAPREDUCE-1473: - Status: Patch Available (was: Open) Sqoop should allow users to control export parallelism -- Key: MAPREDUCE-1473 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1473 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/sqoop Reporter: Aaron Kimball Assignee: Aaron Kimball Attachments: MAPREDUCE-1473.patch Sqoop uses MapReduce jobs to export files back to a table in the database. The degree of parallelism is controlled by the number of splits; i.e., the number of input files used. The bottleneck in the system, though, is likely to be the database itself. Users should have the ability to tune the number of parallel exporters being used to a degree appropriate to their database deployment. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1474) forrest docs for archives is out of date.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated MAPREDUCE-1474: - Attachment: MAPREDUCE-1474.patch doc changes for hadoop archives. forrest docs for archives is out of date. Key: MAPREDUCE-1474 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1474 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Reporter: Mahadev konar Assignee: Mahadev konar Fix For: 0.22.0 Attachments: MAPREDUCE-1474.patch The docs for archives are out of date. The new docs that were checked into hadoop common were lost because of the project split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1474) forrest docs for archives is out of date.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated MAPREDUCE-1474: - Status: Patch Available (was: Open) forrest docs for archives is out of date. - Key: MAPREDUCE-1474 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1474 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Reporter: Mahadev konar Assignee: Mahadev konar Fix For: 0.22.0 Attachments: MAPREDUCE-1474.patch The docs for archives are out of date. The new docs that were checked into hadoop common were lost because of the project split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-434) local map-reduce job limited to single reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831855#action_12831855 ] Hadoop QA commented on MAPREDUCE-434: - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12435228/MAPREDUCE-434.4.patch against trunk revision 908283. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/307/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/307/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/307/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/307/console This message is automatically generated. 
local map-reduce job limited to single reducer -- Key: MAPREDUCE-434 URL: https://issues.apache.org/jira/browse/MAPREDUCE-434 Project: Hadoop Map/Reduce Issue Type: Bug Environment: local job tracker Reporter: Yoram Arnon Assignee: Aaron Kimball Priority: Minor Attachments: MAPREDUCE-434.2.patch, MAPREDUCE-434.3.patch, MAPREDUCE-434.4.patch, MAPREDUCE-434.patch when mapred.job.tracker is set to 'local', my setNumReduceTasks call is ignored, and the number of reduce tasks is set at 1. This prevents me from locally debugging my partition function, which tries to partition based on the number of reduce tasks. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
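The bug above matters because the default hash-partitioning rule collapses to a single bucket when the reduce count is forced to 1, so a partition function can never be exercised locally. A hedged pure-Java sketch of that rule (the same shape as Hadoop's default HashPartitioner, but a standalone illustration with a hypothetical class name):

```java
public class DebugPartitioner {
    // The usual hash-partitioning rule: mask off the sign bit, then take the
    // modulus. With numReduceTasks == 1 every key lands in partition 0, so a
    // buggy partition function is invisible when the local runner pins the
    // reduce count to 1.
    public static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        System.out.println(getPartition("alpha", 1)); // always 0
        System.out.println(getPartition("beta", 1));  // always 0
        // Only with several reduce tasks does the key spread become observable.
        System.out.println(getPartition("alpha", 4));
    }
}
```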
[jira] Updated: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated MAPREDUCE-1433: --- Attachment: 1433.bp20.patch More up-to-date version of the backported patch. Not for commit. Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.22.0 Attachments: 1433.bp20.patch, 1433.bp20.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831868#action_12831868 ] Hudson commented on MAPREDUCE-1433: --- Integrated in Hadoop-Mapreduce-trunk-Commit #233 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/233/]) Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.22.0 Attachments: 1433.bp20.patch, 1433.bp20.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1425) archive throws OutOfMemoryError
[ https://issues.apache.org/jira/browse/MAPREDUCE-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831867#action_12831867 ] Hudson commented on MAPREDUCE-1425: --- Integrated in Hadoop-Mapreduce-trunk-Commit #233 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/233/]) archive throws OutOfMemoryError --- Key: MAPREDUCE-1425 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1425 Project: Hadoop Map/Reduce Issue Type: Improvement Components: harchive Reporter: Tsz Wo (Nicholas), SZE Assignee: Mahadev konar Fix For: 0.22.0 Attachments: har.sh, m1425_20100129TextFileGenerator.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425_y_0.20.patch {noformat} -bash-3.1$ hadoop archive -archiveName t4.har -p . t4 . Exception in thread main java.lang.OutOfMemoryError: Java heap space at java.util.regex.Pattern.compile(Pattern.java:1432) at java.util.regex.Pattern.init(Pattern.java:1133) at java.util.regex.Pattern.compile(Pattern.java:847) at java.lang.String.replace(String.java:2208) at org.apache.hadoop.fs.Path.normalizePath(Path.java:146) at org.apache.hadoop.fs.Path.initialize(Path.java:137) at org.apache.hadoop.fs.Path.init(Path.java:126) at org.apache.hadoop.fs.Path.makeQualified(Path.java:296) at org.apache.hadoop.hdfs.DistributedFileSystem.makeQualified(DistributedFileSystem.java:244) at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:256) at org.apache.hadoop.tools.HadoopArchives.archive(HadoopArchives.java:393) at org.apache.hadoop.tools.HadoopArchives.run(HadoopArchives.java:736) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.tools.HadoopArchives.main(HadoopArchives.java:751) {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (MAPREDUCE-1455) Authorization for servlets
[ https://issues.apache.org/jira/browse/MAPREDUCE-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Gummadi reassigned MAPREDUCE-1455: --- Assignee: Ravi Gummadi Authorization for servlets -- Key: MAPREDUCE-1455 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1455 Project: Hadoop Map/Reduce Issue Type: Sub-task Reporter: Devaraj Das Assignee: Ravi Gummadi Fix For: 0.22.0 This jira is about building the authorization for servlets (on top of MAPREDUCE-1307). That is, the JobTracker/TaskTracker runs authorization checks on web requests based on the configured job permissions. For e.g., if the job permission is 600, then no one except the authenticated user can look at the job details via the browser. The authenticated user in the servlet can be obtained using the HttpServletRequest method. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1455) Authorization for servlets
[ https://issues.apache.org/jira/browse/MAPREDUCE-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831880#action_12831880 ] Ravi Gummadi commented on MAPREDUCE-1455: - We will get the authenticated user using HttpServletRequest.getRemoteUser(). I am proposing to run the methods that access the job as the user (using UserGroupInformation.doAs()) from JSPs and servlets, so that the JobTracker's methods can just do authorization (by checking UserGroupInformation.getCurrentUser()). This avoids many changes in MAPREDUCE-1307 and also avoids adding new methods that take a UGI as a parameter in the JobTracker. Thoughts? Authorization for servlets -- Key: MAPREDUCE-1455 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1455 Project: Hadoop Map/Reduce Issue Type: Sub-task Reporter: Devaraj Das Assignee: Ravi Gummadi Fix For: 0.22.0 This jira is about building the authorization for servlets (on top of MAPREDUCE-1307). That is, the JobTracker/TaskTracker runs authorization checks on web requests based on the configured job permissions. For e.g., if the job permission is 600, then no one except the authenticated user can look at the job details via the browser. The authenticated user in the servlet can be obtained using the HttpServletRequest method. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
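The doAs proposal above has a simple shape: the servlet establishes "who is running" before calling into the JobTracker, and the JobTracker methods authorize against that ambient identity instead of taking a user parameter. A toy pure-Java analogue of that pattern — a thread-local stand-in for UserGroupInformation, not the real Hadoop security machinery; every name here is hypothetical:

```java
import java.util.concurrent.Callable;

public class MiniDoAs {
    // Thread-local "current user": a toy stand-in for
    // UserGroupInformation.getCurrentUser() in the proposal above.
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    public static String getCurrentUser() { return CURRENT.get(); }

    // Run an action as the given user, restoring the previous user after --
    // the shape of UserGroupInformation.doAs(), minus the JAAS machinery.
    public static <T> T doAs(String user, Callable<T> action) throws Exception {
        String previous = CURRENT.get();
        CURRENT.set(user);
        try {
            return action.call();
        } finally {
            CURRENT.set(previous);
        }
    }

    // A jobtracker-style method that authorizes against the ambient user,
    // needing no explicit UGI parameter.
    public static boolean canViewJob(String jobOwner) {
        return jobOwner.equals(getCurrentUser());
    }
}
```

A servlet would call doAs(request.getRemoteUser(), ...) around the job-access call, and the callee's authorization check falls out of the ambient identity.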
[jira] Commented: (MAPREDUCE-1333) Parallel running tasks on one single node may slow down the performance
[ https://issues.apache.org/jira/browse/MAPREDUCE-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831884#action_12831884 ] Xing Shi commented on MAPREDUCE-1333: - The purpose of a distributed system is high utilization. If you want to analyze the performance of running tasks, you can just set up one node with one map and no reduce, or vice versa. Parallel running tasks on one single node may slow down the performance --- Key: MAPREDUCE-1333 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1333 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker, task, tasktracker Affects Versions: 0.20.1 Reporter: Zhaoning Zhang When I analyzed the performance of running tasks, I found that tasks running in parallel on one single node did not perform better than serialized ones. We can set mapred.tasktracker.{map|reduce}.tasks.maximum = 1 individually, but there will still be parallel map AND reduce tasks. I wonder whether this holds in real commercial clusters? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1305) Massive performance problem with DistCp and -delete
[ https://issues.apache.org/jira/browse/MAPREDUCE-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-1305: - Attachment: M1305-1.patch Modified Peter's patch to remove FsShell invocations. That part isn't actually horrible, performance-wise; it reuses the instance, so while there's certainly avoidable overhead in parsing and whatnot, it's not forking a process or anything too notable. It also supports the Trash, which may be useful/appreciated. Is supporting Trash useful for DistCp users running with \-delete? Massive performance problem with DistCp and -delete --- Key: MAPREDUCE-1305 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1305 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp Affects Versions: 0.20.1 Reporter: Peter Romianowski Assignee: Peter Romianowski Attachments: M1305-1.patch, MAPREDUCE-1305.patch *First problem* In org.apache.hadoop.tools.DistCp#deleteNonexisting we serialize FileStatus objects when the path is all we need. The performance problem comes from org.apache.hadoop.fs.RawLocalFileSystem.RawLocalFileStatus#write which tries to retrieve file permissions by issuing a ls -ld path which is painfully slow. Changed that to just serialize Path and not FileStatus. *Second problem* To delete the files we invoke the hadoop command line tool with option -rmr path. Again, for each file. Changed that to dstfs.delete(path, true) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1305) Massive performance problem with DistCp and -delete
[ https://issues.apache.org/jira/browse/MAPREDUCE-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831893#action_12831893 ] Koji Noguchi commented on MAPREDUCE-1305: - bq. Is supporting Trash useful for DistCp users running with -delete? To me, yes. I've seen many of our users deleting their files accidentally. Trash has saved us great time. I'd like to request the Trash part to stay if there's not much performance problem. Massive performance problem with DistCp and -delete --- Key: MAPREDUCE-1305 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1305 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp Affects Versions: 0.20.1 Reporter: Peter Romianowski Assignee: Peter Romianowski Attachments: M1305-1.patch, MAPREDUCE-1305.patch *First problem* In org.apache.hadoop.tools.DistCp#deleteNonexisting we serialize FileStatus objects when the path is all we need. The performance problem comes from org.apache.hadoop.fs.RawLocalFileSystem.RawLocalFileStatus#write which tries to retrieve file permissions by issuing a ls -ld path which is painfully slow. Changed that to just serialize Path and not FileStatus. *Second problem* To delete the files we invoke the hadoop command line tool with option -rmr path. Again, for each file. Changed that to dstfs.delete(path, true) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
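The second problem described in this issue — invoking the command-line tool with -rmr once per file — is the classic fork-per-item antipattern; the fix is to call the filesystem API directly in-process. A hedged pure-Java sketch of the in-process approach using the JDK filesystem API (an analogue of dstfs.delete(path, true), not DistCp's actual code; the class name is hypothetical):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class DirectDelete {
    // Delete a tree with direct in-process filesystem calls -- one cheap API
    // call per entry -- rather than forking a shell command per path, which
    // is the per-file overhead the patch removes.
    public static void deleteRecursive(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order visits children before parents, so each
            // directory is empty by the time it is deleted.
            walk.sorted(Comparator.reverseOrder())
                .forEach(p -> {
                    try { Files.delete(p); }
                    catch (IOException e) { throw new RuntimeException(e); }
                });
        }
    }
}
```

The Trash question in the thread is orthogonal: moving to trash is itself a single rename-style API call per path, so keeping it need not reintroduce the per-file fork cost.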
[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831894#action_12831894 ] Amar Kamat commented on MAPREDUCE-1463: --- What should the behavior be when the total number of maps and reduces is small (i.e., a small job for now) but the job takes a huge amount of time to finish? For example, the maps take a day to run while the reduces are also compute-intensive. In such a case, would we still consider the job a small job? I think what we want to capture is the job behavior (fast *finishing* jobs versus others). Using task counts might not be sufficient. Scott, wouldn't this problem be solved if you set 'mapreduce.job.reduce.slowstart.completedmaps' to a default value of 0 (instead of 0.5) for all your users? Reducer should start faster for smaller jobs Key: MAPREDUCE-1463 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/fair-share Reporter: Scott Chen Assignee: Scott Chen Attachments: MAPREDUCE-1463-v1.patch, MAPREDUCE-1463-v2.patch, MAPREDUCE-1463-v3.patch Our users often complain about the slowness of smaller ad-hoc jobs. The overhead to wait for the reducers to start in this case is significant. It will be good if we can start the reducer sooner in this case. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.